{"id":2097,"date":"2018-04-16T16:13:23","date_gmt":"2018-04-16T23:13:23","guid":{"rendered":"http:\/\/www.hhhh.org\/joeboy\/blog\/?p=2097"},"modified":"2018-04-18T15:56:42","modified_gmt":"2018-04-18T22:56:42","slug":"tree-draft","status":"publish","type":"post","link":"https:\/\/www.hhhh.org\/joeboy\/blog\/?p=2097","title":{"rendered":"Decision Tree Classifiers &#8211; A simple example"},"content":{"rendered":"<p>Here is a simple &#8220;Machine Learning&#8221; Python program using scikit-learn\u2019s DecisionTree classifier to use height and weight to predict your body type. For the record \u2013 this is why people hate BMI and things like it. After writing this I think I need to go on a diet. <\/p>\n<p>Identification Trees \u2013 often called decision trees &#8211; provide a way to deterministically map a set of observations into predictions.  The predictions are drawn from a set of observed output states, and we are looking for observable features (the inputs) that we can use in a tree of tests. <\/p>\n<p>Training builds the decision tree from two sets of data: our observations and a label for each observation. Each internal node in the tree represents a test that splits the training set into subsets; each subset goes on either to a subsequent test or to a leaf node representing a specific output label or state. <\/p>\n<p>The MIT OpenCourseWare video <a href=\"https:\/\/youtu.be\/SXBG3RGr_Rc\" title=\"MIT open courseware - lecture 11 on Identification Trees and Disorder\">Identification Trees and Disorder<\/a> is a good introduction. <\/p>\n<p>So let\u2019s say we wanted to determine, based on someone\u2019s height and weight, whether they were overweight. 
To get some training data we could take a bunch of random samples from a representative population of people \u2013 ask them their height and weight, and then label each person as normal weight, overweight, or obese.  That would not be a fun data set to try and collect \u2013 so let\u2019s cheat. <\/p>\n<p>The Body Mass Index, or BMI, is an equation already derived from population health data that roughly maps height and weight into a single number. The BMI can be used to predict if a person is underweight, normal weight, overweight, or obese.  The BMI equation is roughly BMI = [(weight in pounds * 703)\/(height in inches squared)]. A BMI of less than 18.5 is underweight, a BMI from 18.5 to just under 25 reflects a normal weight, a BMI from 25 to just under 30 corresponds to being overweight, and a BMI of 30 or more signals obesity. So the BMI equation lets us build a table mapping height and weight to a label, which stands in for a uniform sampling of a large population.<\/p>\n<p>So using the BMI table as sample data, here is a simple Python program using sklearn\u2019s DecisionTreeClassifier to tell you if you are obese, overweight, or normal weight. 
<\/p>\n<p><PRE><br \/>\nfrom sklearn import tree<\/p>\n<p># One label per sample, e.g. &quot;NOR&quot;, &quot;OVE&quot;, &quot;OBE&quot;.<br \/>\nBMI_features = [ &quot;NOR&quot;, &quot;NOR&quot;, &#8230; lots of data here &#8230; , &quot;OBE&quot;, &quot;OBE&quot;]<br \/>\n# Each sample is [weight in lbs, height in inches].<br \/>\nHeight_in_Weight_lbs_samples = [[91,58],[96,58], &#8230; lots of data here &#8230; ,[279,76],[287,76]]<\/p>\n<p># Create identification tree from BMI table.<br \/>\nclf = tree.DecisionTreeClassifier()<br \/>\nclf = clf.fit(Height_in_Weight_lbs_samples, BMI_features)<\/p>\n<p># Keep prompting until the user enters an empty line.<br \/>\nwhile True:<br \/>\n     weight = input(&quot;Enter your weight in lbs: &quot;)<br \/>\n     if not weight:<br \/>\n         break<\/p>\n<p>     height = input(&quot;Enter your height in inches: &quot;)<br \/>\n     if not height:<br \/>\n         break<\/p>\n<p>     # input() returns strings, so convert them to numbers.<br \/>\n     # The [weight, height] order matches the training samples.<br \/>\n     prediction = clf.predict([[float(weight), float(height)]])<br \/>\n     print(&quot;It appears that you are:&quot;, prediction, &quot;\\r\\n&quot; )<\/p>\n<p><\/PRE><\/p>\n<p>Output of the program looks something like this. Yeah, I&#8217;m regretting both dessert and choosing this example.<\/p>\n<p><PRE><br \/>\nEnter your weight in lbs: 225<br \/>\nEnter your height in inches: 70<br \/>\nIt appears that you are: ['OBE'] <\/p>\n<p>Enter your weight in lbs: 175<br \/>\nEnter your height in inches: 70<br \/>\nIt appears that you are: ['OVE'] <\/p>\n<p>Enter your weight in lbs: 168<br \/>\nEnter your height in inches: 70<br \/>\nIt appears that you are: ['NOR'] <\/p>\n<p>Enter your weight in lbs:<br \/>\n<\/PRE><\/p>\n<p>Sometimes it can be useful to look directly at the generated decision tree. This code generates a visualization of the tree. 
<\/p>\n<p><PRE><br \/>\n# Generate a graph visualizing the trained decision tree.<br \/>\n# Requires the graphviz Python package and the Graphviz binaries.<br \/>\nimport graphviz<br \/>\n# feature_names labels each test with the column it uses.<br \/>\ndot_data = tree.export_graphviz(clf, out_file=None, feature_names=[&quot;weight (lbs)&quot;, &quot;height (in)&quot;])<br \/>\ngraph = graphviz.Source(dot_data)<br \/>\ngraph.render(&quot;BMI_Table&quot;, view=True)<br \/>\n<\/PRE><\/p>\n<p>I put this code, including the full training data sets, up at: git@github.com:aarontoney\/Machine_Learning_Examples.git<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here is a simple &#8220;Machine Learning&#8221; Python program using scikit-learn\u2019s DecisionTree classifier to use height and weight to predict your body type. For the record \u2013 this is why people hate BMI and things like it. After writing this I think I need to go on a diet. Identification Trees \u2013 often called decision trees [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[75,86],"tags":[],"class_list":["post-2097","post","type-post","status-publish","format-standard","hentry","category-data","category-machine-learning"],"_links":{"self":[{"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2097","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2097"}],"version-history":[{"count":10,"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2097\/revisions"}],"predecessor-version":[{"id":2108,"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=\/wp\/
v2\/posts\/2097\/revisions\/2108"}],"wp:attachment":[{"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2097"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2097"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hhhh.org\/joeboy\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2097"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}