Decision Tree Classifiers – A simple example

Here is a simple “Machine Learning” Python program that uses scikit-learn’s DecisionTreeClassifier to predict your body type from your height and weight. For the record – this is why people hate BMI and things like it. After writing this I think I need to go on a diet.

Identification Trees – often called decision trees – provide a way to deterministically map a bunch of observations into predictions. Basically, the predictions are a set of observed output states, and we are looking for observable features (the inputs) that we can use in a tree of tests.

Training builds the decision tree from two sets of data: our set of observations and a set of labels, one for each observation. Each node in the tree represents a test that splits the training set into subsets. Each subset then goes on either to a subsequent test or to a leaf node representing a specific output label or state.
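
To make the idea of a test that splits the training set concrete, here is a minimal sketch in plain Python. It scores every candidate cut on a single feature by Gini impurity, which is scikit-learn's default criterion (the MIT lecture measures "disorder" with entropy instead). The weights, labels, and the helper names gini and best_cut are made up for illustration; this is not how scikit-learn is implemented internally.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels: 0.0 means the group is pure."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_cut(values, labels):
    """Try a cut halfway between each pair of adjacent distinct values and
    return (threshold, weighted impurity) for the cleanest split."""
    pairs = sorted(zip(values, labels))
    best = None
    for (low, _), (high, _) in zip(pairs, pairs[1:]):
        if low == high:
            continue  # no room for a cut between identical values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical weights (lbs) for six people of the same height, with labels.
weights = [150, 160, 170, 180, 190, 200]
labels = ["NOR", "NOR", "NOR", "OVE", "OVE", "OVE"]

threshold, impurity = best_cut(weights, labels)
print(threshold, impurity)  # prints: 175.0 0.0
```

A cut at 175 lbs separates the two labels perfectly, so both sides become leaf nodes. With messier data the best cut would leave some impurity, and each side would get its own follow-up test.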

The MIT OpenCourseWare video “Identification Trees and Disorder” is a good introduction.

So let’s say we wanted to determine, based on someone’s height and weight, whether they are overweight or not. To get some training data we could take a bunch of random samples from a representative population of people, ask them their height and weight, and then create a label for each person saying whether they are of normal weight, overweight, or obese. That would not be a fun data set to try and collect – so let’s cheat.

The Body Mass Index, or BMI, is an equation already derived from population health data that roughly maps height and weight into a single number. The BMI can be used to predict whether a person is underweight, normal weight, overweight, or obese. The equation is roughly BMI = (weight in pounds * 703) / (height in inches)^2. A BMI below 18.5 indicates underweight, a BMI from 18.5 up to 25 reflects a normal weight, a BMI from 25 up to 30 corresponds to being overweight, and a BMI of 30 or more signals obesity. So the BMI equation lets us build a table mapping height and weight to labels, standing in for a uniform sampling of a large population.
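
As a rough sketch of how such a table could be generated, the snippet below applies the BMI formula with the standard cutoffs of 18.5, 25, and 30. The labels "NOR", "OVE", and "OBE" match the ones used in this post; the "UND" label, the function names, and the height and weight grid are my own assumptions and not necessarily how the original table was built.

```python
def bmi(weight_lbs, height_in):
    """Body Mass Index from weight in pounds and height in inches."""
    return weight_lbs * 703 / height_in ** 2

def bmi_label(weight_lbs, height_in):
    """Map a weight and height to a label using the standard BMI cutoffs."""
    value = bmi(weight_lbs, height_in)
    if value < 18.5:
        return "UND"  # hypothetical label for underweight
    if value < 25:
        return "NOR"
    if value < 30:
        return "OVE"
    return "OBE"

# Assumed grid: heights 58-76 inches, weights 90-290 lbs in 5 lb steps.
samples = []
labels = []
for height in range(58, 77):
    for weight in range(90, 291, 5):
        samples.append([weight, height])  # [weight in lbs, height in inches]
        labels.append(bmi_label(weight, height))

print(bmi_label(225, 70), bmi_label(175, 70), bmi_label(168, 70))
# prints: OBE OVE NOR
```

The samples and labels lists have the same shape as the two training lists passed to the classifier's fit method.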

So, using BMI sampling data, here is a simple Python program that uses sklearn’s DecisionTreeClassifier to tell you if you are obese, overweight, or normal weight.


from sklearn import tree

# Labels ("NOR", "OVE", "OBE"), one per training sample.
BMI_features = ["NOR", "NOR",  # … lots of data here …
                "OBE", "OBE"]
# Each sample is [weight in lbs, height in inches].
Height_in_Weight_lbs_samples = [[91, 58], [96, 58],  # … lots of data here …
                                [279, 76], [287, 76]]

# Create identification tree from BMI table.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(Height_in_Weight_lbs_samples, BMI_features)

# Keep asking until the user enters an empty line.
while True:
    weight = input("Enter your weight in lbs: ")
    if not weight:
        break

    height = input("Enter your height in inches: ")
    if not height:
        break

    # input() returns strings, so convert to numbers before predicting.
    prediction = clf.predict([[float(weight), float(height)]])
    print("It appears that you are:", prediction, "\n")

Output of the program looks something like this. Yeah, I’m regretting both dessert and choosing this example.


Enter your weight in lbs: 225
Enter your height in inches: 70
It appears that you are: ['OBE']

Enter your weight in lbs: 175
Enter your height in inches: 70
It appears that you are: ['OVE']

Enter your weight in lbs: 168
Enter your height in inches: 70
It appears that you are: ['NOR']

Enter your weight in lbs:

Sometimes it can be useful to look directly at the generated decision tree. This code generates a visualization of the tree.


# Generate a graph visualizing the trained decision tree.
# Needs the graphviz Python package and the Graphviz binaries installed.
import graphviz
dot_data = tree.export_graphviz(clf, out_file=None)
graph = graphviz.Source(dot_data)
graph.render("BMI_Table", view=True)
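
If Graphviz is not installed, scikit-learn (version 0.21 or later) can also print the tree as indented text with tree.export_text. The sketch below is self-contained, so it trains on a small stand-in data set generated from the BMI formula rather than the full table; the feature names, the "UND" label, and the sample grid are assumptions for illustration.

```python
from sklearn import tree

def bmi_label(weight_lbs, height_in):
    """Label from the BMI formula (a stand-in for the full BMI table)."""
    bmi = weight_lbs * 703 / height_in ** 2
    if bmi < 18.5:
        return "UND"  # hypothetical label for underweight
    if bmi < 25:
        return "NOR"
    return "OVE" if bmi < 30 else "OBE"

# Small assumed grid of [weight in lbs, height in inches] samples.
samples = [[w, h] for h in (64, 70, 76) for w in range(100, 301, 10)]
labels = [bmi_label(w, h) for w, h in samples]

clf = tree.DecisionTreeClassifier(random_state=0).fit(samples, labels)

# Print each test in the tree as an indented rule, one line per node.
rules = tree.export_text(clf, feature_names=["weight_lbs", "height_in"])
print(rules)
print(clf.predict([[225, 70]]))
```

Each line of the output is one test such as a weight or height threshold, and each leaf line shows the predicted class, which makes it easy to see where the tree draws its cuts.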

I put this code, including the full data sets for training, up at: git@github.com:aarontoney/Machine_Learning_Examples.git
