Categories
Machine Learning Programming

Structuring TensorFlow Models

Danijar Hafner has an excellent tutorial on Structuring Your TensorFlow Models. I really liked it, so I am posting it here so it's always easy to find. He uses Python's decorators to provide lazy initialization for the graph components. It's a neat trick. Totally separate from TensorFlow, I am actively working on improving my Python programming and will definitely be using this approach in the future.
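A minimal sketch of that lazy-initialization idea, separate from any TensorFlow specifics (the decorator and class names here are my own, not Hafner's):

```python
import functools

def lazy_property(method):
    """Run the wrapped method once, cache its result, and return the
    cached value on every later access."""
    attr = "_cache_" + method.__name__

    @property
    @functools.wraps(method)
    def wrapper(self):
        if not hasattr(self, attr):
            setattr(self, attr, method(self))
        return getattr(self, attr)
    return wrapper

class Model:
    def __init__(self, data):
        self.data = data

    @lazy_property
    def prediction(self):
        # Expensive graph construction would go here; it only runs
        # the first time .prediction is accessed.
        return sum(self.data)

m = Model([1, 2, 3])
m.prediction  # builds and caches
m.prediction  # returns the cached value
```

In the TensorFlow case, each decorated property would add its ops to the graph exactly once, no matter how many other methods reference it.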

Categories
Down wind of the future Machine Learning

DeepFakes and Jordan Peele’s Obama PSA

This is an awesome PSA about information hygiene and the problems that our society is about to face. Basically, breakthroughs in the last few years have been accelerating what you can do with machine learning at a break-neck pace. Since people can kind of suck, not all of that advancement has been for the benefit of society at large.

In late 2017 some equally brilliant and creepy work was released letting someone (super creepy) place the face of anyone for whom they could get a few hundred pictures onto a porn star’s body. Basically this was the creation of a new, modern sex crime. No one wanted to talk about this threat, because – porn. Well, thankfully Jordan Peele and some others put out a PSA about this coming threat. As you can see in the video, they have generated footage of “President Obama” saying all sorts of things. The video effectively nailed generation of voice, tone, intonation, gesture, and lighting. It’s an awesome fake.

Right now, with a little knowledge about how the video was generated, it is fairly easy to prove it is not authentic. That illustrates the problem, though: you quickly start needing to use math to prove that a video is faked, and math is quickly becoming inherently distrusted by more and more people. We are also not that far from this tech running reliably in real time. So – generating a believable video of any public figure saying anything you want is not that far off.

The obvious threat is that anyone caught committing an undesirable act will soon be able to more believably cry “fake news”. That’s the simpler problem, though. The real problem will kick off as soon as people start generating revisionist historical records. That thread – when pulled – could unravel our cultural anchor to the past, changing our understanding of our cultural and societal path to now.

Categories
Machine Learning

Data sets for Machine Learning

Grad school was a few years ago, and things are moving fast, so I am writing a bunch of small ML programs to get current with the state of the art again. I am also switching from C++ to Python for ML work. Wow, that’s a huge improvement right there – can’t believe I waited this long.

The biggest problem in writing ML code is finding decent data sets. The UCI Machine Learning Repository has links to a bunch of curated data sets. Posting it here, like the rest of the ML stuff I’m about to, so it’s easy to find and point friends to.

Categories
Machine Learning

MIT book on Deep Learning

MIT Press has made the Deep Learning book available for free online. Posting so it’s easy to find in the future.
http://www.deeplearningbook.org

Categories
Data Machine Learning

Decision Tree Classifiers – A simple example

Here is a simple “Machine Learning” Python program that uses scikit-learn’s DecisionTreeClassifier to predict your body type from height and weight. For the record – this is why people hate BMI and things like it. After writing this I think I need to go on a diet.

Identification trees – often called decision trees – provide a way to deterministically map a bunch of qualitative observations into predictions. Basically, the predictions are a set of observed output states, and we are looking for observable features – inputs – that we can use in a tree of tests.

Training builds the decision tree from two sets of data: our set of observations and a set of labels corresponding to each observation. Each node in the tree represents a test that splits the training set; each split’s results feed either subsequent tests or a leaf node representing a specific output label or state.

The MIT open courseware video Identification Trees and Disorder is a good introduction.

So let’s say we wanted to determine, based on someone’s height and weight, whether they were overweight or not. To get some training data we could take a bunch of random samples of a representative population of people – ask them their height and weight, and then create labels for each person marking them as normal weight, overweight, or obese. That would not be a fun data set to collect – so let’s cheat.

The Body Mass Index, or BMI, is an equation already derived from population health data that roughly maps height and weight to a single number. The BMI can be used to predict whether a person is underweight, normal weight, overweight, or obese. The equation is roughly BMI = (weight in pounds * 703) / (height in inches)². A BMI under 18.5 indicates underweight, a BMI between 18.5 and 25 reflects a normal weight, a BMI of 25–30 corresponds to being overweight, and a BMI over 30 signals obesity. So the BMI equation lets us build a table mapping height and weight to labels that would be representative of uniformly sampling a large population.
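As a sketch of that cheat (the function names and grid ranges here are my own; the cutoffs are the standard BMI categories above), generating the training table might look like:

```python
def bmi(weight_lbs, height_in):
    """Body Mass Index from pounds and inches."""
    return (weight_lbs * 703) / (height_in ** 2)

def bmi_label(value):
    """Map a BMI value to the standard category labels."""
    if value < 18.5:
        return "UND"
    elif value < 25:
        return "NOR"
    elif value < 30:
        return "OVE"
    else:
        return "OBE"

# Sweep a grid of weights and heights to build a training table,
# standing in for a uniform sample of a large population.
samples, labels = [], []
for height_in in range(58, 77):
    for weight_lbs in range(90, 290):
        samples.append([weight_lbs, height_in])
        labels.append(bmi_label(bmi(weight_lbs, height_in)))
```

Each row of the table is a [weight, height] pair, and each label is the BMI category the formula assigns to that pair.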

So, using BMI-generated sampling data, here is a simple Python program using sklearn’s DecisionTreeClassifier to tell you if you are obese, overweight, or normal weight.


from sklearn import tree

# Labels and the corresponding [weight (lbs), height (in)] samples.
BMI_labels = ["NOR", "NOR", … lots of data here … , "OBE", "OBE"]
Weight_lbs_Height_in_samples = [[91,58],[96,58], … lots of data here … ,[279,76],[287,76]]

# Create the identification tree from the BMI table.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(Weight_lbs_Height_in_samples, BMI_labels)

while True:
    weight = input("Enter your weight in lbs: ")
    if not weight:
        break

    height = input("Enter your height in inches: ")
    if not height:
        break

    # input() returns strings, so convert to int before predicting.
    prediction = clf.predict([[int(weight), int(height)]])
    print("It appears that you are:", prediction, "\r\n")

Output of the program looks something like this. Yeah, I’m regretting both dessert and choosing this example.


Enter your weight in lbs: 225
Enter your height in inches: 70
It appears that you are: ['OBE']

Enter your weight in lbs: 175
Enter your height in inches: 70
It appears that you are: ['OVE']

Enter your weight in lbs: 168
Enter your height in inches: 70
It appears that you are: ['NOR']

Enter your weight in lbs:

Sometimes it can be useful to look directly at the generated decision tree. This code generates a visualization of the tree.


# Generate a graph visualizing the trained decision tree.
import graphviz
dot_data = tree.export_graphviz(clf, out_file=None)
graph = graphviz.Source(dot_data)
graph.render(“BMI_Table”, view=True)

I put this code, including the full data sets for training, up at: git@github.com:aarontoney/Machine_Learning_Examples.git