The Two Things I Wish Everyone Knew about Machine Learning

Since I entered the world of machine learning nearly five years ago, I have been exposed to a wide variety of problems across industries spanning health care, software, manufacturing, and telecom. As part of this work, I often find myself explaining some of the challenges of building and delivering machine learning models. Fortunately, I am usually surrounded by smart, motivated technologists who are curious about machine learning and want to learn more about the field. Unfortunately, machine learning and artificial intelligence are overhyped and poorly understood, and the potential for confusion in this space is extremely high. As a result, I often find myself answering the same questions or re-explaining the same topics. In these conversations, two points of confusion persist more than most, so I want to try to offer some clarification here.

Point of Confusion #1: Training vs. Inference

Otherwise known as: There is a difference between building a model and using it.

Before I delve into this first point of confusion, I want to start with some vocabulary. When I say the words “building” or “training” in this context, they mean the same thing. In the machine learning space, these two words can usually be used interchangeably. So what do they mean, exactly?

Building (or training) a model means taking existing data (it can be from a SQL database, thousands of images, a collection of tweets, or a combination of many sources), cleaning it, running it through an algorithm, and constructing “a model” with it. This process is typically iterative and time-consuming. It requires a lot of mucking around in the data to see if it is usable, and it involves asking questions like “Why does this database have two really similar but not quite identical columns?” to which the answer is often “We aren’t sure, and the database architect who added both of these columns retired in 2003 and we don’t know where the documentation is.”

That being said, at the end, you are rewarded with a model that predicts something. In code, this model will be an object, the return value of a function, or an artifact you can save and export.

Below is a snippet of code that uses TuriCreate to train an image classification model, and creates an object called model.

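# Train an image classification model
import turicreate as tc
# Assumption: train_data is an SFrame whose label column is named 'label'
model = tc.image_classifier.create(train_data, target='label')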

The next code snippet saves that model as a CoreML object (or artifact) called ImageClassifier.mlmodel.

# Save that model as a CoreML object
coreml_model_name = 'ImageClassifier.mlmodel'
res = model.export_coreml(coreml_model_name)

I’ve skipped a lot of really important nuance about this process, but the salient point is this: you’ve wrangled some data (train_data), run it through an algorithm (tc.image_classifier.create), and you have a THING. That’s training. Want to try it yourself? Here are a few example Google Colab notebooks to get you started.
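To make the "data in, model out" shape of training concrete without any machine learning library, here is a minimal sketch. It is not TuriCreate: the nearest-centroid "algorithm", the two made-up features, and the names train, toy_train_data, and toy_model.json are all assumptions chosen purely for illustration.

```python
import json
import os
import tempfile

def train(rows):
    """Toy 'training': average the feature vectors of each label (nearest-centroid)."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The "model" is just one averaged vector (centroid) per label.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

# Hypothetical, hand-made training data: (weight_kg, ear_length_cm) -> label.
toy_train_data = [
    ([4.0, 6.0], "cat"),
    ([5.0, 7.0], "cat"),
    ([20.0, 10.0], "dog"),
    ([30.0, 12.0], "dog"),
]

model = train(toy_train_data)  # the model is now an object in memory

# Save the model as an artifact on disk, loosely analogous to exporting a .mlmodel file.
artifact_path = os.path.join(tempfile.gettempdir(), "toy_model.json")
with open(artifact_path, "w") as f:
    json.dump(model, f)
```

Real training involves far more data and compute, but the outline is the same: labeled data goes in, and an object you can save comes out.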

So…what is inference? This is the part where you take the thing you’ve just built and give it some new data. For example, let’s imagine I’ve built an image classification model that identifies whether something is a cat or a dog. The code above might be a part of this process. Now, I want to use this model to tell me whether a new photo I’ve taken is a cat or a dog. This is called inference. Below is a screen capture from my Google Colab notebook, to demonstrate what this might look like in code:

and here is what a user might see in an iOS app that uses this model.

For inference, I don’t need a lot of data (that’s training) or compute power (again, training). I just need my model object (ImageClassifier.mlmodel), some new data to give it (new_image), and a way to access the result, whether in Python code or an iOS app.
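Inference can be sketched in the same library-free way. The block below assumes a hypothetical, already-trained toy model (a JSON string of per-label centroids, standing in for a saved artifact such as ImageClassifier.mlmodel) and a made-up predict helper. None of this is TuriCreate or CoreML code.

```python
import json
import math

# Stand-in for a saved artifact: in practice this would be read from a file
# produced by an earlier training run. The numbers are made up.
saved_artifact = '{"cat": [4.5, 6.5], "dog": [25.0, 11.0]}'

def predict(model, features):
    """Return the label whose centroid is closest to the new data point."""
    return min(model, key=lambda label: math.dist(model[label], features))

model = json.loads(saved_artifact)   # load the model: no training data needed
new_image_features = [6.0, 7.0]      # hypothetical features for one new photo
prediction = predict(model, new_image_features)
print(prediction)
```

Notice what is absent: there is no training data and no training loop, only the loaded model and one new input.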

Training is the process of building the thing (i.e. a predictive model) that I’ll then use to do inference on a new piece of data.

Point of Confusion #2: You usually can’t use a pre-trained model on your data.

I also like to refer to this point as: A model doesn’t know about anything unless it has seen it before (during training).

Even with the advent of transfer learning, training a predictive model takes a long time. We have a plethora of tools and technologies that simplify this process, but even so, wrangling data into the right format and generating a performant model to solve a specific problem is slow. It is completely understandable to want to speed this process up. Unfortunately (or perhaps fortunately), to get a model to predict what you want, you need to train (or build) it using a relevant data source.

Here’s an example: let’s imagine you’ve built a predictive model that takes an image and classifies it as a cat or a dog. It might work as shown below.

Image of a predictive model classifying cats and dogs. Courtesy of Turi Create:

In the building process, your training data consists of images of cats and dogs, and each image is labeled as (wait for it) ‘cat’ or ‘dog’. That’s it. This model only knows about cats and dogs. You can take a photo of any object on earth, and it will identify that photo as a cat or a dog. Here are some examples of what this might look like embedded in an app:

Clearly, this is not a dog. Does this mean that your carefully constructed model is bad? Not even a little bit. It simply means that an image classification model does not classify any old type of image you might imagine. It can only assign the labels it was trained on. It also means that if you want your model to detect horses, cows, and pigs, or pens, computers, and tables, you need to go back to the beginning with a new set of training data and rebuild your model. This is why pre-built models, while fantastic for getting you up and running, rapidly become unusable: they aren’t trained on your data or designed to solve your specific problem.
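A small sketch shows why a two-label classifier can never answer "horse". It assumes a hypothetical toy model (one made-up centroid per label) and a softmax over negative distances as the scoring rule. That is a stand-in chosen for brevity, not how TuriCreate's image classifier works internally.

```python
import math

# Hypothetical trained model: one centroid per label it saw during training.
model = {"cat": [4.5, 6.5], "dog": [25.0, 11.0]}

def classify(model, features):
    """Return (label, probability) using a softmax over negative distances."""
    scores = {label: -math.dist(centroid, features) for label, centroid in model.items()}
    top = max(scores.values())
    # Subtract the top score before exponentiating for numerical stability.
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / total

# Made-up features for photos of things the model never saw in training.
unseen_inputs = {"horse": [500.0, 20.0], "pen": [0.02, 0.0]}

for name, features in unseen_inputs.items():
    label, probability = classify(model, features)
    # The answer is always one of the trained labels, never "horse" or "pen".
    print(f"{name} -> {label} ({probability:.2f})")
```

The probabilities are split among the labels the model knows, so every input, however unrelated, is forced into one of them.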

Here is another example. At Skafos, we’ve built several different predictive models that leverage text analytics. One of them is a sentiment classifier built on Yelp reviews. As shown below, this model will predict, based on the text you provide it, whether someone has left a positive (5 stars) or negative (1 star) review for a product or service on Yelp.

# Here is the training of the model
# Assumption: train_data is an SFrame with a 'text' column and a 'stars' label column
model = tc.text_classifier.create(train_data, target='stars', features=['text'])

# Here is a prediction
example_text = {"text": ["I really love it. It filled me with joy and was super awesome."]}
example_prediction = model.classify(tc.SFrame(example_text))

| class | probability |
| 5 | 0.8412655067800252 |

Great, right? Now what happens when I use it on some text that has nothing to do with a product review, such as a note on a medical insurance claim form?

example_text = {"text": ["The patient presented with the following symptoms."]}
example_prediction = model.classify(tc.SFrame(example_text))
| class | probability | 
| 5 | 0.36933508677023275 |

This relatively neutral text also received a 5-star rating, but with a very low probability. This doesn’t tell us anything useful. It’s the same principle I described above when talking about image classification. This particular sentiment model was specifically trained on Yelp review data. Yelp reviews have a type of grammar, syntax, and style that is different from other types of writing, be that a text message shared with friends, medical notes, or a more formal business document. This may not even be the right type of text analytics for the problem at hand. Just as an image classifier trained on images of cats and dogs won’t successfully identify horses or rabbits, a sentiment analysis model trained on Yelp reviews won’t be able to tell you whether or not your business proposal is likely to be approved. Different types of source data require different types of models.
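One rough way to see this mismatch is to ask how much of a new text the model has ever encountered. The sketch below assumes a tiny, made-up set of review sentences as the training corpus and a hypothetical coverage helper that measures the fraction of words seen in training. It is an illustration only and does not reflect how TuriCreate's text classifier works internally.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Tiny, made-up stand-in for a corpus of review text.
training_reviews = [
    "I really love this place, the food was awesome",
    "Terrible service, it was super slow and the food was cold",
    "The staff filled me with joy, I love it",
]
vocabulary = {word for review in training_reviews for word in tokenize(review)}

def coverage(text):
    """Fraction of words in `text` that appear in the training vocabulary."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)

review = "I really love it. It filled me with joy and was super awesome."
claim_note = "The patient presented with the following symptoms."

print(f"review coverage:     {coverage(review):.2f}")
print(f"claim note coverage: {coverage(claim_note):.2f}")
```

In this toy setup, the words that carry the meaning of the claim note ("patient", "presented", "symptoms") never appear in the training text, so a model built on that text has little to go on beyond filler words like "the" and "with".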


Machine learning is a fascinating field filled with challenges and interesting problems. It has the potential to change, for the better, the way that we solve problems and think about the world around us. But machine learning is not a magic bullet that solves every problem. To truly derive value from machine learning, it’s important to understand that it is a tool that is only useful when used correctly. Yes, you can use a wrench (pre-trained model) to solve a problem that really needs a hammer (a model built on your data for your use-case), but you are going to have much better results if you learn how to use the hammer in the first place.

Source: Artificial Intelligence on Medium
