Blog: Analyzing Machine Learning Models with Yellowbrick

Ok, not quite yellow bricks—but gold!

Visual diagnostics are vital to machine learning.

Anscombe’s quartet demonstrates an important idea: we need to visualize data before analyzing it. The quartet consists of four hypothetical datasets, each containing eleven (x, y) points. Although all four datasets have nearly identical descriptive statistics, including the mean, variance, correlation, and linear regression line, they have very different distributions when graphed.

This classic example shows that looking at data is as important as performing numerical computations on it. Statistical tests are important, and usually necessary, for analyzing datasets, but so is visual analysis.
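The quartet is easy to check numerically. The following pure-Python sketch recomputes the shared statistics from Anscombe's published values; the helper names `pearson` and `fit_line` are our own illustrative functions, not part of any library.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# name -> (mean x, variance x, mean y, correlation, intercept, slope)
summary = {}
for name, (x, y) in quartet.items():
    intercept, slope = fit_line(x, y)
    summary[name] = (mean(x), variance(x), mean(y), pearson(x, y), intercept, slope)
    print(f"{name:>3}: mean_x={summary[name][0]:.2f} var_x={summary[name][1]:.2f} "
          f"mean_y={summary[name][2]:.2f} r={summary[name][3]:.3f} "
          f"y={intercept:.2f}+{slope:.3f}x")
```

Every dataset prints nearly the same summary (mean of x of 9, variance of x of 11, mean of y of about 7.50, correlation of about 0.816, and a fitted line of about y = 3 + 0.5x), even though a scatterplot of each looks completely different.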

Visualization therefore plays a critical role throughout the analytical process and is, frankly, a must-have for effective analysis, model selection, and evaluation. This article discusses Yellowbrick, a diagnostic platform that lets data scientists visualize the entire model selection process, steering them toward better, more explainable models and helping them avoid pitfalls and traps along the way.

Model Selection Process

More often than not, machine learning work centers on the model used for inference. Practitioners have their favorite models, a preference built over time through experience and knowledge, but what actually happens under the hood often gets too little attention.

Importantly, model selection isn’t just about picking the “right” or “wrong” algorithm. It is a deeper, iterative process that involves the following steps:

  1. Selecting and/or engineering the smallest and most predictive feature set.
  2. Choosing a set of algorithms from a model family.
  3. Tuning the algorithm hyperparameters to optimize performance.

All of the above points together constitute the Model Selection Triple, which was first discussed in a 2015 SIGMOD¹ paper by Kumar et al.

“Model selection is iterative and exploratory because the space of [model selection triples] is usually infinite, and it is generally impossible for analysts to know a priori which [combination] will yield satisfactory accuracy and/or insights.”
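To see why the space grows so quickly, here is a small sketch that enumerates triples for a deliberately tiny, discrete case: four sensor features, two scikit-learn algorithm names, and a hand-picked handful of hyperparameter values. The grid values are hypothetical, chosen only for illustration; nothing is trained here.

```python
from itertools import combinations, product

# Hypothetical, deliberately tiny search space (illustration only).
features = ["temperature", "relative humidity", "light", "CO2"]
hyperparameter_grid = {
    "LogisticRegression": [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}],
    "GaussianNB": [{}],  # no hyperparameters tuned in this sketch
}

# Every non-empty subset of features is a candidate feature set.
feature_sets = [
    subset
    for size in range(1, len(features) + 1)
    for subset in combinations(features, size)
]

# A model selection triple = (feature set, algorithm, hyperparameters).
triples = [
    (feature_set, algorithm, params)
    for feature_set, algorithm in product(feature_sets, hyperparameter_grid)
    for params in hyperparameter_grid[algorithm]
]

print(len(feature_sets), "feature sets ->", len(triples), "triples")
print(triples[0])
```

Four features and four algorithm/hyperparameter combinations already give 60 triples; once a hyperparameter such as C is treated as continuous, the space becomes infinite, which is why steering the search visually helps.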

The Yellowbrick library is a diagnostic visualization platform for machine learning that allows data scientists to steer the model selection process and diagnose problems throughout the machine learning workflow. In short, it helps you find the model selection triple (features, algorithm, and hyperparameters) that best fits the data.


Yellowbrick is an open-source Python project that extends the scikit-learn API with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create interactive data explorations.

It extends the scikit-learn API with a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the scikit-learn pipeline process, providing visuals throughout the transformation of high-dimensional data.


Yellowbrick isn’t a replacement for other data visualization libraries but helps to achieve the following:

  • Model Visualization
  • Data visualization for machine learning
  • Visual Diagnostics
  • Visual Steering


Yellowbrick can be installed either with pip or with conda. For detailed instructions, refer to the documentation.

#via pip
pip install yellowbrick
#via conda
conda install -c districtdatalabs yellowbrick


The Yellowbrick API should appear easy if you are familiar with the scikit-learn interface.

The primary interface is a Visualizer: an object that learns from data to produce a visualization. To use a visualizer, import it, instantiate it, call its fit() method, and then call its poof() method, which does the magic of rendering the visualization! (Since Yellowbrick 1.0, poof() is deprecated in favor of show(), which does the same thing.)


Yellowbrick hosts several datasets wrangled from the UCI Machine Learning Repository. We’ll be working with the ‘occupancy’ dataset, an experimental dataset for binary classification in which the goal is to predict room occupancy from variables such as temperature, humidity, light, and CO2. You can download the dataset here.

Importing the necessary libraries

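import warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Jupyter/IPython magic: render plots inside the notebook (remove in plain scripts)
%matplotlib inline

# Silence library warnings to keep the output readable
warnings.filterwarnings('ignore')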

Loading the dataset

df = pd.read_csv('data/occupancy/occupancy.csv')

Specifying the feature and target columns

# Column names must match the CSV header exactly
feature_names = [
    'temperature', 'relative humidity', 'light', 'C02',
]
target_name = 'occupancy'
X = df[feature_names]
y = df[target_name]


Feature Analysis with Yellowbrick

Essentials of Feature Analysis²

Feature engineering requires an understanding of the relationships between features. The feature analysis visualizers in Yellowbrick help to visualize the data in space so that important features can be detected.

The visualizers focus on aggregation, optimization, and other techniques to give overviews of the data. Yellowbrick currently implements the following feature analysis visualizers:

Feature analysis visualizers in Yellowbrick

Let’s go through some of them to see how they’re implemented.

Rank Features

Rank Features visualizers rank single features or pairs of features to detect covariance. Ranking can be 1D or 2D, depending on the number of features considered at a time.

Rank 1D

Rank 1D uses a ranking algorithm that considers only a single feature at a time. By default, the Shapiro-Wilk test is used to assess the normality of the distribution of instances with respect to each feature:

from yellowbrick.features import Rank1D
# Instantiate the 1D visualizer with the Shapiro-Wilk ranking algorithm
visualizer = Rank1D(features=feature_names, algorithm='shapiro')
visualizer.fit(X, y)      # Fit the data to the visualizer
visualizer.transform(X)   # Transform the data
visualizer.poof()         # Draw the visualization
A barplot is drawn showing the relative ranks of each feature.

Rank 2D

Rank 2D, on the other hand, performs pairwise feature analysis and displays the result as a heatmap. The default ranking algorithm is the Pearson correlation, but other algorithms, such as covariance, can also be used:

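from yellowbrick.features import Rank2D
# Instantiate the visualizer with the covariance ranking algorithm
visualizer = Rank2D(features=feature_names, algorithm='covariance')
# Alternatively, use the Pearson correlation (the default):
# visualizer = Rank2D(features=feature_names, algorithm='pearson')
visualizer.fit(X, y)      # Fit the data to the visualizer
visualizer.transform(X)   # Transform the data
visualizer.poof()         # Draw the visualization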
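The difference between the two Rank2D options is easiest to see numerically. This pure-Python sketch (hypothetical readings, our own helper functions) shows that covariance changes when a feature is rescaled, for example from Celsius to Fahrenheit, while the Pearson coefficient, which divides the covariance by both standard deviations, does not.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Covariance divided by both standard deviations, so it lies in [-1, 1]."""
    return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)


# Hypothetical readings for two features.
temperature_c = [20.0, 21.0, 22.0, 23.0, 24.0]
light_lux = [400.0, 450.0, 420.0, 500.0, 530.0]
temperature_f = [t * 9 / 5 + 32 for t in temperature_c]  # same data, new units

print("covariance:", covariance(temperature_c, light_lux),
      covariance(temperature_f, light_lux))
print("pearson:   ", pearson(temperature_c, light_lux),
      pearson(temperature_f, light_lux))
```

This is why covariance values in a heatmap are hard to compare across features measured in different units, whereas Pearson scores always lie between -1 and 1.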

RadViz

RadViz is a multivariate data visualization algorithm that plots each feature dimension uniformly around the circumference of a circle and then plots data points on the interior of the circle. This allows many dimensions to easily fit on a circle, greatly expanding the dimensionality of the visualization:

from yellowbrick.features import RadViz
# Specify the features of interest and the classes of the target
features = feature_names
classes = ['unoccupied', 'occupied']
# Instantiate the visualizer
visualizer = RadViz(classes=classes, features=features, size=(800, 300))
visualizer.fit(X, y)      # Fit the data to the visualizer
visualizer.transform(X)   # Transform the data
visualizer.poof()         # Draw the visualization
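RadViz is commonly described with a spring model: each feature is an anchor on the unit circle, and each instance is pulled toward every anchor with a force proportional to its (min-max scaled) value for that feature. The sketch below implements that textbook formulation in pure Python on hypothetical data; Yellowbrick's internals may differ in details such as normalization.

```python
import math


def radviz_project(row, mins, maxs):
    """Place one instance inside the unit circle using the RadViz spring model.

    Feature j is anchored at angle 2*pi*j/n on the unit circle. Values are
    min-max scaled to [0, 1] and used as spring weights, so the instance lands
    at the weighted average of the anchor positions.
    """
    n = len(row)
    anchors = [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
               for j in range(n)]
    weights = [(v - lo) / (hi - lo) if hi > lo else 0.0
               for v, lo, hi in zip(row, mins, maxs)]
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # no pull from any anchor: place at the center
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return (x, y)


# Hypothetical rows: temperature, humidity, light, CO2.
data = [
    [24.0, 30.0, 0.0, 400.0],    # high temperature only
    [20.0, 35.0, 0.0, 400.0],    # high humidity only
    [22.0, 32.5, 250.0, 650.0],  # every feature at its midpoint
    [20.0, 30.0, 500.0, 900.0],  # high light and CO2
]
mins = [min(col) for col in zip(*data)]
maxs = [max(col) for col in zip(*data)]
points = [radviz_project(row, mins, maxs) for row in data]
for row, (px, py) in zip(data, points):
    print(row, "->", (round(px, 3), round(py, 3)))
```

One consequence is visible in the third row: an instance with equal scaled values on every feature lands in the center, so quite different instances can overlap in a RadViz plot.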

Parallel Coordinates

This technique is useful when we need to detect clusters of instances that have similar classes, and to note features that have high variance or different distributions. Points that tend to cluster will appear closer together:

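from yellowbrick.features import ParallelCoordinates
features = feature_names
classes = ['unoccupied', 'occupied']
# Instantiate the visualizer: standardize features and plot a 10% sample
visualizer = ParallelCoordinates(
    classes=classes, features=features,
    normalize='standard', sample=0.1, size=(800, 300)
)
visualizer.fit(X, y)      # Fit the data to the visualizer
visualizer.transform(X)   # Transform the data
visualizer.poof()         # Draw the visualization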

Parallel coordinates is a visualization technique used to plot individual data elements across many dimensions. Each of the dimensions corresponds to a vertical axis, and each data element is displayed as a series of connected points along the dimensions/axes.

The groups of similar instances are called ‘braids’, and when there are distinct braids of different classes, it suggests there’s enough separability that a classification algorithm might be able to discern between each class.
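Setting normalize='standard' matters because raw features on very different scales (light in the hundreds, temperature in the twenties) would otherwise squash some axes flat. This sketch assumes that 'standard' means per-feature z-scoring, as with scikit-learn's StandardScaler, and shows how each standardized instance becomes the polyline drawn across the vertical axes. The data and helper names are hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each feature column: (value - column mean) / column std."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def to_polyline(row):
    """Each instance becomes a line through (axis index, value) vertices."""
    return list(enumerate(row))


# Hypothetical instances: temperature, light, CO2 (very different scales).
rows = [
    [20.0, 0.0, 400.0],
    [23.0, 450.0, 800.0],
    [21.0, 10.0, 450.0],
    [24.0, 500.0, 900.0],
]
scaled = standardize_columns(rows)
for line in map(to_polyline, scaled):
    print([(axis, round(value, 2)) for axis, value in line])
```

After scaling, every axis has mean 0 and unit spread, so braids are formed by relative differences between instances rather than by the units each sensor happens to use.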

Model Evaluation Visualizers

Essentials of Algorithm Selection²

Model evaluation measures how well the values predicted by the model match the actual labeled values. Yellowbrick has visualizers for classification, regression, and clustering algorithms. Let’s look at a select few.

Evaluating Classifiers

Classification models try to assign each instance to one or more discrete categories. The sklearn.metrics module implements functions to measure classification performance.

Yellowbrick implements the following classifier evaluations:

Classification Visualizers in Yellowbrick

Let’s implement a few of them on our data:

# Classifier Evaluation Imports
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from yellowbrick.classifier import ClassificationReport,ConfusionMatrix

Split the dataset into training and testing sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Classification Report

The classification report visualizer displays the precision, recall, and F1 scores for the model:

precision = true positives / (true positives + false positives)
recall = true positives / (false negatives + true positives)
F1 score = 2 * ((precision * recall) / (precision + recall))
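The three formulas can be checked by hand. This pure-Python sketch computes precision, recall, and F1 for one class at a time from hypothetical labels (1 = occupied, 0 = unoccupied); the classification report visualizer shows one such row per class. The function name is our own.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class, computed from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = occupied, 0 = unoccupied.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
for label in (0, 1):
    print(label, precision_recall_f1(y_true, y_pred, positive=label))
```

For the positive class, 3 true positives, 2 false positives, and 1 false negative give a precision of 0.6, a recall of 0.75, and an F1 of about 0.67.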

Let’s visualize classification reports for two models to decide which is better.

  • Classification report using Gaussian NB
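# Instantiate the classification model and visualizer
bayes = GaussianNB()
visualizer = ClassificationReport(bayes, classes=classes)
visualizer.fit(X_train, y_train)   # Fit the model on the training data
visualizer.score(X_test, y_test)   # Evaluate the model on the test data
g = visualizer.poof()              # Draw the visualization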
  • Classification report using Logistic Regression
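logreg = LogisticRegression()
visualizer = ClassificationReport(logreg, classes=classes)
visualizer.fit(X_train, y_train)   # Fit the model on the training data
visualizer.score(X_test, y_test)   # Evaluate the model on the test data
g = visualizer.poof()              # Draw the visualization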

Visual classification reports are used to compare classification models and select models that are “redder”, i.e. that have stronger classification metrics or are more balanced.

Confusion Matrix

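The ConfusionMatrix visualizer shows how each of the model's predicted classes compares to the actual classes: each cell counts the test instances that have a given actual class and a given predicted class, so correct predictions lie on the diagonal. Its score() method also computes the model's accuracy on the test data. Let’s check out the confusion matrix for the Logistic Regression model: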

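logReg = LogisticRegression()
visualizer = ConfusionMatrix(logReg, classes=classes)
visualizer.fit(X_train, y_train)   # Fit the model on the training data
visualizer.score(X_test, y_test)   # Predict on the test data and fill the matrix
g = visualizer.poof()              # Draw the visualization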
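Behind the heatmap is a simple count table. The sketch below builds one in pure Python from hypothetical labels, with rows as actual classes and columns as predicted classes (scikit-learn's convention); accuracy is the diagonal total divided by the number of instances. The helper name is our own.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count table: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical labels: 1 = occupied, 0 = unoccupied.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
accuracy = sum(cm[i][i] for i in range(len(cm))) / len(y_true)
print(cm, accuracy)
```

Here the matrix is [[2, 2], [1, 3]], so 5 of 8 predictions are correct and the accuracy is 0.625.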

Evaluating Regressors

Regression models try to predict a target in a continuous space. The sklearn.metrics module implements functions to measure regression performance.

Yellowbrick implements the following regressor evaluation methods:

Regression Visualizers

For implementing the regressor visualizer, let’s quickly import a regression dataset. We’ll use the concrete dataset, which contains 1030 instances and 9 attributes. Eight of the attributes are explanatory variables, including the age of the concrete and the materials used to create it, while the target variable strength is a measure of the concrete’s compressive strength (MPa). Download the dataset here.

Loading the dataset

concrete_data = pd.read_csv('concrete.csv')

Saving feature names as a list and target variable as a string:

feature_names_reg = ['cement', 'slag', 'ash', 'water', 'splast', 'coarse', 'fine', 'age']
target_name_reg = 'strength'
# Get the X and y data from the DataFrame 
X_reg = concrete_data[feature_names_reg]
y_reg = concrete_data[target_name_reg]
# Create the train and test data 
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg, test_size=0.2)

Residuals Plot

A residual is the difference between the target and predicted values, i.e. the error of the prediction. The ResidualsPlot visualizer shows the residuals on the vertical axis and the predicted values on the horizontal axis, allowing you to detect regions within the target that may be susceptible to more or less error. It also plots the train and test data in different colors.

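from sklearn.linear_model import Ridge
from yellowbrick.regressor import ResidualsPlot

model = Ridge()
visualizer = ResidualsPlot(model)
visualizer.fit(X_reg_train, y_reg_train)   # Fit the model on the training data
visualizer.score(X_reg_test, y_reg_test)   # Evaluate the model on the test data
g = visualizer.poof()                      # Draw the visualization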

If the points are randomly dispersed around the horizontal dark line (zero residual), a linear regression model is appropriate for the data; if the residuals show a clear pattern, a non-linear model is likely to work better. In the example above, the residuals appear fairly uniformly distributed.
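The "random scatter versus pattern" rule can be reproduced without a plotting library. This sketch fits a one-feature least-squares line to two hypothetical targets, one roughly linear and one quadratic, and prints their residuals (observed minus predicted). It illustrates the idea only and is not Yellowbrick's ResidualsPlot code.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Least-squares fit y ≈ intercept + slope * x for a single feature."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def residuals(y_true, y_pred):
    """Residual = observed target minus predicted target."""
    return [t - p for t, p in zip(y_true, y_pred)]


# Hypothetical data: one roughly linear target and one quadratic target.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
targets = {
    "linear": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
    "curved": [v ** 2 for v in x],
}

results = {}
for name, y in targets.items():
    intercept, slope = fit_simple_linear(x, y)
    predictions = [intercept + slope * v for v in x]
    results[name] = residuals(y, predictions)
    print(name, [round(r, 2) for r in results[name]])
```

The linear target leaves small residuals with no systematic shape, while the quadratic target leaves a U-shape (positive at both ends, negative in the middle), the kind of pattern that signals a non-linear model would fit better.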

Prediction Error Plot

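The PredictionError visualizer plots the actual target values against the model's predicted values as a scatterplot. We can then draw the line of best fit through these points and compare it with the 45° identity line, where predicted values equal actual values; the closer the two lines, the better the model fits.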
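The comparison with the 45° line boils down to a second regression. This pure-Python sketch fits a least-squares line of predicted values on actual values for two hypothetical sets of predictions: a perfect model yields intercept 0 and slope 1 (the identity line), while predictions pulled toward the mean yield a flatter line. Which variable Yellowbrick places on which axis is not something this sketch relies on.

```python
from statistics import mean


def best_fit_line(actual, predicted):
    """Least-squares line predicted ≈ intercept + slope * actual."""
    ma, mp = mean(actual), mean(predicted)
    slope = (sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
             / sum((a - ma) ** 2 for a in actual))
    return mp - slope * ma, slope


# Hypothetical targets and two hypothetical sets of predictions.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
perfect = list(actual)                      # every prediction is exact
shrunk = [0.5 * a + 15.0 for a in actual]   # predictions pulled toward the mean

# The 45-degree identity line has intercept 0 and slope 1.
print("perfect:", best_fit_line(actual, perfect))
print("shrunk: ", best_fit_line(actual, shrunk))
```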

Alpha Selection Visualizer

The AlphaSelection visualizer demonstrates how different alpha values influence model selection during the regularization of linear models. A higher alpha produces a less complex (more strongly regularized) model, which decreases the error due to variance (overfitting); a lower alpha does the opposite.

However, alphas that are too high increase the error due to bias (underfitting). It’s therefore important to choose an alpha that balances these two sources of error so that the total error is minimized.

import numpy as np
from sklearn.linear_model import LassoCV
from yellowbrick.regressor import AlphaSelection
# Create a list of alphas to cross-validate against
alphas = np.logspace(-10, 1, 400)
# Instantiate the cross-validated model and the visualizer
model = LassoCV(alphas=alphas)
visualizer = AlphaSelection(model)
visualizer.fit(X_reg_train, y_reg_train)   # Fit the model for every alpha
g = visualizer.poof()                      # Draw the visualization

We can repeat the experiment with other cross-validated regularized models, such as RidgeCV and ElasticNetCV, and compare the optimal alpha each one selects.
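Why a larger alpha means a simpler model can be seen in the one-feature case, where ridge regression (L2 penalty) has a closed form. Note that this sketch swaps in ridge for the LassoCV used above, because the lasso has no equally simple formula; the data is hypothetical and centered, so no intercept is fitted.

```python
def ridge_coefficient(x, y, alpha):
    """Closed-form ridge slope for one feature without an intercept.

    Minimizes sum((y - w*x)**2) + alpha * w**2, which gives
    w = sum(x*y) / (sum(x*x) + alpha).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)


# Hypothetical centered data with an underlying slope of about 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]
alphas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
coefs = [ridge_coefficient(x, y, alpha) for alpha in alphas]
for alpha, w in zip(alphas, coefs):
    print(f"alpha={alpha:>7}: w={w:.4f}")
```

The coefficient shrinks toward zero as alpha grows: a small alpha leaves the fit free to chase noise (variance), while a very large alpha flattens the model until it underfits (bias).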

Hyperparameter Tuning

Essentials of Hyperparameter Tuning²

Tuning a model is as important as selecting it. Apart from alpha selection, Yellowbrick offers other visualizers for hyperparameter tuning, including:

Silhouette Visualizer

The SilhouetteVisualizer displays the silhouette coefficient for each sample on a per-cluster basis, showing which clusters are dense and which are not. This helps in choosing the number of clusters, k, for centroid-based algorithms such as k-means.

Yellowbrick also provides several other visualizers that can be used for tuning, such as the widely used elbow method (KElbowVisualizer). For the sake of demonstration, though, we’ll stick with just this one method.
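The silhouette coefficient of a sample is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. This pure-Python sketch computes it for hypothetical one-dimensional points; the function name is our own, not Yellowbrick's or scikit-learn's API.

```python
def silhouette_per_sample(points, labels):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) for 1-D points.

    a: mean distance from point i to the other members of its own cluster.
    b: lowest mean distance from point i to the members of any other cluster.
    Assumes at least two clusters.
    """
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(abs(p - q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


points = [0.0, 1.0, 10.0, 11.0]
good = silhouette_per_sample(points, [0, 0, 1, 1])  # matches the data's groups
bad = silhouette_per_sample(points, [0, 1, 0, 1])   # mixes the two groups
print([round(s, 3) for s in good], [round(s, 3) for s in bad])
```

Scores near 1 mean a sample sits well inside its cluster, scores near 0 mean it lies on a boundary, and negative scores suggest it was assigned to the wrong cluster; averaging them over all samples gives a single number that can be compared across values of k.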


The code and the datasets used in this article are available on my GitHub Repository.


The Yellowbrick library allows data scientists to steer the model selection process. The fact that it extends the scikit-learn API lowers the learning curve considerably. This can help in understanding a large variety of algorithms and methods and in monitoring model performance in real-world applications.


Source: Artificial Intelligence on Medium
