Blog: Detect Electric Power Spikes With C# and ML.NET Machine Learning
In this article, I’m going to use C#, .NET Core, and ML.NET to detect anomalous spikes and trend changes in a dataset.
In the world of machine learning, this type of task is called Time Series Anomaly Detection.
ML.NET is Microsoft’s new machine learning library. It can run linear regression, logistic regression, clustering, deep learning, and many other machine learning algorithms.
.NET Core is Microsoft’s cross-platform version of the .NET Framework that runs on Windows, macOS, and Linux. It’s the future of cross-platform .NET development.
The first thing I need for my app is a data file to analyze. I’ll use a power consumption file from a smart electricity meter. This is a simple data set that contains electricity consumption measurements over a 90-day period.
The dataset is a CSV file with three columns:
- The measurement type (always ‘electricity’)
- The date and time of the measurement
- The (normalized) power consumption
I will build a machine learning model that reads in each consumption record, and then identifies any power anomalies in the data.
Let’s get started. Here’s how to set up a new console project in .NET Core:
$ dotnet new console -o Power
$ cd Power
Next, I need to install the ML.NET base package, the time series extensions, and a plotting library:
$ dotnet add package Microsoft.ML
$ dotnet add package Microsoft.ML.TimeSeries
$ dotnet add package plplot
Now I’m ready to add some classes. I’ll need one to hold a power consumption record, and one to hold my model’s predictions.
I will modify the Program.cs file like this:
The MeterData class holds a single power consumption record. Note how each field is tagged with a LoadColumn attribute that tells the CSV data loading code which column to import data from.
I’m also declaring a SpikePrediction class which will hold a single power spike prediction. Note that the prediction field is tagged with a VectorType attribute that tells ML.NET that each prediction will consist of 3 numbers.
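A minimal sketch of these two classes could look like this. The property names Time, Consumption, and Prediction are assumptions chosen for illustration, and the column indices simply follow the three-column layout described earlier (type, timestamp, consumption).

```csharp
using System;
using Microsoft.ML.Data;

// One row of the CSV file. Column 0 (the measurement type) is not loaded
// because it always contains 'electricity'.
public class MeterData
{
    // Column 1: date and time of the measurement (property name is an assumption).
    [LoadColumn(1)] public DateTime Time { get; set; }

    // Column 2: normalized power consumption (property name is an assumption).
    [LoadColumn(2)] public float Consumption { get; set; }
}

// One prediction produced by the spike estimator.
public class SpikePrediction
{
    // A vector of exactly 3 numbers per prediction.
    [VectorType(3)] public double[] Prediction { get; set; }
}
```

Skipping column 0 is a design choice: a column that never changes carries no information for the model, so there is little reason to load it.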
Now I’m going to load the consumption data in memory:
This code uses the method LoadFromTextFile to load the CSV data directly into memory. The class field annotations tell the method how to store the loaded data in the MeterData class.
The data is now stored in a data view, but I want to work with the consumption records directly. So I’m calling CreateEnumerable() to convert the data view to an enumeration of MeterData instances.
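The loading step described above can be sketched as follows. The file path, the header row, and the comma separator are assumptions about the data file, and MeterData is repeated here (with assumed property names) only so the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class MeterData
{
    [LoadColumn(1)] public DateTime Time { get; set; }
    [LoadColumn(2)] public float Consumption { get; set; }
}

public static class MeterDataLoader
{
    // Loads the CSV file into a data view and also materializes the rows
    // as MeterData instances for direct access.
    public static (IDataView View, List<MeterData> Records) Load(MLContext context, string path)
    {
        var view = context.Data.LoadFromTextFile<MeterData>(
            path,
            separatorChar: ',',
            hasHeader: true); // assumption: the first row holds column names

        var records = context.Data
            .CreateEnumerable<MeterData>(view, reuseRowObject: false)
            .ToList();

        return (view, records);
    }
}
```

Passing reuseRowObject: false makes ML.NET allocate a fresh MeterData object per row, which is what I want when I keep the records around in a list.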
Let’s start by plotting the power consumption to get an idea of what my data looks like. I’ll add the following code:
The sdev() and sfnam() methods set up a PNG output file, and the spal0() method selects the default color palette. Then I call env() to set up the x- and y-axes, and lab() to specify the axis labels and plot title.
The line() method draws a line from two arrays of x- and y-coordinates. My code uses two LINQ queries to provide the day numbers for x (one per record, because I have 90 days of data), and the corresponding consumption numbers for y.
Finally, the eop() method closes the plot and saves it to disk.
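The plotting calls described above could be put together roughly like this. It is a sketch under a few assumptions: the "pngcairo" device name, the palette file name, the axis labels, and the y-axis range (10% above the largest value) are illustrative choices, and the series must contain at least one value.

```csharp
using System.Collections.Generic;
using System.Linq;
using PLplot;

public static class ConsumptionPlot
{
    // Draws the consumption series as a line chart and writes it to a PNG file.
    public static void Save(
        IReadOnlyList<float> consumption,
        string fileName,
        string paletteFile = "cmap0_alternate.pal") // assumption: a palette file shipped with PLplot
    {
        // One x value per record (0, 1, 2, ...) and the matching consumption for y.
        double[] x = Enumerable.Range(0, consumption.Count).Select(i => (double)i).ToArray();
        double[] y = consumption.Select(c => (double)c).ToArray();
        double yMax = y.Max() * 1.1; // leave some headroom above the highest value

        using (var pl = new PLStream())
        {
            pl.sdev("pngcairo");   // PNG output device (assumed device name)
            pl.sfnam(fileName);    // output file
            pl.spal0(paletteFile); // color palette
            pl.init();
            pl.env(0, x.Length, 0, yMax, AxesScale.Independent, AxisBox.BoxTicksLabelsAxes);
            pl.lab("Day", "Power consumption", "Power consumption over time");
            pl.line(x, y);
            pl.eop();              // close the page and write the file
        }
    }
}
```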
When I run the app, I get this:
You can clearly see that there’s a huge power spike on day 67.
Now I will identify all anomalies in the data. I will use an ML.NET algorithm called SSA Spike Estimator.
‘SSA’ stands for Singular Spectrum Analysis. It’s a somewhat complex analysis method that combines bits and pieces of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems, and signal processing. It’s basically the Swiss Army knife of anomaly detection.
Here’s how to perform singular spectrum analysis in ML.NET.
First remove the pl.eop() call, and then add the following code:
Machine learning models in ML.NET are built with pipelines, which are sequences of data-loading, transformation, and learning components.
My pipeline has only one component:
- SsaSpikeEstimator, which reads the consumption records and estimates all power anomalies in the data. I have to provide the input and output column names, a confidence threshold, and the sizes of the sliding window, the training window, and the seasonality window used during estimation.
I’m configuring my spike estimator as follows:
- A confidence for spike prediction of at least 98%
- A sliding window size of 30 for computing the p-value. This is one third of the length of the data set.
- I will use all 90 data points in the set for training the estimator.
- I’ll use an upper-bound of 30 days for my seasonality window.
With the pipeline fully assembled, I can train the model on the data with a call to Fit(…) and then call Transform(…) to make spike predictions for every consumption record in the data set.
Finally I call CreateEnumerable() to convert my transformed variable to an enumeration of SpikePrediction instances.
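The pipeline, training, and prediction steps above can be sketched as one helper. This assumes the DetectSpikeBySsa extension method from the Microsoft.ML.TimeSeries package and a float input column; the helper name and its parameters are hypothetical. Recent ML.NET releases take the confidence as a double, while older ones take an int, so the literal may need adjusting.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class SpikePrediction
{
    [VectorType(3)] public double[] Prediction { get; set; }
}

public static class SpikeDetection
{
    // Builds the one-step pipeline, trains it, and returns one prediction per input row.
    public static List<SpikePrediction> Detect(MLContext context, IDataView data, string inputColumn)
    {
        var pipeline = context.Transforms.DetectSpikeBySsa(
            outputColumnName: nameof(SpikePrediction.Prediction),
            inputColumnName: inputColumn, // must be a float column
            confidence: 98.0,             // at least 98% confidence
            pvalueHistoryLength: 30,      // sliding window for the p-value
            trainingWindowSize: 90,       // use all 90 data points for training
            seasonalityWindowSize: 30);   // upper bound for the seasonality window

        var model = pipeline.Fit(data);
        var transformed = model.Transform(data);

        return context.Data
            .CreateEnumerable<SpikePrediction>(transformed, reuseRowObject: false)
            .ToList();
    }
}
```

With the MeterData sketch used in this article, the call would be something like SpikeDetection.Detect(context, dataView, "Consumption"), where the column name is again an assumption.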
Each SpikePrediction instance now holds a vector with three values:
- An ‘alert’ value that is equal to 1 for a power spike (that exceeded the specified threshold) and 0 otherwise.
- The raw score for the record, which reflects how far the measured consumption deviates from the value the model expected.
- The p-value. This is a metric between zero and one. The lower the value, the larger the probability that we’re looking at an anomaly.
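To make the layout of that vector concrete, here is a small sketch that unpacks it into named values. The PredictionReader helper and its field names are hypothetical and not part of ML.NET; only the order of the three elements comes from the list above.

```csharp
using System;

public static class PredictionReader
{
    // Splits the 3-element prediction vector into its parts:
    // [0] alert flag, [1] second value reported by the estimator, [2] p-value.
    public static (bool IsSpike, double Score, double PValue) Unpack(double[] prediction)
    {
        if (prediction == null || prediction.Length != 3)
            throw new ArgumentException("Expected a vector of exactly 3 values.", nameof(prediction));

        return (prediction[0] == 1, prediction[1], prediction[2]);
    }
}
```

For example, PredictionReader.Unpack(new[] { 1.0, 4200.0, 0.001 }) yields IsSpike = true with a p-value of 0.001 (the numbers are made up for illustration).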
I can highlight the spikes in the plot with the following code:
I use a LINQ query to select all spike predictions with the ‘alert’ value equal to 1, and call the pl.string2() method to highlight these locations in the graph with a down-arrow symbol.
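A sketch of this highlighting step, assuming the PLplot stream from the earlier plot is still open. The SpikeMarkers helper, the color index, the character scale, and the yOffset parameter (which lifts the arrow slightly above the data point) are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using PLplot;

public static class SpikeMarkers
{
    // Returns the indices (days) whose prediction vector has the alert value set to 1.
    public static int[] FindSpikeDays(IReadOnlyList<double[]> predictions)
    {
        return Enumerable.Range(0, predictions.Count)
            .Where(i => predictions[i][0] == 1)
            .ToArray();
    }

    // Draws a down-arrow above every spike and then closes the plot.
    public static void Draw(
        PLStream pl,
        IReadOnlyList<float> consumption,
        IReadOnlyList<double[]> predictions,
        double yOffset)
    {
        int[] days = FindSpikeDays(predictions);
        double[] x = days.Select(d => (double)d).ToArray();
        double[] y = days.Select(d => consumption[d] + yOffset).ToArray();

        pl.col0(2);            // switch to color index 2 of the active palette
        pl.schr(3, 3);         // enlarge the characters so the arrow stands out
        pl.string2(x, y, "↓"); // one down-arrow per spike location
        pl.eop();              // close the page and write the file
    }
}
```

Keeping FindSpikeDays separate from the drawing code makes the selection logic easy to check without producing a plot.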
When I run the app now, I get the following plot:
The algorithm has discovered the single power anomaly in the data.
So what do you think? Are you ready to start writing C# machine learning apps with ML.NET?