
Time Series Machine Learning (and Feature Engineering) in R

This article was first published on business-science.io and kindly contributed to R-bloggers.



Machine learning is a powerful way to analyze Time Series. With innovations in the tidyverse modeling infrastructure (tidymodels), we now have a common set of packages to perform machine learning in R. These packages include parsnip, recipes, tune, and workflows. But what about Machine Learning with Time Series Data? The key is Feature Engineering. (Read the updated article at Business Science)

The timetk package has a feature engineering innovation in version 0.1.3: a recipe step called step_timeseries_signature() for time series feature engineering, designed to fit right into the tidymodels workflow for machine learning with time series data.

This small innovation creates 25+ time series features, which have a big impact on our machine learning models. Further, these “core features” are the basis for creating 200+ time series features to improve forecasting performance. Let’s see how to do Time Series Machine Learning in R.

Time Series Feature Engineering
with the Time Series Signature



Use feature engineering with timetk to forecast

The time series signature is a collection of useful engineered features that describe the time series index of a time-based data set. It contains 25+ time series features that can be used to forecast time series with common seasonal and trend patterns:

  • ✅ Trend in Seconds Granularity: index.num

  • ✅ Yearly Seasonality: Year, Month, Quarter

  • ✅ Weekly Seasonality: Week of Month, Day of Month, Day of Week, and more

  • ✅ Daily Seasonality: Hour, Minute, Second

  • ✅ Weekly Cyclic Patterns: 2 weeks, 3 weeks, 4 weeks

We can then build 200+ new features from these 25+ core features by applying well-thought-out time series feature engineering strategies.
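To get a feel for the core features listed above, here is a minimal sketch that computes the signature for a short index. It assumes timetk's tk_get_timeseries_signature() function and uses a hypothetical one-week daily index rather than the bike data.

```r
library(timetk)

# Hypothetical index: one week of daily dates (not the bike data)
idx <- seq(as.Date("2011-01-01"), by = "day", length.out = 7)

# One row per timestamp, one column per signature feature
signature_tbl <- tk_get_timeseries_signature(idx)

# Inspect a few of the core features: trend, yearly, and weekly components
signature_tbl[, c("index", "index.num", "year", "quarter", "month",
                  "wday.lbl", "mweek", "week2")]
```

Here index.num is the trend feature in seconds, while columns such as wday.lbl and week2 capture the weekly seasonality and the 2-week cyclic pattern.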

Time Series Forecast Strategy
6-Month Forecast of Bike Transaction Counts

In this tutorial, you will learn how to use machine learning to predict future outcomes in a time-based data set. The example uses a well-known time series dataset, the Bike Sharing Dataset from the UCI Machine Learning Repository. The objective is to build a model and predict the next 6 months of daily Bike Sharing transaction counts.

Feature Engineering Strategy

I’ll use timetk to build a basic Machine Learning Feature Set using the new step_timeseries_signature() function, which becomes part of a preprocessing specification built with the recipes package. I’ll show how you can add interaction terms, dummy variables, and more to build 200+ new features from the pre-packaged feature set.
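As a rough illustration of this strategy, the sketch below builds a preprocessing recipe around step_timeseries_signature(). The data is synthetic (an assumption, standing in for the bike counts), and the particular cleanup steps are one plausible choice rather than a definitive specification.

```r
library(recipes)
library(timetk)

# Synthetic stand-in for the training data (assumption: NOT the bike data)
set.seed(123)
dates <- seq(as.Date("2011-01-01"), as.Date("2012-06-30"), by = "day")
demo_tbl <- tibble::tibble(
    date  = dates,
    value = 4000 + 1500 * sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
        rnorm(length(dates), sd = 300)
)

recipe_spec <- recipe(value ~ ., data = demo_tbl) %>%
    # Expand the date column into signature features (date_year, date_month.lbl, ...)
    step_timeseries_signature(date) %>%
    # Drop the raw date and features that carry no information at a daily scale
    step_rm(date) %>%
    step_rm(contains("iso"), contains("second"), contains("minute"),
            contains("hour"), contains("am.pm"), contains("xts")) %>%
    # Put the large-valued trend features on a comparable scale
    step_normalize(contains("index.num"), date_year) %>%
    # One-hot encode the labelled month and weekday factors
    step_dummy(contains("lbl"), one_hot = TRUE)

# Estimate the steps and apply them to see the engineered feature set
baked_tbl <- bake(prep(recipe_spec), new_data = demo_tbl)
dim(baked_tbl)
```

The dummy variables come from the labelled month and weekday columns. Interaction terms would be added as further recipe steps before the dummy encoding.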

Machine Learning Strategy

We’ll then perform Time Series Machine Learning using parsnip and workflows to construct and train a GLM-based time series machine learning model. The model is evaluated on out-of-sample data. A final model is then trained on the full dataset and applied to a future dataset containing 6 months of daily timestamps.
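The strategy above can be sketched end to end as follows. This is a minimal example under stated assumptions: the data is synthetic, the glmnet engine is used for the GLM-based model, the penalty and mixture values are untuned placeholders, and "6 months" is approximated as 180 days.

```r
library(workflows)
library(parsnip)
library(recipes)
library(yardstick)
library(glmnet)
library(timetk)

# Synthetic daily series (assumption: stands in for the bike data)
set.seed(123)
dates <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
demo_tbl <- tibble::tibble(
    date  = dates,
    value = 4000 + 1500 * sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
        rnorm(length(dates), sd = 300)
)
train_demo <- demo_tbl[demo_tbl$date <  as.Date("2012-07-01"), ]
test_demo  <- demo_tbl[demo_tbl$date >= as.Date("2012-07-01"), ]

# Preprocessing: time series signature plus basic cleanup
recipe_spec <- recipe(value ~ ., data = train_demo) %>%
    step_timeseries_signature(date) %>%
    step_rm(date) %>%
    step_rm(contains("iso"), contains("second"), contains("minute"),
            contains("hour"), contains("am.pm"), contains("xts")) %>%
    step_normalize(contains("index.num"), date_year) %>%
    step_dummy(contains("lbl"), one_hot = TRUE)

# Elastic net regression; penalty and mixture are illustrative, untuned values
model_spec <- linear_reg(penalty = 0.1, mixture = 0.5) %>%
    set_engine("glmnet")

# Bundle preprocessing and model into one workflow
workflow_spec <- workflow() %>%
    add_recipe(recipe_spec) %>%
    add_model(model_spec)

# Fit on the training window and evaluate on out-of-sample data
workflow_fit <- fit(workflow_spec, data = train_demo)
test_pred    <- predict(workflow_fit, new_data = test_demo)
rmse_vec(truth = test_demo$value, estimate = test_pred$.pred)

# Refit on the full dataset and predict 180 future daily timestamps
final_fit   <- fit(workflow_spec, data = demo_tbl)
future_tbl  <- tibble::tibble(
    date = seq(as.Date("2013-01-01"), by = "day", length.out = 180)
)
future_pred <- predict(final_fit, new_data = future_tbl)
```

Because the recipe lives inside the workflow, the same feature engineering is re-estimated on each fit and applied automatically at prediction time, so the future frame only needs a date column.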

Time Series Forecast using Feature Engineering


How to Learn Forecasting Beyond this Tutorial

I can’t possibly show you all the Time Series Forecasting techniques you need to learn in this post, which is why I have a NEW Advanced Time Series Forecasting Course on its way. The course includes detailed explanations from 3 Time Series Competitions. We go over competition solutions and show how you can integrate the key strategies into your organization’s time series forecasting projects. Check out the course page and sign up to get notifications on the Advanced Time Series Forecasting Course (coming soon).


Need to improve forecasting at your company?

I have the Advanced Time Series Forecasting Course (coming soon). This course pulls forecasting strategies from experts who have placed 1st and 2nd in 3 of the most important Time Series Competitions. Learn the strategies that win forecasting competitions. Then apply them to your time series projects.

Join the waitlist to get notified of the Course Launch!



Prerequisites

Please use timetk 0.1.3 or greater for this tutorial. You can install via remotes::install_github("business-science/timetk") until released on CRAN.

Before we get started, load the following packages.

library(workflows)
library(parsnip)
library(recipes)
library(yardstick)
library(glmnet)
library(tidyverse)
library(tidyquant)
library(timetk) # Use >= 0.1.3, remotes::install_github("business-science/timetk")

Data

We’ll be using the Bike Sharing Dataset from the UCI Machine Learning Repository. Download the data and select the “day.csv” file, which is aggregated to a daily periodicity.

# Read data
bikes <- read_csv("2020-03-18-timeseries-ml/day.csv")

# Select date and count
bikes_tbl <- bikes %>%
    select(dteday, cnt) %>%
    rename(date  = dteday,
           value = cnt)

A visualization will help us understand how we plan to tackle the forecasting problem. We’ll split the data into two regions: a training region and a testing region.

# Visualize data and training/testing regions
bikes_tbl %>%
    ggplot(aes(x = date, y = value)) +
    # Shade the test region (2012-07-01 onward)
    geom_rect(xmin = as.numeric(ymd("2012-07-01")),
              xmax = as.numeric(ymd("2013-01-01")),
              ymin = 0, ymax = 10000,
              fill = palette_light()[[4]], alpha = 0.01) +
    # Label the two regions
    annotate("text", x = ymd("2011-10-01"), y = 7800,
             color = palette_light()[[1]], label = "Train Region") +
    annotate("text", x = ymd("2012-10-01"), y = 1550,
             color = palette_light()[[1]], label = "Test Region") +
    geom_point(alpha = 0.5, color = palette_light()[[1]]) +
    labs(title = "Bikes Sharing Dataset: Daily Scale", x = "") +
    theme_tq()


Split the data into train and test sets at “2012-07-01”.

# Split into training and test sets
train_tbl <- bikes_tbl %>% filter(date <  ymd("2012-07-01"))
test_tbl  <- bikes_tbl %>% filter(date >= ymd("2012-07-01"))
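As a quick sanity check on the split, the sketch below counts the days on each side of the cutoff using base R only. It assumes the daily file spans 2011-01-01 through 2012-12-31 with no missing days. The dates vector is a hypothetical stand-in for the date column.

```r
# Daily index for the assumed span of day.csv
dates      <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
split_date <- as.Date("2012-07-01")

n_train <- sum(dates <  split_date)  # 365 days of 2011 + 182 days of Jan-Jun 2012
n_test  <- sum(dates >= split_date)  # 184 days of Jul-Dec 2012

c(train = n_train, test = n_test)
# train = 547, test = 184
```

The training count of 547 agrees with the number of rows in the training tibble printed in the Modeling section.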

Modeling

Start with the training set, which has the “date” and “value” columns.

# Training set
train_tbl
## # A tibble: 547 x 2
##    date       value
##    <date>     <dbl>

Read the full article on R-bloggers.
