


Blog: Implementing and Visualizing Linear Regression in Python with SciKit Learn



Ladies and gentlemen, fasten your seatbelts, lean back and take a deep breath, for we are going to go on a bumpy ride!

Now, before you shoo me away for corny intros, let us delve deep right into the magical world of data science.

Firstly, do not be afraid, for we are not going to wade through algorithms filled with mathematical formulas that whoosh right over your head. Instead, as mentioned in the title, we will take the help of the SciKit Learn library, which lets us simply call the required classes and get our results.

Easy, peasy.

But that doesn’t mean you do not need any knowledge of how these algorithms work from the inside. At one point or another, you do need to learn them, for you cannot avoid them forever. But we will discuss them some other day, so let’s focus on the task at hand here.

Implementation

First, we import a few libraries-

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Assuming that you know about NumPy and pandas, I am moving on to Matplotlib, which is a plotting library for Python. Basically, this is the dude you want to call when you want to make graphs and charts.

The next step is to import our dataset (‘sample.csv’) and then split it into the input (independent) variable and the output (dependent) variable.

dataset = pd.read_csv('sample.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
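If you don't have a ‘sample.csv’ handy, here is a minimal sketch of what that slicing does, using a small hypothetical DataFrame in place of the real file (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for sample.csv: years of experience vs. salary
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9],
    "Salary": [39343, 43525, 54445, 61111, 81363],
})

x = dataset.iloc[:, :-1].values  # every column except the last -> 2-D array
y = dataset.iloc[:, 1].values    # the second column -> 1-D array

print(x.shape)  # (5, 1)
print(y.shape)  # (5,)
```

Note that `x` keeps a 2-D shape, which is exactly what scikit-learn estimators expect for their features.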

When you deal with real datasets, you usually have thousands of rows, but since the one I have taken here is a sample, it has just 30. So when we split our data into a training set and a testing set, we hold out one third for testing, i.e., 20 rows go into the training set and the remaining 10 make up the testing set.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 1/3)
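One thing to be aware of: `train_test_split` shuffles the rows randomly, so you get a different split every run. Passing `random_state` makes the split reproducible. A quick sketch with 30 hypothetical rows, mirroring the sample above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(30).reshape(-1, 1)  # 30 hypothetical rows of one feature
y = np.arange(30)

# random_state pins the shuffle, so reruns give the same split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/3, random_state=0
)

print(len(x_train), len(x_test))  # 20 10
```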

Now, we will import the LinearRegression class and create an object of it; that object is our linear regression model.

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

Then we will use the fit method to “fit” the model to our dataset. What this does is nothing but make the regressor “study” our data and “learn” from it.

lr.fit(x_train, y_train)
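What "learning" means here is simply that the model finds the slope and intercept of the best-fit line, which you can inspect afterwards via the fitted model's `coef_` and `intercept_` attributes. A tiny sketch with made-up data lying exactly on the line y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on y = 2x + 1
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

lr = LinearRegression()
lr.fit(x_train, y_train)

print(lr.coef_)       # slope(s) of the fitted line, ~[2.]
print(lr.intercept_)  # intercept of the fitted line, ~1.0
```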

Now that we have created our model and trained it, it is time we test the model with our testing dataset.

y_pred = lr.predict(x_test)
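Comparing `y_pred` against `y_test` by eye works for 10 rows, but scikit-learn also provides metrics such as R² and mean absolute error to quantify how close the predictions are. A minimal sketch, again with hypothetical data (roughly y = 2x + 1 plus noise):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical noisy data roughly on y = 2x + 1
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([2.9, 5.1, 7.0, 9.2, 10.8, 13.1])

lr = LinearRegression().fit(x[:4], y[:4])  # train on the first 4 rows
y_pred = lr.predict(x[4:])                 # predict the last 2 rows

r2 = r2_score(y[4:], y_pred)               # 1.0 would be a perfect fit
mae = mean_absolute_error(y[4:], y_pred)   # average error in y's units

print(f"R^2: {r2:.3f}, MAE: {mae:.3f}")
```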

And voila! You have successfully created a robust, working linear regression model. Pat yourself on the back and revel in your success!

Visualization

Wait, wait.

Do not start partying just yet, for we still have to visualize our data and create some charts.

First, we make use of a scatter plot to plot the actual observations, with x_train on the x-axis and y_train on the y-axis.

For the regression line, we will use x_train on the x-axis and then the predictions of the x_train observations on the y-axis.

We add a touch of aesthetics by coloring the original observations in red and the regression line in green.

plt.scatter(x_train, y_train, color = "red")
plt.plot(x_train, lr.predict(x_train), color = "green")
plt.title("Salary vs Experience (Training set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()

Not so difficult, eh?

We repeat the same task for our testing dataset, and we get the following code-

plt.scatter(x_test, y_test, color = "red")
plt.plot(x_train, lr.predict(x_train), color = "green")
plt.title("Salary vs Experience (Testing set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()

An important point to note is that we are NOT going to change ‘x_train’ to ‘x_test’ in the second line of the code snippet. You see, our regressor model was trained on the training set, and we obtained a unique line equation from it, which we use here too.

Hurray! Believe it or not, you built a regressor, trained it, made a prediction using test values and created a pretty cool visualization of the results!

I hope this helped you learn something new today, and if you got stuck somewhere in between and couldn’t get to the end, do not get disheartened, for failures are the stepping stones to success.

Good luck, fellas!

Source: Artificial Intelligence on Medium
