Blog: A complete guide to Machine Algorithm:- Linear Regression
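Linear Regression is a supervised learning technique. It is called supervised because the model is trained on data where the value being predicted is already known, and it learns from those known values to make predictions on new data.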
Let’s say there are 5 variables in the data set: 4 of them are ‘x’ variables and 1 is the ‘y’ variable. Now the question is, which variables are to be considered as ‘x’ and which one as ‘y’?
The ‘y’ variable is the one we want to predict (the dependent variable). All the remaining variables are the ‘x’ variables (the independent variables, or predictors).
There are some conditions for both ‘x’ and ‘y’ variables. Linear Regression can be built on a data set only if its variables satisfy these conditions.
Conditions for Linear Regression :-
- ‘y’ should always be numeric.
- ‘y’ should always be continuous.
- ‘x’ can be continuous or categorical.
- ‘x’ has to enter the model as numbers, so a categorical ‘x’ must first be converted into numeric dummy (0/1) variables.
These conditions are required to build the Linear Regression model.
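As an illustration of how a categorical ‘x’ variable can be made numeric, here is a minimal sketch of dummy (0/1) encoding in plain Python. The helper name `one_hot` and the `city` data are made up for this example. Dropping the first level is one common convention, not the only one.

```python
def one_hot(values, drop_first=True):
    """Turn a list of category labels into 0/1 dummy columns."""
    levels = sorted(set(values))
    if drop_first:
        # The dropped level becomes the baseline and is absorbed by the intercept B0.
        levels = levels[1:]
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}


# Hypothetical categorical 'x' variable
city = ["Pune", "Mumbai", "Delhi", "Pune"]
dummies = one_hot(city)
print(dummies)
```

Each dummy column can then be used like any other numeric ‘x’ variable.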
Equation of Linear Regression :-
y = B0 + B1x1 + B2x2 + ... + Bnxn + Error term
Here, ‘y’ is the variable we want to predict, B0 is the intercept, B1, B2, etc. are the coefficients of the ‘x’ variables x1, x2, etc., and the error term is the part of ‘y’ that the ‘x’ variables do not explain. Building a Linear Regression model means estimating these coefficients from the data.
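To make the equation concrete, here is a small sketch that estimates the coefficients by least squares with NumPy. The data are invented for this example and generated without any error term from B0 = 1, B1 = 2, B2 = 3, so the fit should recover those values. `np.linalg.lstsq` is just one of several ways to do the estimation.

```python
import numpy as np

# Made-up 'x' variables and a 'y' built as y = 1 + 2*x1 + 3*x2 (no error term)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# A column of ones lets the model estimate the intercept B0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimate of [B0, B1, B2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

# Predictions from the fitted equation
y_hat = b0 + b1 * x1 + b2 * x2
print(coef)
```

With real data the error term is not zero, so the estimated coefficients will only approximate the underlying relationship.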
But, there are certain assumptions of linear regression that should be satisfied while creating the model.
Assumptions of Linear regression :-
- There should be a linear relationship between the ‘x’ variables and the ‘y’ variable.
- Error terms should be normally distributed and independent of each other, i.e. there should not be a pattern among them.
- There should be minimal multicollinearity among the ‘x’ variables, i.e. the ‘x’ variables should not be strongly correlated with each other.
- Homoscedasticity :- The variance of the error terms around the regression line should be the same for all the predicted values.
Considering all the assumptions we have to build the model.
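One of these assumptions, multicollinearity, is often checked with the Variance Inflation Factor (VIF): each ‘x’ variable is regressed on the other ‘x’ variables, and VIF = 1 / (1 - R-Squared of that regression). The sketch below is a minimal NumPy version with made-up data. The function name `vif` is our own, and the threshold people use to call a VIF "high" is a rule of thumb that varies by source.

```python
import numpy as np


def vif(X):
    """Variance Inflation Factor for each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other 'x' variables
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res
        # (a perfectly collinear column would give ss_res = 0 and fail here)
        factors.append(ss_tot / ss_res)
    return factors


# Made-up data: x2 is almost a copy of x1, x3 is unrelated
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

A VIF close to 1 means the variable is barely explained by the others, and the two near-duplicate columns here produce very large values.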
To build the model in practice, we first have to sample the data, i.e. divide it into two parts, for example by labelling each row as belonging to sample 1 or sample 2. The split can be 80% and 20%, or 90% and 10%.
In this case, let’s say we divide the data into 80% and 20%. We use the 80% (the train data) to build the model; we do not use the whole data set for this. We then make predictions on the remaining 20% (the test data), which we kept in reserve for that purpose only.
Note :- We evaluate predictions on the 20% test data only, not on the 80% train data and not on the full data, because the model has already seen the train data and would look more accurate on it than it really is.
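The split described above can be sketched in a few lines of standard-library Python. The helper name `split_train_test`, the seed, and the stand-in data are assumptions made for this example. Libraries such as scikit-learn ship their own splitting utilities.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Randomly split rows into a train part and a test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = round(len(rows) * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(100))  # stand-in for 100 rows of data
train, test = split_train_test(rows)
print(len(train), len(test))
```

Shuffling before splitting matters when the rows are stored in some order (for example by date), otherwise the test data may not resemble the train data.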
Okay, so after building the model, the big question of accuracy arises.
The question is, how do we check the accuracy?
The solution is to check the summary of the model and study the R2 (R-Squared), which is the proportion of the variation in ‘y’ that the model explains. On the train data, R-Squared ranges from 0 to 1. The higher the value of R-Squared, the better the model fits.
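For reference, R-Squared is computed as 1 - (sum of squared errors) / (total sum of squares around the mean of ‘y’). A minimal sketch with made-up actual and predicted values:

```python
def r_squared(y_true, y_pred):
    """R-Squared = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)               # total variation
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
score = r_squared(y_true, y_pred)
print(score)
```

Here the total variation is 20 and the unexplained variation is 0.5, so R-Squared comes out at 0.975.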
But there is one limitation of R-Squared. If we keep on adding ‘x’ variables without knowing their significance, R-Squared keeps increasing (it never decreases), even when the added variables have no real relation with ‘y’. So R-Squared alone is not a good option for comparing models with different numbers of ‘x’ variables. As an alternative, we study ‘Adjusted R-Squared’.
The main advantage of Adjusted R-Squared is that it penalises every extra ‘x’ variable, so it increases only when an added variable improves the model by more than would be expected by chance. This also helps us judge which ‘x’ variables are worth keeping in the model. Adjusted R-Squared is never greater than R-Squared; its maximum is 1, and it can even fall below 0 for a very poor model.
The higher the value of Adjusted R-Squared, the better the model.
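The usual formula is Adjusted R-Squared = 1 - (1 - R-Squared) * (n - 1) / (n - k - 1), where n is the number of rows and k is the number of ‘x’ variables. The sketch below uses made-up numbers to show how the same R-Squared is scored lower when it took more ‘x’ variables to reach it.

```python
def adjusted_r_squared(r2, n_rows, n_x_vars):
    """Adjusted R-Squared = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_x_vars - 1)


# Hypothetical models with the same R-Squared of 0.90 on 50 rows
adj_small = adjusted_r_squared(0.90, n_rows=50, n_x_vars=4)
adj_large = adjusted_r_squared(0.90, n_rows=50, n_x_vars=10)
print(adj_small, adj_large)
```

The model with 4 ‘x’ variables gets the higher Adjusted R-Squared, because the model with 10 needed more variables to reach the same fit.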
This is how a Linear Regression model is built and studied.
Thanks for Reading!