Blog: All you need to know about Support Vector Machines
Support Vector Machine (SVM) is a simple classification algorithm every machine learning practitioner should have in their toolbox. Let’s first understand how it works, then look at its pros and cons.
As said earlier, it is a supervised learning algorithm used for classification, i.e., when the target is categorical. SVM takes labeled data points (features) as input and returns the hyperplane that separates those data points into the expected categories (classes).
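To make this concrete, here is a minimal sketch using scikit-learn’s `SVC` on a tiny hand-made dataset (the data points and class labels are invented for illustration):

```python
# Minimal sketch: fitting an SVM classifier with scikit-learn (assumed installed).
from sklearn.svm import SVC

# Toy labeled data: two features per point, two classes.
X = [[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.9, 1.2]]
y = [0, 0, 1, 1]

clf = SVC(kernel="linear")  # a linear hyperplane for linearly separable data
clf.fit(X, y)               # learn the separating hyperplane from labeled points

print(clf.predict([[0.1, 0.1], [1.1, 0.9]]))  # → [0 1]
```

The fitted model then assigns each new point to a class depending on which side of the learned hyperplane it falls.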
Understanding hyperplanes is essential to understanding SVM. Simply put, a hyperplane is a decision boundary that helps classify the data points: points that fall on different sides of the hyperplane are treated as separate classes.
As the number of features increases, the dimensionality of the hyperplane increases too: with N features, the resulting hyperplane has N − 1 dimensions (a line in 2-D, a plane in 3-D, and so on).
When SVM finds a hyperplane, it tries to maximize the margin, that is, the distance from the hyperplane to the nearest data points (also known as support vectors). The orientation and position of the hyperplane depend heavily on these nearest points.
You might ask, “Why maximize the margin?” The maximized margin gives the model some reinforcement, helping it classify future data points with more confidence.
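You can inspect the support vectors and the margin directly. Here is a sketch on a deliberately symmetric toy dataset (the points are chosen so the margin is easy to verify; `C=1e6` approximates a hard margin):

```python
# Sketch: inspecting support vectors and margin width with scikit-learn.
import numpy as np
from sklearn.svm import SVC

# Two classes placed symmetrically, 2 units apart along the x-axis.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = [0, 0, 1, 1]

clf = SVC(kernel="linear", C=1e6)  # very large C ≈ hard margin
clf.fit(X, y)

# For a linear SVM the hyperplane is w·x + b = 0 and the margin width is 2/||w||.
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)
print(clf.support_vectors_)  # the nearest points that pin down the hyperplane
print(round(margin, 2))      # → 2.0 for this symmetric toy data
```

Moving any non-support point around (without crossing the margin) would not change the hyperplane at all, which is exactly why the nearest points dominate.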
How does SVM work when there are outliers in the dataset?
As we saw earlier, the position and orientation of the hyperplane are heavily influenced by the closest data points. So when outliers exist in the dataset, the algorithm looks for the hyperplane that reasonably separates the classes, tolerating a few misclassified or in-margin points (a “soft margin”) rather than bending to fit every outlier exactly.
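The `C` parameter (covered in more detail below) controls this tolerance. A small sketch with an invented dataset containing one outlier, assuming scikit-learn:

```python
# Sketch of the soft margin: how C changes tolerance to an outlier.
from sklearn.svm import SVC

# Two clean clusters plus one class-0 "outlier" sitting near the class-1 cluster.
X = [[0.0, 0.0], [0.0, 1.0], [0.5, 0.5],   # class 0 cluster
     [3.0, 0.0], [3.0, 1.0], [3.5, 0.5],   # class 1 cluster
     [2.5, 0.5]]                           # class 0 outlier
y = [0, 0, 0, 1, 1, 1, 0]

# Low C: soft margin that treats the outlier as noise and keeps a wide margin.
soft = SVC(kernel="linear", C=0.1).fit(X, y)
# High C: the margin shrinks so the hyperplane also fits the outlier.
hard = SVC(kernel="linear", C=1000.0).fit(X, y)

print(soft.predict([[2.5, 0.5]]))  # → [1]  (outlier left on the "wrong" side)
print(hard.predict([[2.5, 0.5]]))  # → [0]  (outlier fitted exactly)
```

The low-`C` model usually generalizes better here, since the outlier is probably noise rather than a pattern worth fitting.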
SVM on Non-linear Data Points
So far we have seen examples with linearly separable data points. Now picture data where no straight line can separate the classes, for example, one class forming a ring around the other. Can you imagine what the hyperplane would look like? Do you think it is still possible to classify these data points using SVM?
Yes, it is possible with the help of a kernel function. Simply put, kernels are mathematical functions passed to the SVM as a parameter. A kernel takes the input data points and converts them into the form the SVM needs to find a hyperplane.
The kernel function maps the non-linear data into a higher-dimensional space where it becomes linearly separable, and the SVM finds the hyperplane there. Using the same mapping, the decision boundary (hyperplane) is then projected back onto the original non-linear data.
Let’s see how it works on an example.
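A sketch of this idea, using scikit-learn’s `make_circles` (two concentric rings) as a stand-in for a non-linear dataset:

```python
# Sketch: a non-linear dataset (concentric rings) classified with the RBF kernel.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# 200 points: an inner ring (one class) surrounded by an outer ring (the other).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # a flat hyperplane cannot separate rings
rbf = SVC(kernel="rbf").fit(X, y)        # the kernel maps rings to a separable space

print(round(linear.score(X, y), 2))  # roughly chance level
print(round(rbf.score(X, y), 2))     # near-perfect on this easy example
```

The RBF model’s decision boundary, drawn back in the original 2-D space, would look like a circle between the two rings, even though internally it is still a flat hyperplane in the kernel-induced space.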
Parameters of SVM
If you google “sklearn SVM” you will find the documentation for scikit-learn’s SVM models, with details of every parameter they accept. Let me introduce the three main parameters to consider.
- Kernel: As we saw before, the kernel function is chosen according to whether our data points are linearly separable. Refer to the sklearn documentation for the different kernel functions available and their uses.
- C (Regularization): Controls the trade-off between a smooth decision boundary and classifying the training points correctly. When C is high, the model tries to classify every data point correctly, which also raises the chance of overfitting; a low C tolerates some misclassifications in exchange for a smoother boundary.
- Gamma: Defines how much influence each training sample has on the decision boundary (used by the RBF and other non-linear kernels). When gamma is high, only nearby points influence the boundary; when gamma is low, far-away points are also taken into account.
Play around with these hyperparameters and see the different results you get when trying out SVM classification.
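One systematic way to do that is a grid search. This sketch uses scikit-learn’s `GridSearchCV` with `make_moons` as an example dataset (the parameter grid values are arbitrary choices for illustration):

```python
# Sketch: tuning C and gamma for an RBF SVM with cross-validated grid search.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A small non-linear dataset: two interleaving half-moons with some noise.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)  # fits one model per (C, gamma) pair per CV fold

print(search.best_params_)           # the C/gamma pair with the best CV accuracy
print(round(search.best_score_, 2))  # cross-validated accuracy of that pair
```

Cross-validation guards against picking a high-gamma, high-C combination that merely memorizes the training set.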
Let’s see the advantages
- The regularization parameter helps to avoid overfitting.
- The kernel trick makes it possible to classify both linear and non-linear data.
- SVM solves a convex optimization problem, which ensures the solution found is a global optimum.
- SVM variants (such as transductive SVM) also support semi-supervised learning.
Let’s see the disadvantages
- In general, SVM is slow to train and predict on large datasets, since training time grows quickly with the number of samples.
This is the main disadvantage I found; if you know any others, please respond (comment) to this story.
Hope you now have a clear picture of the SVM algorithm. Applaud and share with your friends!!!