Support Vector Machine (SVM) — An Optimisation Mammoth
Table of Contents:
1. Motivation
2. Solution
2.a. Mathematical Modelling of 1st Condition
2.b. Mathematical Modelling of 2nd Condition
2.c. Combining the Optimization Problems
2.d. The Optimization Trick — which no book explains
2.e. Primal2Dual Optimization Conversion
2.f. Solving Dual Optimization Problem — SMO Algorithm
3. KeyNote
4. Code Implementation
Motivation
All the lines in the diagram above make 0 misclassifications, i.e. the fewest possible, so every one of them is a "best" line. But how should we choose one out of all these lines?
In Linear Classification we unknowingly picked any one of them at random, but now we will devise a way to choose the best among all these 0-misclassification lines !!!
Solution
Mathematical Modelling of the 1st Condition
First we need to mathematically represent the condition that the line makes the fewest misclassifications…. this can be done via the following optimisation problem.
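In standard notation, assuming labels y_i ∈ {-1, +1} and a candidate line w^T x + b = 0, the 0-misclassification condition can be written as:

```latex
\text{find } w,\, b \quad \text{such that} \quad y_i \,(w^\top x_i + b) > 0 \quad \forall\, i = 1, \dots, m
```

A point x_i is classified correctly exactly when the sign of w^T x_i + b matches its label y_i, so requiring the product to be positive encodes a correct classification for every point.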
Mathematical Modelling of the 2nd Condition
As visualized above, the solution is to find the middle line, i.e. the one that sits farthest from both classes, among all possible 0-misclassification lines. We will model this condition as an optimisation problem too.
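In the standard derivation, this "middle line" condition amounts to maximizing the margin, i.e. the distance from the line to its closest datapoint:

```latex
\max_{w,\, b} \;\; \min_{i} \;\; \frac{\lvert w^\top x_i + b \rvert}{\lVert w \rVert}
```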
Combining the Two Optimization Problems
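Since a correctly classified point satisfies y_i (w^T x_i + b) = |w^T x_i + b|, the two conditions combine into a single problem, which in standard form reads:

```latex
\max_{w,\, b} \;\; \min_{i} \;\; \frac{y_i \,(w^\top x_i + b)}{\lVert w \rVert}
\quad \text{subject to} \quad y_i \,(w^\top x_i + b) > 0 \;\; \forall\, i
```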
Hence we need to find a way to solve this mammoth deadly-looking optimization equation !!!
THE OPTIMIZATION TRICK — which no book explains
Researchers concluded that the above optimization problem is not directly solvable, and hence some tricks are needed. Using these tricks they showed that the above optimization equation is equivalent to the following one.
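That equivalent problem (equation 2) is the familiar hard-margin formulation:

```latex
\min_{w,\, b} \;\; \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad y_i \,(w^\top x_i + b) \geq 1, \;\; i = 1, \dots, m
```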
Now we will see the proof of this statement, i.e. how optimization equation 1 is converted into optimization equation 2, something rarely spelled out in any book… !!
We already know a fact from class 10 mathematics: multiplying a line's equation by a non-zero scalar does not change the line. For instance,
2x + 4y + 5 = 0
5 * (2x + 4y + 5) = 0
17 * (2x + 4y + 5) = 0
All these equations describe the same line ….. !!! Hence, in the above representation, let's multiply the line equation by 1 / 84 and see what happens.
Researchers used this simple but powerful fact to change the above "mathematical formulation of the 2nd condition" as follows !!!!
Hence the optimization problem statement changes from the max-min form to the standard hard-margin form.
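A sketch of the step: because (w, b) can be rescaled freely, fix the scale so that the closest point satisfies y_i (w^T x_i + b) = 1. The inner minimum then equals 1/||w||, and the problem becomes:

```latex
\max_{w,\, b} \, \min_{i} \frac{y_i \,(w^\top x_i + b)}{\lVert w \rVert}
\;\;\Longrightarrow\;\;
\max_{w,\, b} \frac{1}{\lVert w \rVert}
\;\;\Longleftrightarrow\;\;
\min_{w,\, b} \frac{1}{2} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i \,(w^\top x_i + b) \geq 1 \;\; \forall\, i
```

Squaring ||w|| and adding the factor 1/2 do not change the minimizer; they only make the objective differentiable with a clean gradient.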
Solving the Optimization Equation by converting it from PRIMAL to its DUAL form
Above we have established an equivalence between the two optimization problems: the max-min problem (1) and the hard-margin problem (2).
Now let’s see how to solve the optimization problem 2 !!
Now this is a constrained non-linear optimization problem, and more specifically a convex quadratic programming problem (QPP), because all constraints are linear and the objective function is a convex quadratic. There are several methods to solve this kind of optimization problem, but they are computationally inefficient, hence we will derive the DUAL of this optimization problem and try to solve the dual instead.
Please refer to my optimization theory posts to understand how to write the dual of a constrained non-linear primal optimization problem. The dual of the above primal problem goes as follows ….
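In the usual notation, introducing one Lagrange multiplier λ_i ≥ 0 per constraint and eliminating w and b via the stationarity conditions yields:

```latex
\max_{\lambda} \;\; \sum_{i=1}^{m} \lambda_i \;-\; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j \, y_i y_j \, x_i^\top x_j
\quad \text{subject to} \quad \sum_{i=1}^{m} \lambda_i y_i = 0, \quad \lambda_i \geq 0 \;\; \forall\, i
```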
Solving Dual Optimization Problem — SMO Algorithm
Now the above dual optimization problem is also a constrained non-linear optimization problem and more specifically it is a convex quadratic programming problem (QPP).
Here a question should arise in your mind: the primal was a convex QPP and so is the dual, so what was the benefit of converting the primal to the dual? The dual has a single resource constraint (plus simple non-negativity bounds on the λ's) while the primal had m resource constraints, thus making the primal problem computationally more difficult to solve compared to the dual !!
There are many ways to solve this kind of convex QPP dual problem; one of the most famous is the Sequential Minimal Optimization (SMO) algorithm. Using this algorithm you will find the optimal values of the lambdas….
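Once the optimal lambdas are known, the separating line is recovered from the standard KKT relations; only the points with λ_i > 0 (the support vectors) contribute:

```latex
w^{*} = \sum_{i=1}^{m} \lambda_i^{*} \, y_i \, x_i,
\qquad
b^{*} = y_k - (w^{*})^\top x_k \;\; \text{for any support vector } x_k \text{ with } \lambda_k^{*} > 0
```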
Advantages over other Classification Algorithms
- Works very well in cases where we have a large number of features (columns), because the time complexity of the algorithm does not depend on the number of features in your dataset
- Works well if we have a small number of datapoints (rows), because we ultimately care only about the support vectors, which are very few. Say you have 1000 data points; perhaps only 10 of the 1000 turn out to be support vectors.
- Works very well if you have outliers in your dataset, because such outliers will have lambda = 0, i.e. they would not be support vectors and hence would not affect the decision boundary.
Disadvantages over other Classification Algorithms
- SVM doesn’t provide a posterior probability, which is required for many applications
- Inefficient on large datasets
Code Implementation
I am still working on the custom implementation; in the meantime, a simple library-based implementation is sketched below.
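A minimal sketch using scikit-learn's SVC with a linear kernel; the toy data and the large-C choice (to approximate the hard margin derived above) are illustrative assumptions, not the author's own code:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two Gaussian blobs (illustrative assumption)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=+2.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# A very large C approximates the hard-margin SVM derived above
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("w =", clf.coef_[0])                      # weight vector of the separating line
print("b =", clf.intercept_[0])                 # bias term
print("# support vectors:", len(clf.support_))  # points with lambda > 0
print("train accuracy:", clf.score(X, y))
```

Under the hood, SVC solves exactly the dual above (via libsvm, which uses an SMO-style solver), and clf.dual_coef_ holds the products λ_i y_i for the support vectors.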