Support Vector Machines
There are many types of classification models in machine learning, including Logistic Regression, K-Nearest Neighbors, Random Forests, and Support Vector Machines. In my data science journey, I have learned a lot about the first three and wanted to take this opportunity to educate myself on the last one, Support Vector Machines.
How do they work?
The simple idea behind them is that they work to find an ideal line that best divides the data by class, so that the line becomes the threshold for class predictions. To find this line, the algorithm looks for the boundary that maximizes the distance to the nearest points of each class. Those nearest points, the ones sitting right at the edge of their own group, are known as support vectors. Lines are drawn through the support vectors on each side, and another line is drawn down the exact center between them. This center line is the optimal division, formally known as the maximum margin classifier, or the maximum margin hyperplane depending on dimensionality.
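To make this concrete, here is a minimal sketch of my own (toy data, not from any of the referenced tutorials) that fits a linear SVM with scikit-learn and prints the support vectors it found:
#a minimal sketch: fitting a linear SVM on toy data and inspecting its support vectors
import numpy as np
from sklearn.svm import SVC
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]]) #two small clusters
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel='linear')
clf.fit(X, y)
print(clf.support_vectors_) #the boundary points the margin is built from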
What if a line does not best divide the data?
The beauty of Support Vector Machines is their flexibility. They can easily handle linear separations, but they can also be adapted for nonlinear ones if need be. The way this works is that by adding another dimension (or dimensions), a boundary that is impossible to draw in the original space becomes clear in the new one. This is known as the kernel trick. The diagrams in the SVM tutorial linked in the references illustrate this nicely.
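As a rough code illustration of the same idea (a toy example of my own, not from the tutorial), scikit-learn's make_circles generates one class inside a ring of another, which no straight line can divide, yet an RBF kernel separates it almost perfectly:
#toy demonstration of the kernel trick: linear vs rbf on circular data
from sklearn.datasets import make_circles
from sklearn.svm import SVC
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=1)
for kern in ['linear', 'rbf']:
    clf = SVC(kernel=kern).fit(X, y)
    print(kern, clf.score(X, y)) #linear lands near 0.5, rbf near 1.0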
There are many kernels for specific nonlinear types, including polynomial, Gaussian, and ANOVA radial basis kernels, to name a few. Each has its own use cases. For example, polynomial kernels are great for image processing, Gaussian kernels can be useful when we do not know our dataset very well, and ANOVA radial basis kernels can be used for regression as well as classification. Down below I will show you how to implement an SVM with scikit-learn. It is important to note that you can specify the kernel you want as a parameter.
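For reference, choosing a kernel in scikit-learn is just a matter of the kernel argument on SVC (these are standard SVC parameters; the degree and gamma values below are placeholders, not tuned choices):
from sklearn.svm import SVC
poly_svc = SVC(kernel='poly', degree=3) #polynomial kernel of degree 3
rbf_svc = SVC(kernel='rbf', gamma='scale') #gaussian (rbf) kernel, the default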
What are the advantages and disadvantages of SVMs?
SVMs are highly effective in high-dimensional spaces, they are memory efficient, and they are versatile thanks to kernels that adapt them to data of different shapes. On the other hand, even though there is a threshold for class decisions, they do not assign probabilities directly, and obtaining them is computationally expensive. Additionally, they do not perform as well when the target classes overlap in the feature space. Tutorial diagrams tend to show idealized, cleanly separated classes, but data in the wild does not always look like that.
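If you do need probabilities, SVC offers a probability=True option that fits an extra calibration step with internal cross-validation, which is exactly the added cost mentioned above. A minimal sketch, using the same iris data that appears later in this post:
#enabling probability estimates (slower to train because of the internal cross-validation)
from sklearn.datasets import load_iris
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
prob_svc = SVC(probability=True).fit(X, y)
print(prob_svc.predict_proba(X[:2])) #one row of class probabilities per sample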
How do I do this in Python?
#importing data
import seaborn as sns
iris = sns.load_dataset('iris')

#importing train_test_split
from sklearn.model_selection import train_test_split

#separating the target and independent variables for the split
X = iris.drop('species', axis=1)
y = iris['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2, test_size=0.2)

#importing standard scaler and scaling the data
#THIS STEP IS CRUCIAL!
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train) #fit the scaler on the training set only to avoid leakage
X_train_sc = scaler.transform(X_train)
X_test_sc = scaler.transform(X_test)

#importing support vector machine, creating an instance, and fitting
from sklearn.svm import SVC
svc_model = SVC() #the default kernel is rbf
svc_model.fit(X_train_sc, y_train) #fit on the scaled training data

#getting the predictions on the test set
predictions = svc_model.predict(X_test_sc)
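From here, a quick way to check how the model did is with scikit-learn's built-in metrics (a standard follow-up step, not part of the original snippet), continuing from the variables above:
#evaluating the predictions against the true test labels
from sklearn.metrics import accuracy_score, classification_report
print(accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions))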
We can create an SVM with only a few lines of code. It is important to note that because the mathematics behind SVMs involves distance calculations, it is of utmost importance to scale your data! And remember that you should always fit the scaler after you split, on the training data only, so no information leaks from the test set. SVMs are just another tool to have under our belt among classification models. It is also highly encouraged to use GridSearchCV to test out variants of an SVM and let the machine do the heavy lifting of identifying the model's ideal parameters and optimizing your metric. It is really that simple.
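As a sketch of that tuning step (GridSearchCV is the actual scikit-learn class, but the parameter grid below is just a reasonable starting point of my own, not a prescription), continuing from the scaled data above:
#cross-validated search over kernels and regularization strength
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
param_grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.1, 1], 'kernel': ['rbf', 'poly']}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train_sc, y_train)
print(grid.best_params_) #the combination that scored best in cross-validation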
CONCLUSION:
SVMs are another machine learning technique for classification problems. They use support vectors to find the boundary that maximizes the margin between classes. An SVM is a flexible technique that works with both linear and nonlinear data through the kernel trick, and it is fairly simple to execute the whole process with a small block of code.
References:
Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
SVM: https://www.kaggle.com/prashant111/svm-classifier-tutorial
*Excellent diagrams used from here
Kernels: https://data-flair.training/blogs/svm-kernel-functions/