Support Vector Machine (SVM) is a very popular model. SVM applies a geometric interpretation of the data: it maps the data points in space so as to maximize the distance between the two categories. For SVM, data points are $N$-dimensional vectors, and the method looks for an $(N-1)$-dimensional hyperplane to separate them. Many hyperplanes could satisfy this condition, so the best hyperplane is the one that gives the largest margin, or distance, between the two categories. Thus, it is called the maximum margin hyperplane.

In the plot below, we can see a set of points corresponding to two categories, blue and green. The red line indicates the maximum margin hyperplane that separates both groups of points. The points over the dashed lines are called the support vectors.

SVM: Maximum margin separating hyperplane. The example below plots the maximum margin separating hyperplane within a two-class separable dataset using a Support Vector Machine classifier with linear kernel:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs

# we create 40 separable points
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = svm.SVC(kernel="linear", C=1000).fit(X, y)

# plot the points, the maximum margin hyperplane (red), its margins
# (dashed), and circle the support vectors lying on the margins
plt.scatter(X[:, 0], X[:, 1], c=np.where(y == 1, "green", "blue"), s=30)
w, b = clf.coef_[0], clf.intercept_[0]
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 2)
for offset, style in [(0, "r-"), (-1, "k--"), (1, "k--")]:
    plt.plot(xs, -(w[0] * xs + b + offset) / w[1], style)
plt.scatter(*clf.support_vectors_.T, s=100, facecolors="none", edgecolors="k")
plt.show()
```

It frequently happens that the sets are not linearly separable in the original space. Therefore, the original space is mapped into a higher-dimensional space where the separation can be obtained. SVMs can efficiently perform non-linear classification using the so-called kernel trick: it consists of using specific kernel functions, which simplify the mapping from the original space into a higher-dimensional space.

Naive Bayes (NB) depends on conditional probabilities, which are easy to implement and evaluate; therefore, it does not require an iterative training process. NB supports binary as well as multinomial classification. NB assumes that features are independent of each other, an assumption that does not always hold. Even so, NB gives good results when applied to short texts like tweets, and for some datasets NB may beat other classifiers when combined with feature selection.

SVM is more powerful for addressing non-linear classification tasks. It generalizes well in high-dimensional spaces, like those corresponding to texts, and it is effective even with more dimensions than samples. It works well when the classes are well separated.

On the other hand, SVM is a binary model in its conception, although it can be applied to classifying multiple classes with very good results. The training cost of SVM for large datasets is a handicap: SVM takes a long time to train on large datasets. It also requires hyperparameter tuning, which is not trivial and takes time. Moreover, SVM is not a probabilistic model; one reason is that its objective does not correspond to a normalizable likelihood. For example, in regularized least squares you have the loss function $\sum_i (y_i - \langle w, x_i \rangle - b)^2$ and the regularizer $\|w\|_2^2$, and the weight vector is obtained by minimizing the sum of the two.

Comparing the accuracy of SVM and NB in spam classification showed that the basic NB algorithm gave the best prediction results (97.8%). At the same time, both SVM and NB obtained accuracies well above 90% with parameter tuning when required. Each model involves its own modeling choices (the event model for NB, the kernel function for SVM), and both are sensitive to parameter optimization. SVM is the more attractive model theoretically.
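To make the kernel trick above concrete, here is a minimal sketch (not from the original example; the dataset, sample size, and gamma value are arbitrary illustrative choices) comparing a linear kernel and an RBF kernel on scikit-learn's make_circles data, which is not linearly separable in the original two-dimensional space:

```python
from sklearn import svm
from sklearn.datasets import make_circles

# two classes arranged as concentric circles: not linearly separable in 2-D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# the RBF kernel implicitly maps the points into a higher-dimensional
# space where a separating hyperplane exists; the linear kernel cannot
linear = svm.SVC(kernel="linear").fit(X, y)
rbf = svm.SVC(kernel="rbf", gamma=2).fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # roughly chance level
print("RBF kernel accuracy:   ", rbf.score(X, y))     # close to 1.0
```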
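As a sketch of why NB training is not iterative, the toy example below (the four messages and their labels are invented for illustration) fits scikit-learn's MultinomialNB, which only tallies word counts and class frequencies in a single pass and then evaluates conditional probabilities at prediction time:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# a tiny invented corpus of short messages: 1 = spam, 0 = ham
texts = ["win a free prize now", "free money click now",
         "meeting at noon tomorrow", "lunch tomorrow at noon"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)          # word counts per message
clf = MultinomialNB().fit(X, labels)  # one pass: count-based estimates

# prediction evaluates the learned conditional probabilities P(class | words)
print(clf.predict_proba(vec.transform(["free prize tomorrow"])))
```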
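The regularized least squares objective above can be minimized in closed form. The sketch below (synthetic data; the bias term $b$ is omitted for brevity and the regularization strength is an arbitrary choice) solves the corresponding normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # synthetic inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy targets

# minimizing sum_i (y_i - <w, x_i>)^2 + lam * ||w||_2^2 yields
# w = (X^T X + lam * I)^{-1} X^T y
lam = 0.1
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(w)  # close to true_w
```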
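Finally, the hyperparameter tuning that SVM requires is commonly handled with a cross-validated grid search. The sketch below (illustrative grid values on synthetic blobs, not a recommended recipe) also shows why tuning takes time: the model is refit once per parameter combination per fold:

```python
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV

X, y = make_blobs(n_samples=200, centers=2, random_state=6)

# 4 C values x 3 gamma values x 5 folds = 60 cross-validation fits
grid = GridSearchCV(
    svm.SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```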