SVM Unit 2
What is an SVM?
Support vector machines are a set of supervised learning methods used for
classification, regression, and outlier detection. All of these are common tasks in
machine learning.
You can use them to detect cancerous cells based on millions of images or you can
use them to predict future driving routes with a well-fitted regression model.
There are specific types of SVMs you can use for particular machine learning
problems, like support vector regression (SVR) which is an extension of support
vector classification (SVC).
The main thing to keep in mind here is that these are just math equations tuned to
give you the most accurate answer possible as quickly as possible.
SVMs are different from other classification algorithms because of the way they
choose the decision boundary that maximizes the distance from the nearest data
points of all the classes. The decision boundary created by SVMs is called the
maximum margin classifier or the maximum margin hyperplane.
What makes the linear SVM algorithm better than some of the other algorithms,
like k-nearest neighbors, is that it chooses the best line to classify your data points.
It chooses the line that separates the data and is as far away from the closest
data points as possible.
A 2-D example helps to make sense of all the machine learning jargon. Basically
you have some data points on a grid. You're trying to separate these data points by
the category they should fit in, but you don't want to have any data in the wrong
category. That means you're trying to find the line between the two closest points
that keeps the other data points separated.
So the two closest data points give you the support vectors you'll use to find that
line. That line is called the decision boundary.
Types of SVMs
There are two different types of SVMs, each used for different things:
Simple SVM: Typically used for linear regression and classification problems.
Kernel SVM: Has more flexibility for non-linear data because the kernel implicitly
maps the data into a higher-dimensional space where a separating hyperplane can
be fit.
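In scikit-learn, both types are handled by the same estimator class, with the
kernel argument selecting between them; a minimal sketch (the variable names are
ours):

from sklearn import svm

# a simple (linear) SVM and a kernel SVM differ only in the kernel argument
linear_clf = svm.SVC(kernel='linear')  # straight-line decision boundary
kernel_clf = svm.SVC(kernel='rbf')     # non-linear decision boundary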
Why SVMs are used in machine learning
SVMs are used in applications like handwriting recognition, intrusion detection,
face detection, email classification, gene classification, and web page
classification. This versatility is one of the reasons we use SVMs in machine
learning: they can handle both classification and regression on linear and
non-linear data.
Another reason we use SVMs is because they can find complex relationships
between your data without you needing to do a lot of transformations on your own.
It's a great option when you are working with smaller datasets that have tens to
hundreds of thousands of features. They typically find more accurate results when
compared to other algorithms because of their ability to handle small, complex
datasets.
Here are some of the pros and cons for using SVMs.
Pros
Effective on datasets with multiple features, like financial or medical data.
Effective in cases where the number of features is greater than the number of
data points.
Uses a subset of training points in the decision function called support vectors,
which makes it memory efficient.
Different kernel functions can be specified for the decision function. You can use
common kernels, but it's also possible to specify custom kernels.
Cons
If the number of features is a lot bigger than the number of data points, avoiding
over-fitting when choosing kernel functions and the regularization term is crucial.
SVMs don't directly provide probability estimates. Those are calculated using an
expensive five-fold cross-validation (see the sketch after this list).
SVMs work best on small sample sets because of their high training time.
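In scikit-learn, those probability estimates come from setting probability=True,
which triggers the internal cross-validation mentioned above; a minimal sketch
with made-up data:

from sklearn import svm

# hypothetical data: ten points in two well-separated classes
demo_X = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 2],
          [4, 4], [4, 5], [5, 4], [5, 5], [5, 6]]
demo_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# probability=True is what pays the cross-validation cost
clf = svm.SVC(kernel='linear', probability=True)
clf.fit(demo_X, demo_y)
print(clf.predict_proba([[2, 2]]))  # class probabilities for a new point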
Since SVMs can use any number of kernels, it's important that you know about a
few of them.
Kernel functions
Linear
These are commonly recommended for text classification because most of these
types of classification problems are linearly separable.
The linear kernel works really well when there are a lot of features, and text
classification problems have a lot of features. Linear kernel functions are faster
than most of the others and you have fewer parameters to optimize.
f(X) = w^T * X + b
In this equation, w is the weight vector that you want to minimize, X is the data
that you're trying to classify, and b is the linear coefficient estimated from the
training data. This equation defines the decision boundary that the SVM returns.
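As a quick check of how that equation classifies a point, here is a tiny sketch
with made-up values for w, b, and the input:

import numpy as np

w = np.array([0.5, -0.25])    # weight vector (made up for illustration)
b = -1.0                      # linear coefficient (made up)
X_new = np.array([4.0, 2.0])  # point to classify (made up)

f = np.dot(w, X_new) + b      # f(X) = w^T * X + b
label = 1 if f >= 0 else 0    # the sign of f(X) picks the side of the boundary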
Polynomial
The polynomial kernel isn't used in practice very often because it isn't as
computationally efficient as other kernels and its predictions aren't as accurate.
The polynomial kernel has the form f(X1, X2) = (alpha * X1^T * X2 + C)^d. In this
function, alpha is a scaling coefficient, C is an offset value to account for some
mis-classification of data that can happen, and d is the degree of the polynomial.
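Written as code, the kernel is a one-liner; this sketch uses illustrative
parameter values:

import numpy as np

def polynomial_kernel(x1, x2, alpha=1.0, C=1.0, d=2):
    # (alpha * x1^T * x2 + C)^d
    return (alpha * np.dot(x1, x2) + C) ** d

# similarity between two made-up points under a degree-2 polynomial kernel
print(polynomial_kernel(np.array([1.0, 2.0]), np.array([0.5, 1.5])))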
Others
There are plenty of other kernels you can use for your project. This might be a
decision to make when you need to meet certain error constraints, you want to try
to speed up the training time, or you want to fine-tune parameters.
Some other kernels include: ANOVA radial basis, hyperbolic tangent, and Laplace
RBF.
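scikit-learn's SVC also accepts a Python callable as the kernel, so you can plug
one of these in yourself. A sketch of a Laplace RBF kernel with made-up data:

import numpy as np
from sklearn import svm

def laplace_rbf(A, B, gamma=1.0):
    # exp(-gamma * ||a - b||_1) for every pair of rows in A and B
    dists = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    return np.exp(-gamma * dists)

# a callable kernel must return the kernel matrix between two sets of samples
clf = svm.SVC(kernel=laplace_rbf)
clf.fit(np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]]), [0, 0, 1, 1])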
Now that you know a bit about how the kernels work under the hood, let's go
through a couple of examples.
# imports used throughout the examples
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# linear data
X = np.array([1, 5, 1.5, 8, 1, 9, 7, 8.7, 2.3, 5.5, 7.7, 6.1])
y = np.array([2, 8, 1.8, 8, 0.6, 11, 10, 9.4, 4, 3, 8.8, 7.5])
We're working with NumPy arrays because they make the matrix operations faster
and use less memory than Python lists. You can also take advantage of typing the
contents of the arrays. Now let's take a look at what the data look like in a plot:
# show unclassified data
plt.scatter(X, y)
plt.show()
Once you see what the data look like, you can take a better guess at which
algorithm will work best for you. Keep in mind that this is a really simple dataset,
so most of the time you'll need to do some work on your data to get it to a usable
state.
We'll do a bit of pre-processing on this already structured data. This will put the
raw data into a format that we can use to train the SVM model.
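That pre-processing and training code isn't shown here, so the following is a
minimal sketch of what it could look like; the labels in training_y are made up
for illustration:

# stack the two coordinate arrays into (x, y) pairs for training
training_X = np.vstack((X, y)).T

# hypothetical class labels, one per data point
training_y = [0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1]

# define and train a linear SVM classifier
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(training_X, training_y)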
With your model trained, you can make predictions on how a new data point will
be classified and you can make a plot of the decision boundary. Let's plot the
decision boundary.
# get the weight values for the linear equation from the trained SVM model
w = clf.coef_[0]
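Only that first line of the plotting code survives here; a sketch of the rest,
using the hypothetical training arrays from the sketch above:

# slope of the boundary line, from w[0]*x + w[1]*y + b = 0
a = -w[0] / w[1]
XX = np.linspace(0, 13)
yy_line = a * XX - clf.intercept_[0] / w[1]

# draw the boundary over the labeled training points
plt.plot(XX, yy_line, 'k-')
plt.scatter(training_X[:, 0], training_X[:, 1], c=training_y)
plt.show()

That covers the linear example. The second example uses data that a straight line
can't separate: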
# non-linear data
circle_X, circle_y = datasets.make_circles(n_samples=300, noise=0.05)
The next step is to take a look at what this raw data looks like with a plot.
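The plotting and model-training code is missing here as well; a sketch, where
nonlinear_clf is an illustrative name:

# show the raw non-linear data
plt.scatter(circle_X[:, 0], circle_X[:, 1], c=circle_y, marker='.')
plt.show()

# an RBF kernel can separate the concentric circles a straight line cannot
nonlinear_clf = svm.SVC(kernel='rbf', C=1.0)
nonlinear_clf.fit(circle_X, circle_y)

With the model trained, we can plot its decision boundary. The fragment below
starts by plotting the data again so that the axis limits reflect it:

# plot the data and grab the current axes
plt.scatter(circle_X[:, 0], circle_X[:, 1], c=circle_y, marker='.')
ax = plt.gca()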
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# build a grid of points covering the plot area
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
XX, YY = np.meshgrid(xx, yy)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
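Evaluating the trained model over that grid gives us the boundary; a sketch:

# evaluate the decision function at every grid point
Z = nonlinear_clf.decision_function(xy).reshape(XX.shape)

# the zero level set of the decision function is the decision boundary
ax.contour(XX, YY, Z, colors='k', levels=[0], alpha=0.5, linestyles=['-'])
plt.show()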
When you have your data and you know the problem you're trying to solve, it
really can be this simple.
You can change your training model completely, you can choose different
algorithms and features to work with, and you can fine-tune your results based on
multiple parameters. There are libraries and packages for all of this now so there's
not a lot of math you have to deal with.
There are a few things you should watch out for with SVMs in particular:
Make sure that your data are in numeric form instead of categorical form. SVMs
expect numbers instead of other kinds of labels.
Avoid copying data as much as possible. Some Python libraries will make
duplicates of your data if they aren't in a specific format. Copying data will also
slow down your training time and skew the way your model assigns the weights to
a specific feature.
Watch your kernel cache size because it uses your RAM. If you have a really large
dataset, this could cause problems for your system.
Scale your data because SVM algorithms aren't scale invariant. That means you
should convert all of your features to the range [0, 1] or [-1, 1], as in the
sketch below.
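A minimal sketch of scaling with scikit-learn's MinMaxScaler (the data here are
made up):

from sklearn.preprocessing import MinMaxScaler

# rescale every feature into [0, 1] before training
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform([[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]])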