ML Unit-2
KNN algorithm :
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
The K-NN working can be explained on the basis of the below algorithm:
o Step 1: Select the number K of neighbors.
o Step 2: Calculate the Euclidean distance between the new data point and each training data point.
o Step 3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step 4: Among these K neighbors, count the number of data points in each category.
o Step 5: Assign the new data point to the category for which the count of neighbors is maximum.
Below are some points to remember while selecting the value of K in the K-NN algorithm:
o There is no particular way to determine the best value for "K", so we need to try some values to find the best out of them. The most preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
o Large values for K smooth out noise, but too large a value may include points from other categories and blur the class boundaries.
Advantages of the KNN algorithm:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of the KNN algorithm:
o The value of K always needs to be determined, which may be complex at times.
o The computation cost is high because the distance between the new data point and all the training samples must be calculated.
Example :
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
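The snippet above stops after feature scaling and never actually fits the classifier. A minimal continuation (a sketch, assuming the same x_train, y_train, x_test and y_test produced above) that fits and evaluates a K-NN model with K=5 could look like this:

# fitting the K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # p=2 gives Euclidean distance
classifier.fit(x_train, y_train)

# predicting the test set results and measuring accuracy
from sklearn.metrics import accuracy_score
y_pred = classifier.predict(x_test)
print("K-NN accuracy:", accuracy_score(y_test, y_pred))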
Regression Analysis :
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes corresponding to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A, which does various advertisements every year and gets sales from them. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales:
Now the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction of sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the cause-and-effect relationship between variables.
In regression, we plot a graph between the variables that best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum." The distance between the data points and the line tells whether the model has captured a strong relationship or not.
As mentioned above, regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need future predictions, such as weather conditions, sales prediction, marketing trends, etc. For such cases we need a technique that can make predictions accurately, and regression analysis is such a statistical method, used in machine learning and data science. Below are some other reasons for using regression analysis:
o Regression estimates the relationship between the target and the independent variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can confidently determine the most important factor,
the least important factor, and how each factor is affecting the other factors.
Types of Regression
There are various types of regressions which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis)
and the dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
o The relationship between the variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of the years of experience.
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
# fitting a simple linear regression line with SciPy's linregress
from scipy import stats
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# slope and intercept define the fitted line y = slope*x + intercept
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a
binary or discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or
No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
o Logistic regression uses the sigmoid function (logistic function) to map the predicted values to probabilities. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
o f(x) = output between the 0 and 1 value.
o x = input to the function.
o e = base of the natural logarithm.
When we provide the input values (data) to the function, it gives the S-curve as follows:
o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
On the basis of the categories of the target variable, logistic regression can be of three types:
o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
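As an illustration, here is a minimal sketch of fitting a logistic regression classifier with scikit-learn; the hours-studied versus pass/fail numbers are made-up example data, not taken from the text:

from sklearn.linear_model import LogisticRegression
import numpy as np

# made-up example: hours studied (feature) vs. pass/fail outcome (0 or 1)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predicted probabilities follow the S-shaped sigmoid curve described above
print(model.predict_proba([[2.2]]))  # probability of class 0 and class 1
print(model.predict([[2.2]]))        # class label after applying the threshold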
Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-linear dataset using
a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the value
of x and corresponding conditional values of y.
o Suppose there is a dataset which consists of datapoints which are present in a non-linear
fashion, so for such case, linear regression will not best fit to those datapoints. To cover
such datapoints, we need Polynomial regression.
o In Polynomial regression, the original features are transformed into polynomial features
of given degree and then modeled using a linear model. Which means the datapoints are
best fitted using a polynomial line.
o The equation for polynomial regression is also derived from the linear regression equation, that is, the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.
o The model is still considered linear because the coefficients b0, b1, ..., bn enter linearly; only the features are raised to higher powers (quadratic, cubic, and so on).
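A minimal sketch of this idea with scikit-learn, using made-up one-dimensional data (an assumption for illustration): the original feature is expanded into polynomial features of a given degree and then fitted with an ordinary linear model.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# made-up non-linear data: y roughly follows a quadratic curve
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1, 4, 9, 16, 25, 36])

# transform the original feature into polynomial features of degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# the model is still linear in the coefficients b0, b1, b2
model = LinearRegression()
model.fit(X_poly, y)
print(model.predict(poly.transform([[7]])))  # expected to be close to 49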
Support Vector Regression:
Support Vector Machine is a supervised learning algorithm which can be used for regression as well as classification problems. When we use it for regression problems, it is termed Support Vector Regression.
Support Vector Regression is a regression algorithm which works for continuous variables. The key terms used in Support Vector Regression are the kernel, the hyperplane, the boundary lines, and the support vectors; the hyperplane and the boundary lines are explained below.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of data points is covered within that margin. The main goal of SVR is to include as many data points as possible within the boundary lines, and the hyperplane (best-fit line) must contain a maximum number of data points. Consider the below image:
Here, the blue line is called hyperplane, and the other two lines are known as boundary lines.
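A minimal sketch of Support Vector Regression with scikit-learn's SVR, using made-up continuous data (an assumption for illustration); epsilon sets the width of the margin between the boundary lines and the hyperplane:

from sklearn.svm import SVR
import numpy as np

# made-up continuous data
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([3.0, 4.5, 6.1, 7.8, 9.2, 11.1, 12.8, 14.5])

# points that fall inside the epsilon margin do not contribute to the loss
model = SVR(kernel='rbf', C=10.0, epsilon=0.5)
model.fit(X, y)
print(model.predict([[9]]))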
Decision Tree Regression:
o Decision Tree is a supervised learning algorithm which can be used for solving both classification and regression problems.
o It can solve problems for both categorical and numerical data
o Decision Tree regression builds a tree-like structure in which each internal node
represents the "test" for an attribute, each branch represent the result of the test, and each
leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children nodes, and thereby become parent nodes of those nodes. Consider the below image:
The above image shows an example of Decision Tree regression; here, the model is trying to predict a person's choice between a sports car and a luxury car.
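A minimal sketch of Decision Tree regression with scikit-learn, using made-up numeric data (an assumption for illustration):

from sklearn.tree import DecisionTreeRegressor
import numpy as np

# made-up data: predicting a price from a single numeric feature
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([10, 12, 20, 22, 35, 38])

# each internal node tests the feature against a threshold;
# each leaf stores the average target value of the samples that reach it
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)
print(model.predict([[3.5]]))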
Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms, capable of performing regression as well as classification tasks.
o Random Forest regression is an ensemble learning method which combines multiple decision trees and predicts the final output based on the average of each tree's output. The combined decision trees are called base models, and the ensemble can be represented more formally as:
g(x) = f0(x) + f1(x) + f2(x) + ...
where each fi(x) is the prediction of an individual decision tree (base model) and, for regression, the final prediction g(x) is taken as the average of these outputs.
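A minimal sketch of Random Forest regression with scikit-learn, using made-up data (an assumption for illustration):

from sklearn.ensemble import RandomForestRegressor
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([5, 7, 9, 12, 15, 17, 20, 22])

# n_estimators decision trees are trained on random bootstrap subsets;
# the final prediction is the average of the individual tree outputs
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict([[4.5]]))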
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty. We can compute this penalty term by multiplying lambda by the squared weight of each individual feature.
o The equation (cost function) for ridge regression will be:
Cost = Σ(yi - ŷi)^2 + λ Σ(bj)^2
where the first term is the usual sum of squared errors, the bj are the model weights, and λ controls the strength of the penalty.
o A general linear or polynomial regression will fail if there is high collinearity between the
independent variables, so to solve such problems, Ridge regression can be used.
o Ridge regression is a regularization technique, which is used to reduce the complexity of the model. It is also called L2 regularization.
o It helps to solve the problems if we have more parameters than samples.
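A minimal sketch of ridge regression with scikit-learn, using made-up, nearly collinear features (an assumption for illustration); in scikit-learn the lambda penalty is called alpha:

from sklearn.linear_model import Ridge
import numpy as np

# the two features are almost perfectly collinear
X = np.array([[1, 2.0], [2, 4.1], [3, 6.2], [4, 7.9], [5, 10.1]])
y = np.array([3, 6, 9, 12, 15])

# larger alpha (lambda) shrinks the coefficients more strongly
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_, model.intercept_)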
Lasso Regression:
o Lasso regression is another regularization technique used to reduce the complexity of the model. It is similar to ridge regression except that the penalty term contains only the absolute weights instead of the squared weights.
o Because it uses absolute values, it can shrink some coefficients exactly to zero, which also makes it useful for feature selection. It is also called L1 regularization.
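A minimal sketch of lasso regression with scikit-learn, using made-up data in which only the first feature matters (an assumption for illustration):

from sklearn.linear_model import Lasso
import numpy as np

# only the first feature is actually related to the target
X = np.array([[1, 0.1], [2, 0.2], [3, 0.1], [4, 0.3], [5, 0.2]])
y = np.array([2, 4, 6, 8, 10])

# the L1 penalty can drive the coefficient of the irrelevant feature to exactly zero
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)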
Naive Bayes :
# comparing actual response values (y_test) with predicted response values (y_pred);
# y_test and y_pred are assumed to come from a previously fitted Gaussian Naive Bayes model
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
Decision trees :
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

df = pandas.read_csv("data.csv")

# use every column except the target 'Go' as a feature
# (assumed layout of data.csv; the feature columns are assumed to be numeric)
features = [col for col in df.columns if col != 'Go']
X = df[features]
y = df['Go']

dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)

tree.plot_tree(dtree, feature_names=features)
plt.show()
Ensemble methods :
Ensembles are a type of machine learning technique that combines multiple models to create a more powerful, accurate, and robust model. They are used in machine learning because they can produce more accurate and reliable results than a single model. Ensembles can also help reduce the risk of overfitting, which is when a model is too closely tailored to the training data and doesn't generalize well to new data. Ensemble methods also have the advantage of being able to make use of the strengths of different models and to reduce the weaknesses of individual models.
Bagging
When we want to decrease the variance of a decision tree, we employ bagging (Bootstrap
Aggregation). The objective here is to generate different subsets of data from a training
sample selected at random using replacement.
Each subset of data is then used to train its own decision tree. As a consequence, we have an ensemble of many models. Using the average of all the forecasts from the several trees is more resilient than relying on a single decision tree.
Random Forest is a bagging extension. It adds an extra step in which, in addition to utilising a random subset of data, it also uses a random selection of features to create trees rather than using all features. When you have a lot of such random trees, the result is known as a Random Forest.
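A minimal sketch of bagging with scikit-learn's BaggingClassifier, using a synthetic dataset (an assumption for illustration); the default base estimator is a decision tree:

from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification

# synthetic classification data for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# each of the 50 base decision trees is trained on a bootstrap sample drawn with replacement;
# the ensemble prediction aggregates the individual predictions
model = BaggingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print(model.score(X, y))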
Boosting
Boosting is another ensemble strategy for generating a set of predictors. Learners are
taught progressively in this method, with early learners fitting basic models to data and
subsequently examining data for flaws. In other words, we fit subsequent trees (random
sample) with the purpose of solving for net error from the previous tree at each step.
When a hypothesis incorrectly classifies an input, its weight is increased, so that the following hypothesis is more likely to classify it correctly. By combining the entire collection at the end, weak learners are transformed into a better performing model.
Gradient Boosting is an enhancement to the boosting approach.
It employs the gradient descent approach to optimise any differentiable loss function. Individual trees are added sequentially to form an ensemble of trees, and each following tree attempts to recover the loss (the difference between the actual and predicted values) left by the trees before it.
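A minimal sketch of gradient boosting with scikit-learn, using a synthetic regression dataset (an assumption for illustration):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# each new tree is fitted to the residual error left by the trees added before it
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
model.fit(X, y)
print(model.score(X, y))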
Stacking
Stacking is an ensemble strategy in which several different base models are trained on the same data and their predictions are then fed, as inputs, to a final meta-model that learns how best to combine them into the final prediction.
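A minimal sketch of stacking with scikit-learn's StackingClassifier, using a synthetic dataset (an assumption for illustration); a decision tree and an SVM act as base models and a logistic regression acts as the meta-model:

from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# base models produce first-level predictions; the logistic regression meta-model combines them
base_models = [('tree', DecisionTreeClassifier()), ('svm', SVC())]
model = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
model.fit(X, y)
print(model.score(X, y))

Whether built with bagging, boosting, or stacking, ensemble models have several common applications: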
1. Classification: Ensemble models are commonly used for classification tasks, such as
classifying emails as spam or legitimate, identifying customer segments, or predicting the
outcome of a medical diagnosis. Ensemble models can combine multiple decision tree,
neural network, support vector machine, and gradient boosting machine models to create
a more powerful and accurate classification model.
2. Regression: Ensemble models are also useful for regression tasks, such as predicting
housing prices, stock prices, or customer churn rates. By combining multiple decision
tree, neural network, support vector machine, and gradient boosting machine models,
ensemble models can create a more powerful and accurate model to predict values.
3. Anomaly Detection: Ensemble models are also useful for anomaly detection tasks, such
as detecting fraudulent credit card transactions or identifying suspicious activity on social
media. Ensemble models can combine multiple decision tree, neural network, support
vector machine, and gradient boosting machine models to create a more powerful and
accurate anomaly detection model.
Kernel Methods :
The fundamental premise of kernel methods is to convert the input data into a high-dimensional feature space, which makes it simpler to distinguish between classes or generate predictions. Kernel methods employ a kernel function to implicitly map the data into the feature space, as opposed to manually computing the feature space.
The most popular kind of kernel approach is the Support Vector Machine (SVM), a binary
classifier that determines the best hyperplane that most effectively divides the two groups.
In order to efficiently locate the ideal hyperplane, SVMs map the input into a higher-
dimensional space using a kernel function.
Other examples of kernel methods include kernel ridge regression, kernel PCA, and Gaussian
processes. Since they are strong, adaptable, and computationally efficient, kernel
approaches are frequently employed in machine learning. They are resilient to noise and
outliers and can handle sophisticated data structures like strings and graphs.
The most commonly used kernel function in SVMs is the Gaussian or radial basis function
(RBF) kernel. The RBF kernel maps the input data into an infinite-dimensional feature space
using a Gaussian function. This kernel function is popular because it can capture complex
nonlinear relationships in the data.
Other types of kernel functions that can be used in SVMs include the polynomial kernel, the
sigmoid kernel, and the Laplacian kernel. The choice of kernel function depends on the
specific problem and the characteristics of the data.
In short, kernel methods in SVMs are a powerful technique for solving classification and regression problems, and they are widely used in machine learning because they can handle complex data structures and are robust to noise and outliers.
The choice of kernel function depends on the specific problem and the characteristics of the data, and selecting an appropriate kernel function can significantly impact the performance of machine learning algorithms.
Linear Kernel
A linear kernel is a type of kernel function used in machine learning, including in SVMs
(Support Vector Machines). It is the simplest and most commonly used kernel function, and
it defines the dot product between the input vectors in the original feature space.
K(x, y) = x · y
where x and y are the input feature vectors. The dot product of the input vectors is a measure of their similarity or distance in the original feature space.
When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that
separates the different classes in the feature space. This linear boundary can be useful when
the data is already separable by a linear decision boundary or when dealing with high-
dimensional data, where the use of more complex kernel functions may lead to overfitting.
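A minimal sketch of an SVM with a linear kernel, using scikit-learn's SVC and a synthetic dataset (both assumptions for illustration):

from sklearn.svm import SVC
from sklearn.datasets import make_classification

# roughly linearly separable synthetic data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# kernel='linear' uses K(x, y) = x · y, giving a linear decision boundary
model = SVC(kernel='linear')
model.fit(X, y)
print(model.score(X, y))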
Polynomial Kernel
The polynomial kernel is defined as K(x, y) = (x · y + c)^d, where x and y are the input feature vectors, c is a constant term, and d is the degree of the polynomial. The constant term is added to the dot product of the input vectors, and the result is raised to the degree of the polynomial.
The decision boundary of an SVM with a polynomial kernel might capture more intricate
correlations between the input characteristics because it is a nonlinear hyperplane.
The degree of nonlinearity in the decision boundary is determined by the degree of the
polynomial.
The polynomial kernel has the benefit of being able to detect both linear and nonlinear
correlations in the data. It can be difficult to select the proper degree of the polynomial,
though, as a larger degree can result in overfitting while a lower degree cannot adequately
represent the underlying relationships in the data.
In general, the polynomial kernel is an effective tool for converting the input data into a
higher-dimensional feature space in order to capture nonlinear correlations between the
input characteristics.
Gaussian (RBF) Kernel
The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular kernel
function used in machine learning, particularly in SVMs (Support Vector Machines). It is a
nonlinear kernel function that maps the input data into a higher-dimensional feature space
using a Gaussian function.
The RBF kernel is defined as K(x, y) = exp(-gamma * ||x - y||^2), where x and y are the input feature vectors, gamma is a parameter that controls the width of the Gaussian function, and ||x - y||^2 is the squared Euclidean distance between the input vectors.
When using a Gaussian kernel in an SVM, the decision boundary is a nonlinear hyperplane that can capture complex nonlinear relationships between the input features. The width of the Gaussian function, controlled by the gamma parameter, determines the degree of nonlinearity in the decision boundary.
One advantage of the Gaussian kernel is its ability to capture complex relationships in the data without the need for explicit feature engineering. However, the choice of the gamma parameter can be challenging, as a smaller value may result in underfitting, while a larger value may result in overfitting.
Laplace Kernel
The Laplacian kernel, also known as the Laplace kernel or the exponential kernel, is a type
of kernel function used in machine learning, including in SVMs (Support Vector Machines). It
is a non-parametric kernel that can be used to measure the similarity or distance between
two input feature vectors.
The Laplacian kernel is defined as K(x, y) = exp(-gamma * ||x - y||), where x and y are the input feature vectors, gamma is a parameter that controls the width of the Laplacian function, and ||x - y|| is the L1 norm or Manhattan distance between the input vectors.
When using a Laplacian kernel in an SVM, the decision boundary is a nonlinear hyperplane
that can capture complex relationships between the input features. The width of the
Laplacian function, controlled by the gamma parameter, determines the degree of
nonlinearity in the decision boundary.
One advantage of the Laplacian kernel is its robustness to outliers, as it places less weight
on large distances between the input vectors than the Gaussian kernel. However, like the
Gaussian kernel, choosing the correct value of the gamma parameter can be challenging.
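To make the choice of kernel concrete, here is a minimal sketch that compares scikit-learn's built-in linear, polynomial, RBF, and sigmoid kernels on a synthetic two-moons dataset (the dataset and the specific parameter values are assumptions for illustration); the Laplacian kernel is not built into SVC and would require a custom kernel function:

from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# two interleaving half-moons: not separable by a straight line
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# gamma='scale' controls the width of the RBF and sigmoid kernels
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    model = SVC(kernel=kernel, gamma='scale')
    model.fit(X_train, y_train)
    print(kernel, model.score(X_test, y_test))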