ML Unit-2
KNN algorithm :
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
The K-NN working can be explained on the basis of the below algorithm:
o Step 1: Select the number K of neighbors.
o Step 2: Calculate the Euclidean distance between the new data point and each training data point.
o Step 3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step 4: Among these K neighbors, count the number of data points in each category.
o Step 5: Assign the new data point to the category for which the count of neighbors is maximum.
Below are some points to remember while selecting the value of K in the K-NN algorithm:
o There is no particular way to determine the best value for "K", so we need to try some values to find the best out of them. The most preferred value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
o Large values for K smooth out noise, but too large a value may include points from other categories and blur the class boundaries.
Advantages of the KNN algorithm:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of the KNN algorithm:
o The value of K always needs to be determined, which may be complex at times.
o The computation cost is high because the distance between the new data point and all the training samples must be calculated.
Example :
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
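The snippet above stops after feature scaling and never actually fits the classifier. A minimal continuation (a sketch, assuming the same x_train, y_train, x_test and y_test produced above) that fits and evaluates a K-NN model with K=5 could look like this:

# fitting the K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # p=2 gives Euclidean distance
classifier.fit(x_train, y_train)

# predicting the test set results and measuring accuracy
from sklearn.metrics import accuracy_score
y_pred = classifier.predict(x_test)
print("K-NN accuracy:", accuracy_score(y_test, y_pred))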
Regression Analysis :
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes corresponding to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A, which does various advertisements every year and gets sales from them. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales:
Now the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction of sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the cause-and-effect relationship between variables.
In regression, we plot a graph between the variables that best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum." The distance between the data points and the line tells whether the model has captured a strong relationship or not.
As mentioned above, regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need future predictions, such as weather conditions, sales prediction, marketing trends, etc. For such cases we need a technique that can make predictions accurately, and regression analysis is such a statistical method, used in machine learning and data science. Below are some other reasons for using regression analysis:
o Regression estimates the relationship between the target and the independent variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can confidently determine the most important factor,
the least important factor, and how each factor is affecting the other factors.
Types of Regression
There are various types of regressions which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis)
and the dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
o The relationship between the variables in the linear regression model can be explained using the below image. Here we are predicting the salary of an employee on the basis of the years of experience.
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
# fitting a simple linear regression line with SciPy's linregress
from scipy import stats
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# slope and intercept define the fitted line y = slope*x + intercept
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a
binary or discrete format such as 0 or 1.
o Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or
No, True or False, Spam or not spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
o Logistic regression uses the sigmoid function (logistic function) to map the predicted values to probabilities. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
o f(x) = output between the 0 and 1 value.
o x = input to the function.
o e = base of the natural logarithm.
When we provide the input values (data) to the function, it gives the S-curve as follows:
o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
On the basis of the categories of the target variable, logistic regression can be of three types:
o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
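As an illustration, here is a minimal sketch of fitting a logistic regression classifier with scikit-learn; the hours-studied versus pass/fail numbers are made-up example data, not taken from the text:

from sklearn.linear_model import LogisticRegression
import numpy as np

# made-up example: hours studied (feature) vs. pass/fail outcome (0 or 1)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predicted probabilities follow the S-shaped sigmoid curve described above
print(model.predict_proba([[2.2]]))  # probability of class 0 and class 1
print(model.predict([[2.2]]))        # class label after applying the threshold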
Polynomial Regression:
o Polynomial Regression is a type of regression which models the non-linear dataset using
a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the value
of x and corresponding conditional values of y.
o Suppose there is a dataset which consists of datapoints which are present in a non-linear
fashion, so for such case, linear regression will not best fit to those datapoints. To cover
such datapoints, we need Polynomial regression.
o In Polynomial regression, the original features are transformed into polynomial features
of given degree and then modeled using a linear model. Which means the datapoints are
best fitted using a polynomial line.
o The equation for polynomial regression is also derived from the linear regression equation, that is, the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
o Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.
o The model is still considered linear because the coefficients b0, b1, ..., bn enter linearly; only the features are raised to higher powers (quadratic, cubic, and so on).
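A minimal sketch of this idea with scikit-learn, using made-up one-dimensional data (an assumption for illustration): the original feature is expanded into polynomial features of a given degree and then fitted with an ordinary linear model.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# made-up non-linear data: y roughly follows a quadratic curve
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1, 4, 9, 16, 25, 36])

# transform the original feature into polynomial features of degree 2
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# the model is still linear in the coefficients b0, b1, b2
model = LinearRegression()
model.fit(X_poly, y)
print(model.predict(poly.transform([[7]])))  # expected to be close to 49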
Support Vector Regression:
Support Vector Machine is a supervised learning algorithm which can be used for regression as well as classification problems. When we use it for regression problems, it is termed Support Vector Regression.
Support Vector Regression is a regression algorithm which works for continuous variables. The key terms used in Support Vector Regression are the kernel, the hyperplane, the boundary lines, and the support vectors; the hyperplane and the boundary lines are explained below.
In SVR, we always try to determine a hyperplane with a maximum margin, so that the maximum number of data points is covered within that margin. The main goal of SVR is to include as many data points as possible within the boundary lines, and the hyperplane (best-fit line) must contain a maximum number of data points. Consider the below image:
Here, the blue line is called hyperplane, and the other two lines are known as boundary lines.
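A minimal sketch of Support Vector Regression with scikit-learn's SVR, using made-up continuous data (an assumption for illustration); epsilon sets the width of the margin between the boundary lines and the hyperplane:

from sklearn.svm import SVR
import numpy as np

# made-up continuous data
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([3.0, 4.5, 6.1, 7.8, 9.2, 11.1, 12.8, 14.5])

# points that fall inside the epsilon margin do not contribute to the loss
model = SVR(kernel='rbf', C=10.0, epsilon=0.5)
model.fit(X, y)
print(model.predict([[9]]))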
Decision Tree Regression:
o Decision Tree is a supervised learning algorithm which can be used for solving both classification and regression problems.
o It can solve problems for both categorical and numerical data
o Decision Tree regression builds a tree-like structure in which each internal node
represents the "test" for an attribute, each branch represent the result of the test, and each
leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the dataset), which splits into left and right child nodes (subsets of the dataset). These child nodes are further divided into their own children nodes, and thereby become parent nodes of those nodes. Consider the below image:
The above image shows an example of Decision Tree regression; here, the model is trying to predict a person's choice between a sports car and a luxury car.
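A minimal sketch of Decision Tree regression with scikit-learn, using made-up numeric data (an assumption for illustration):

from sklearn.tree import DecisionTreeRegressor
import numpy as np

# made-up data: predicting a price from a single numeric feature
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([10, 12, 20, 22, 35, 38])

# each internal node tests the feature against a threshold;
# each leaf stores the average target value of the samples that reach it
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)
print(model.predict([[3.5]]))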
Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms, capable of performing regression as well as classification tasks.
o Random Forest regression is an ensemble learning method which combines multiple decision trees and predicts the final output based on the average of each tree's output. The combined decision trees are called base models, and the ensemble can be represented more formally as:
g(x) = f0(x) + f1(x) + f2(x) + ...
where each fi(x) is the prediction of an individual decision tree (base model) and, for regression, the final prediction g(x) is taken as the average of these outputs.
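A minimal sketch of Random Forest regression with scikit-learn, using made-up data (an assumption for illustration):

from sklearn.ensemble import RandomForestRegressor
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([5, 7, 9, 12, 15, 17, 20, 22])

# n_estimators decision trees are trained on random bootstrap subsets;
# the final prediction is the average of the individual tree outputs
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict([[4.5]]))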
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty. We can compute this penalty term by multiplying lambda by the squared weight of each individual feature.
o The equation (cost function) for ridge regression will be:
Cost = Σ(yi - ŷi)^2 + λ Σ(bj)^2
where the first term is the usual sum of squared errors, the bj are the model weights, and λ controls the strength of the penalty.
o A general linear or polynomial regression will fail if there is high collinearity between the
independent variables, so to solve such problems, Ridge regression can be used.
o Ridge regression is a regularization technique, which is used to reduce the complexity of the model. It is also called L2 regularization.
o It helps to solve the problems if we have more parameters than samples.
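A minimal sketch of ridge regression with scikit-learn, using made-up, nearly collinear features (an assumption for illustration); in scikit-learn the lambda penalty is called alpha:

from sklearn.linear_model import Ridge
import numpy as np

# the two features are almost perfectly collinear
X = np.array([[1, 2.0], [2, 4.1], [3, 6.2], [4, 7.9], [5, 10.1]])
y = np.array([3, 6, 9, 12, 15])

# larger alpha (lambda) shrinks the coefficients more strongly
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_, model.intercept_)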
Lasso Regression:
o Lasso regression is another regularization technique used to reduce the complexity of the model. It is similar to ridge regression except that the penalty term contains only the absolute weights instead of the squared weights.
o Because it uses absolute values, it can shrink some coefficients exactly to zero, which also makes it useful for feature selection. It is also called L1 regularization.
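A minimal sketch of lasso regression with scikit-learn, using made-up data in which only the first feature matters (an assumption for illustration):

from sklearn.linear_model import Lasso
import numpy as np

# only the first feature is actually related to the target
X = np.array([[1, 0.1], [2, 0.2], [3, 0.1], [4, 0.3], [5, 0.2]])
y = np.array([2, 4, 6, 8, 10])

# the L1 penalty can drive the coefficient of the irrelevant feature to exactly zero
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)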
Naive Bayes :
# comparing actual response values (y_test) with predicted response values (y_pred);
# y_test and y_pred are assumed to come from a previously fitted Gaussian Naive Bayes model
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
Decision trees :
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

df = pandas.read_csv("data.csv")

# use every column except the target 'Go' as a feature
# (assumed layout of data.csv; the feature columns are assumed to be numeric)
features = [col for col in df.columns if col != 'Go']
X = df[features]
y = df['Go']

dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)

tree.plot_tree(dtree, feature_names=features)
plt.show()
Ensemble methods :
Ensembles are a type of machine learning technique that combines multiple models to create a more powerful, accurate, and robust model. They are used in machine learning because they can produce more accurate and reliable results than a single model. Ensembles can also help reduce the risk of overfitting, which is when a model is too closely tailored to the training data and doesn't generalize well to new data. Ensemble methods also have the advantage of being able to make use of the strengths of different models and to reduce the weaknesses of individual models.
Bagging
When we want to decrease the variance of a decision tree, we employ bagging (Bootstrap
Aggregation). The objective here is to generate different subsets of data from a training
sample selected at random using replacement.
Each subset of data is then used to train its own decision tree. As a consequence, we have an ensemble of many models. Using the average of all the forecasts from the several trees is more resilient than relying on a single decision tree.
Random Forest is a bagging extension. It adds an extra step in which, in addition to utilising a random subset of data, it also uses a random selection of features to create trees rather than using all features. When you have a lot of such random trees, the result is known as a Random Forest.
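A minimal sketch of bagging with scikit-learn's BaggingClassifier, using a synthetic dataset (an assumption for illustration); the default base estimator is a decision tree:

from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification

# synthetic classification data for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# each of the 50 base decision trees is trained on a bootstrap sample drawn with replacement;
# the ensemble prediction aggregates the individual predictions
model = BaggingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print(model.score(X, y))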
Boosting
Boosting is another ensemble strategy for generating a set of predictors. Learners are
taught progressively in this method, with early learners fitting basic models to data and
subsequently examining data for flaws. In other words, we fit subsequent trees (random
sample) with the purpose of solving for net error from the previous tree at each step.
When a hypothesis incorrectly classifies an input, its weight is increased, so that the following hypothesis is more likely to classify it correctly. By combining the entire collection at the end, weak learners are transformed into a better performing model.
Gradient Boosting is an enhancement to the boosting approach.
It employs the gradient descent approach to optimise any differentiable loss function. Individual trees are added sequentially to form an ensemble of trees, and each following tree attempts to recover the loss (the difference between the actual and predicted values) left by the trees before it.
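A minimal sketch of gradient boosting with scikit-learn, using a synthetic regression dataset (an assumption for illustration):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# each new tree is fitted to the residual error left by the trees added before it
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
model.fit(X, y)
print(model.score(X, y))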
Stacking
Stacking is an ensemble strategy in which several different base models are trained on the same data and their predictions are then fed, as inputs, to a final meta-model that learns how best to combine them into the final prediction.
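A minimal sketch of stacking with scikit-learn's StackingClassifier, using a synthetic dataset (an assumption for illustration); a decision tree and an SVM act as base models and a logistic regression acts as the meta-model:

from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# base models produce first-level predictions; the logistic regression meta-model combines them
base_models = [('tree', DecisionTreeClassifier()), ('svm', SVC())]
model = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
model.fit(X, y)
print(model.score(X, y))

Whether built with bagging, boosting, or stacking, ensemble models have several common applications: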
1. Classification: Ensemble models are commonly used for classification tasks, such as
classifying emails as spam or legitimate, identifying customer segments, or predicting the
outcome of a medical diagnosis. Ensemble models can combine multiple decision tree,
neural network, support vector machine, and gradient boosting machine models to create
a more powerful and accurate classification model.
2. Regression: Ensemble models are also useful for regression tasks, such as predicting
housing prices, stock prices, or customer churn rates. By combining multiple decision
tree, neural network, support vector machine, and gradient boosting machine models,
ensemble models can create a more powerful and accurate model to predict values.
3. Anomaly Detection: Ensemble models are also useful for anomaly detection tasks, such
as detecting fraudulent credit card transactions or identifying suspicious activity on social
media. Ensemble models can combine multiple decision tree, neural network, support
vector machine, and gradient boosting machine models to create a more powerful and
accurate anomaly detection model.
Kernel Methods :
The fundamental premise of kernel methods is to convert the input data into a high-dimensional feature space, which makes it simpler to distinguish between classes or generate predictions. Kernel methods employ a kernel function to implicitly map the data into the feature space, as opposed to manually computing the feature space.
The most popular kind of kernel approach is the Support Vector Machine (SVM), a binary
classifier that determines the best hyperplane that most effectively divides the two groups.
In order to efficiently locate the ideal hyperplane, SVMs map the input into a higher-
dimensional space using a kernel function.
Other examples of kernel methods include kernel ridge regression, kernel PCA, and Gaussian
processes. Since they are strong, adaptable, and computationally efficient, kernel
approaches are frequently employed in machine learning. They are resilient to noise and
outliers and can handle sophisticated data structures like strings and graphs.
The most commonly used kernel function in SVMs is the Gaussian or radial basis function
(RBF) kernel. The RBF kernel maps the input data into an infinite-dimensional feature space
using a Gaussian function. This kernel function is popular because it can capture complex
nonlinear relationships in the data.
Other types of kernel functions that can be used in SVMs include the polynomial kernel, the
sigmoid kernel, and the Laplacian kernel. The choice of kernel function depends on the
specific problem and the characteristics of the data.
In short, kernel methods in SVMs are a powerful technique for solving classification and regression problems, and they are widely used in machine learning because they can handle complex data structures and are robust to noise and outliers.
The choice of kernel function depends on the specific problem and the characteristics of the data, and selecting an appropriate kernel function can significantly impact the performance of machine learning algorithms.
Linear Kernel
A linear kernel is a type of kernel function used in machine learning, including in SVMs
(Support Vector Machines). It is the simplest and most commonly used kernel function, and
it defines the dot product between the input vectors in the original feature space.
K(x, y) = x · y
where x and y are the input feature vectors. The dot product of the input vectors is a measure of their similarity or distance in the original feature space.
When using a linear kernel in an SVM, the decision boundary is a linear hyperplane that
separates the different classes in the feature space. This linear boundary can be useful when
the data is already separable by a linear decision boundary or when dealing with high-
dimensional data, where the use of more complex kernel functions may lead to overfitting.
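A minimal sketch of an SVM with a linear kernel, using scikit-learn's SVC and a synthetic dataset (both assumptions for illustration):

from sklearn.svm import SVC
from sklearn.datasets import make_classification

# roughly linearly separable synthetic data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# kernel='linear' uses K(x, y) = x · y, giving a linear decision boundary
model = SVC(kernel='linear')
model.fit(X, y)
print(model.score(X, y))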
Polynomial Kernel
The polynomial kernel is defined as K(x, y) = (x · y + c)^d, where x and y are the input feature vectors, c is a constant term, and d is the degree of the polynomial. The constant term is added to the dot product of the input vectors, and the result is raised to the degree of the polynomial.
The decision boundary of an SVM with a polynomial kernel might capture more intricate
correlations between the input characteristics because it is a nonlinear hyperplane.
The degree of nonlinearity in the decision boundary is determined by the degree of the
polynomial.
The polynomial kernel has the benefit of being able to detect both linear and nonlinear
correlations in the data. It can be difficult to select the proper degree of the polynomial,
though, as a larger degree can result in overfitting while a lower degree cannot adequately
represent the underlying relationships in the data.
In general, the polynomial kernel is an effective tool for converting the input data into a
higher-dimensional feature space in order to capture nonlinear correlations between the
input characteristics.
Gaussian (RBF) Kernel
The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular kernel
function used in machine learning, particularly in SVMs (Support Vector Machines). It is a
nonlinear kernel function that maps the input data into a higher-dimensional feature space
using a Gaussian function.
The RBF kernel is defined as K(x, y) = exp(-gamma * ||x - y||^2), where x and y are the input feature vectors, gamma is a parameter that controls the width of the Gaussian function, and ||x - y||^2 is the squared Euclidean distance between the input vectors.
When using a Gaussian kernel in an SVM, the decision boundary is a nonlinear hyperplane that can capture complex nonlinear relationships between the input features. The width of the Gaussian function, controlled by the gamma parameter, determines the degree of nonlinearity in the decision boundary.
One advantage of the Gaussian kernel is its ability to capture complex relationships in the data without the need for explicit feature engineering. However, the choice of the gamma parameter can be challenging, as a smaller value may result in underfitting, while a larger value may result in overfitting.
Laplace Kernel
The Laplacian kernel, also known as the Laplace kernel or the exponential kernel, is a type
of kernel function used in machine learning, including in SVMs (Support Vector Machines). It
is a non-parametric kernel that can be used to measure the similarity or distance between
two input feature vectors.
The Laplacian kernel is defined as K(x, y) = exp(-gamma * ||x - y||), where x and y are the input feature vectors, gamma is a parameter that controls the width of the Laplacian function, and ||x - y|| is the L1 norm or Manhattan distance between the input vectors.
When using a Laplacian kernel in an SVM, the decision boundary is a nonlinear hyperplane
that can capture complex relationships between the input features. The width of the
Laplacian function, controlled by the gamma parameter, determines the degree of
nonlinearity in the decision boundary.
One advantage of the Laplacian kernel is its robustness to outliers, as it places less weight
on large distances between the input vectors than the Gaussian kernel. However, like the
Gaussian kernel, choosing the correct value of the gamma parameter can be challenging.
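To make the choice of kernel concrete, here is a minimal sketch that compares scikit-learn's built-in linear, polynomial, RBF, and sigmoid kernels on a synthetic two-moons dataset (the dataset and the specific parameter values are assumptions for illustration); the Laplacian kernel is not built into SVC and would require a custom kernel function:

from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# two interleaving half-moons: not separable by a straight line
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# gamma='scale' controls the width of the RBF and sigmoid kernels
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    model = SVC(kernel=kernel, gamma='scale')
    model.fit(X_train, y_train)
    print(kernel, model.score(X_test, y_test))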