
Machine Learning

Unit-1
Introduction to machine learning: Machine learning is a growing technology that enables computers to learn automatically from past data. Machine learning uses various algorithms to build mathematical models and make predictions using historical data or information. Currently, it is used for tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.
In the real world, we are surrounded by humans who can learn from their experiences, and we have computers or machines that work on our instructions. But can a machine also learn from experience or past data the way a human does? This is where machine learning comes in. Machine learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms that allow a computer to learn from data and past experience on its own. The term machine learning was first introduced by Arthur Samuel in 1959. Machine learning enables a machine to learn automatically from data, improve its performance with experience, and predict outcomes without being explicitly programmed. With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed. Machine learning brings computer science and statistics together to create predictive models. Machine learning constructs or uses algorithms that learn from historical data. The more information we provide, the better the performance.
A machine has the ability to learn if it can improve its performance by gaining more data.

Scope & limitations of Machine Learning:

o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as it also deals with huge amounts of data.

The need for machine learning is increasing day by day. The reason is that machine learning is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot access and process huge amounts of data manually, so we need computer systems, and this is where machine learning makes things easier for us. We can train machine learning algorithms by providing them with huge amounts of data, letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by the cost function. With the help of machine learning, we can save both time and money. The importance of machine learning can be easily understood from its use cases: currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on Facebook, and so on. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interest and recommend products accordingly.

Machine learning models: Machine learning methods can be classified into three types.

Supervised learning: Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis it predicts the output. The system creates a model using labeled data to understand the datasets and learn about each data point; once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output or not. The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be further grouped into two categories of algorithms:

o Classification
o Regression
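
To make this concrete, here is a minimal supervised-learning sketch in Python. It assumes scikit-learn is installed; the iris dataset, the logistic regression classifier, and the 70/30 split are illustrative choices only, not part of the original notes.

# Minimal supervised-learning sketch (assumes scikit-learn is available).
# A labeled dataset is split into training and test parts, a classifier is
# fitted on the labeled training data, and its predictions are checked on
# the held-out data, mirroring the train-then-test description above.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                     # features and known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)             # a classification algorithm
model.fit(X_train, y_train)                           # learn the input -> output mapping
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))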

Unsupervised learning: Unsupervised learning is a method in which a machine learns without any supervision.

The training is provided to the machine with a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.

In unsupervised learning, we do not have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:

o Clustering
o Association
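
As a small illustration of the clustering case, here is a hedged sketch assuming scikit-learn; the tiny hand-made dataset and the choice of two clusters are purely illustrative.

# Unsupervised-learning sketch: no labels are given, and KMeans groups
# similar points into clusters on its own (assumes scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])             # unlabeled data points
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)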

Reinforcement learning: Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the maximum reward points, and in doing so it improves its performance.

A robotic dog that automatically learns the movement of its arms is an example of reinforcement learning.
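
The reward-and-penalty idea can be sketched with a tiny tabular Q-learning example. Everything here (the one-dimensional corridor environment, the rewards, and the hyperparameters) is an illustrative assumption, not taken from the notes.

# Toy reinforcement-learning sketch: tabular Q-learning on a 1-D corridor.
# The agent receives +1 for reaching the goal state and a small penalty for
# every other step, and improves its behaviour from this feedback.
import random

n_states, actions = 5, [-1, +1]                       # actions: move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1                 # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:                          # run until the goal is reached
        # epsilon-greedy: usually exploit the best known action, sometimes explore
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else -0.01  # reward or penalty feedback
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

print("Learned action per state:",
      [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])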

Hypothesis space and inductive bias: The hypothesis is a common term in machine learning and data science projects. As we know, machine learning is one of the most powerful technologies in the world, helping us to predict results based on past experience. Data scientists and ML professionals conduct experiments that aim to solve a problem, and they make an initial assumption about the solution of the problem.

This assumption in machine learning is known as a hypothesis. In machine learning, the terms hypothesis and model are sometimes used interchangeably. However, a hypothesis is an assumption made by scientists, whereas a model is a mathematical representation that is used to test the hypothesis. In this topic, "Hypothesis in Machine Learning," we will discuss a few important concepts related to the hypothesis in machine learning and their importance. So, let's start with a quick introduction to the hypothesis.

Hypothesis: The hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions. It is a guess based on some known facts but has not yet been proven. A good hypothesis is testable, resulting in either true or false. Example: some scientists claim that if ultraviolet (UV) light can damage the eyes, then it may also cause blindness.
The hypothesis is one of the commonly used concepts of statistics in machine learning. It is specifically used in supervised machine learning, where an ML model learns a function that best maps the inputs to the corresponding outputs with the help of an available dataset.
In supervised learning techniques, the main aim is to determine the possible hypothesis out of the hypothesis space that best maps the inputs to the corresponding or correct outputs.

There are some common methods for finding the possible hypothesis from the hypothesis space, where the hypothesis space is represented by an uppercase H and a hypothesis by a lowercase h. These are defined as follows:

Hypothesis space (H): The hypothesis space is defined as the set of all possible legal hypotheses; hence it is also known as a hypothesis set. It is used by supervised machine learning algorithms to determine the best possible hypothesis to describe the target function, i.e. to best map inputs to outputs. It is often constrained by the framing of the problem, the choice of model, and the choice of model configuration.

Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised machine learning algorithms. It is primarily based on the data as well as the bias and restrictions applied to the data. Hence, a hypothesis (h) is a single candidate function that maps inputs to outputs and can be evaluated and used to make predictions. For the simple linear case, the hypothesis (h) can be formulated as:
y = mx + c
where y is the output (range), m is the slope of the line that divides the data (change in y divided by change in x), x is the input (domain), and c is the intercept (a constant).
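
As a small illustration, a hypothesis from this linear hypothesis space can be written as a Python function; the particular slope and intercept values below are arbitrary examples.

# A single hypothesis h drawn from the hypothesis space H of straight lines
# y = m*x + c; H can be thought of as the set of all (m, c) pairs, and a
# learner searches H for the h that best maps inputs to observed outputs.
def make_hypothesis(m, c):
    return lambda x: m * x + c                        # one candidate hypothesis h

h = make_hypothesis(m=2.0, c=1.0)                     # illustrative slope and intercept
print(h(3.0))                                         # predicted y for x = 3 -> 7.0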

Example: The hypothesis (h) and the hypothesis space (H) can be visualized on a two-dimensional coordinate plane showing the distribution of the data. Given some test data, an ML algorithm predicts the outputs for the inputs by dividing the coordinate plane (for example, with a line) in a way that helps to predict the output or result. (The accompanying figures are not reproduced here.)

Bias:

Machine learning is a branch of artificial intelligence that allows machines to perform data analysis and make predictions. However, if the machine learning model is not accurate, it can make prediction errors, and these prediction errors are usually known as bias and variance. In machine learning, these errors will always be present, as there is always a slight difference between the model predictions and the actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results. In this topic, we discuss bias and variance, the bias-variance trade-off, underfitting, and overfitting. But before starting, let's first understand what errors in machine learning are.

Cross-validation: Cross-validation is a technique for validating model efficiency by training the model on a subset of the input data and testing it on a previously unseen subset of the input data. We can also say that it is a technique to check how a statistical model generalizes to an independent dataset.

In machine learning, there is always a need to test the stability of the model: we cannot judge the model based only on the training dataset it was fitted on. For this purpose, we reserve a particular sample of the dataset that was not part of the training dataset. After that, we test our model on that sample before deployment, and this complete process comes under cross-validation. This is something different from the general train-test split.

Hence, the basic steps of cross-validation are:

o Reserve a subset of the dataset as a validation set.
o Train the model using the training dataset.
o Evaluate model performance using the validation set. If the model performs well on the validation set, proceed to the next step; otherwise, check for issues.

Methods used for Cross-Validation

There are some common methods that are used for cross-validation. These methods are given below:

1. Validation Set Approach
2. Leave-P-out cross-validation
3. Leave-one-out cross-validation
4. K-fold cross-validation
5. Stratified k-fold cross-validation

Validation Set Approach

In the validation set approach, we divide our input dataset into a training set and a test (validation) set. Each subset is given 50% of the dataset.

But it has one big disadvantage: we are using only 50% of the dataset to train the model, so the model may fail to capture important information in the data. It also tends to give an underfitted model.
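
A hedged sketch of the validation set approach, assuming scikit-learn; the dataset and model are illustrative, and test_size=0.5 reproduces the 50/50 split described above.

# Validation set approach: one 50/50 split, so half the data never trains the model.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))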

Leave-P-out cross-validation
In this approach, p data points are left out of the training data. This means that if there are n data points in the original input dataset, then n-p data points are used as the training set and the p data points as the validation set. This process is repeated for all possible combinations, and the average error is calculated to determine the effectiveness of the model.

A disadvantage of this technique is that it can become computationally expensive for large p.

Leave-one-out cross-validation

This method is similar to leave-p-out cross-validation, but with p = 1: only one data point is taken out of the training data. For each learning set, a single data point is reserved, and the remaining dataset is used to train the model. This process repeats for each data point, so for n samples we get n different training sets and n test sets. It has the following features:

o The bias is minimal, as all the data points are used.
o The process is executed n times, so the execution time is high.
o This approach leads to high variation in testing the effectiveness of the model, as we iteratively check against a single data point.
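
A minimal leave-one-out sketch, assuming scikit-learn; the dataset and estimator are illustrative.

# Leave-one-out cross-validation: each of the n points is held out once,
# so n models are trained and n accuracy scores are averaged.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print("Number of fits:", len(scores), "mean accuracy:", scores.mean())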

K-Fold Cross-Validation
The k-fold cross-validation approach divides the input dataset into k groups of samples of equal size, called folds. For each learning set, the prediction function uses k-1 folds for training, and the remaining fold is used as the test set. This approach is a very popular CV approach because it is easy to understand and the output is less biased than in other methods.

The steps for k-fold cross-validation are:

o Split the input dataset into k groups.
o For each group:
  o Take that group as the reserve or test dataset.
  o Use the remaining groups as the training dataset.
  o Fit the model on the training set and evaluate its performance using the test set.

Let's take an example of 5-fold cross-validation, where the dataset is grouped into 5 folds. On the 1st iteration, the first fold is reserved for testing the model, and the rest are used for training. On the 2nd iteration, the second fold is used to test the model, and the rest are used to train it. This process continues until each fold has been used once as the test fold.
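
The 5-fold example above can be reproduced with a short sketch, assuming scikit-learn; the dataset and classifier are illustrative.

# 5-fold cross-validation: each fold serves as the test set exactly once,
# and the five scores are averaged for a less biased estimate.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Fold accuracies:", scores, "mean:", scores.mean())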


Stratified k-fold cross-validation

This technique is similar to k-fold cross-validation with a few small changes. It is based on the concept of stratification: the data is rearranged to ensure that each fold or group is a good representative of the complete dataset. It is one of the best approaches for dealing with bias and variance.

It can be understood with an example of housing prices: the price of some houses can be much higher than that of other houses. To handle such situations, a stratified k-fold cross-validation technique is useful.
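
A hedged stratified k-fold sketch, assuming scikit-learn; it uses a classification dataset so that the class proportions preserved in each fold are easy to see.

# Stratified 5-fold cross-validation: each fold keeps roughly the same
# class proportions as the complete dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("Mean accuracy:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv).mean())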

Holdout Method

This method is the simplest cross-validation technique of all. In this method, we remove a subset of the data and use it to obtain prediction results from a model trained on the remaining part of the dataset.

The error that occurs in this process tells how well our model will perform on unknown data. Although this approach is simple to perform, it still suffers from high variance, and it can sometimes produce misleading results.

Comparison of Cross-validation to train/test split in Machine Learning

o Train/test split: The input data is divided into two parts, a training set and a test set, in a ratio such as 70:30 or 80:20. Its main disadvantage is that the resulting performance estimate has high variance.

o Training data: The training data is used to train the model, and the dependent variable is known.

o Test data: The test data is used to make predictions from the model that has already been trained on the training data. It has the same features as the training data but is not part of it.

o Cross-validation dataset: It is used to overcome the disadvantage of the train/test split by splitting the dataset into several train/test splits and averaging the results. It can be used when we want to optimize a model that has been trained on the training dataset for the best performance. It is more efficient than a single train/test split because every observation is used for both training and testing.

Limitations of Cross-Validation

There are some limitations of the cross-validation technique, which are given below:

o Under ideal conditions, it provides the optimum output, but for inconsistent data it may produce drastically wrong results. This is one of the big disadvantages of cross-validation, as there is no certainty about the type of data encountered in machine learning.

o In predictive modeling, the data evolves over time, which may create differences between the training and validation sets. For example, if we create a model for predicting stock market values and train it on the previous 5 years of stock values, the realistic values for the next 5 years may differ drastically, so it is difficult to expect correct output in such situations.

Applications of Cross-Validation

o This technique can be used to compare the performance of different predictive modeling methods.

o It has great scope in the medical research field.

o It can also be used for meta-analysis, as it is already being used by data scientists in the field of medical statistics.

Dimensionality Reduction: The number of input features, variables, or columns present in a given dataset is known as its dimensionality, and the process of reducing these features is called dimensionality reduction.

In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.

A dimensionality reduction technique can be defined as "a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information." These techniques are widely used in machine learning for obtaining a better-fitting predictive model when solving classification and regression problems.

It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, and bioinformatics. It can also be used for data visualization, noise reduction, cluster analysis, etc.

Some benefits of applying dimensionality reduction to a given dataset are given below:

o By reducing the dimensions of the features, the space required to store the dataset is also reduced.
o Less computation and training time is required with a reduced number of feature dimensions.

o Reduced feature dimensions help in visualizing the data quickly.

o It removes redundant features (if present) by taking care of multicollinearity.

Disadvantages of Dimensionality Reduction

There are also some disadvantages of applying dimensionality reduction, which are given below:

o Some data may be lost due to dimensionality reduction.

o In the PCA dimensionality reduction technique, the number of principal components to retain is sometimes not known in advance.

Approaches to Dimensionality Reduction

There are two ways to apply dimensionality reduction, which are given below:

Feature Selection

Feature selection is the process of selecting the subset of the relevant features and leaving out the
irrelevant features present in a dataset to build a model of high accuracy. In other words, it is a way of
selecting the optimal features from the input dataset.

Three types of methods are used for feature selection:

1. Filter Methods

In this method, the dataset is filtered, and a subset that contains only the relevant features is taken. Some common techniques of the filter method are:

o Correlation

o Chi-Square Test

o ANOVA

o Information Gain, etc.
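
A small filter-method sketch, assuming scikit-learn; the chi-square test and k=2 are illustrative choices.

# Filter method: rank features with a chi-square test against the target and
# keep the top k, without involving any predictive model.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print("Reduced shape:", X_new.shape)                  # 4 features -> 2 features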

2. Wrapper Methods

The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, some features are fed to the ML model and its performance is evaluated. The performance decides whether to add or remove those features in order to increase the accuracy of the model. This method is more accurate than the filter method but more complex to work with. Some common techniques of wrapper methods are:
o Forward Selection

o Backward Selection

o Bi-directional Elimination
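
One wrapper-style sketch, assuming scikit-learn's recursive feature elimination (RFE); the estimator and the number of features to keep are illustrative.

# Wrapper method: RFE repeatedly fits the model and drops the weakest
# feature, so the model itself judges which features to keep.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("Selected feature mask:", selector.support_)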

3. Embedded Methods: Embedded methods check the different training iterations of the machine
learning model and evaluate the importance of each feature. Some common techniques of Embedded
methods are:

o LASSO

o Elastic Net

o Ridge Regression, etc.
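
An embedded-method sketch, assuming scikit-learn; LASSO and the diabetes dataset are illustrative choices.

# Embedded method: LASSO learns its own feature importances (coefficients)
# during training, and features whose coefficients shrink to zero are dropped.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
print("Kept features:", selector.get_support())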

Feature Extraction:

Feature extraction is the process of transforming a space with many dimensions into a space with fewer dimensions. This approach is useful when we want to keep all the information but use fewer resources while processing it.

Some common feature extraction techniques are:

1. Principal Component Analysis
2. Linear Discriminant Analysis
3. Kernel PCA
4. Quadratic Discriminant Analysis
Principal Component Analysis (PCA)

Principal Component Analysis is a statistical process that converts observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the principal components. It is one of the popular tools used for exploratory data analysis and predictive modeling.

PCA works by considering the variance of each attribute, because attributes with high variance tend to indicate a good split between the classes, and in this way it reduces the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing power allocation in various communication channels.
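
A minimal PCA sketch, assuming scikit-learn; reducing the iris data from 4 to 2 components is an arbitrary illustration.

# PCA: project possibly correlated features onto a few orthogonal principal
# components that capture most of the variance.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)                      # 4 dimensions -> 2 components
print("Explained variance ratio:", pca.explained_variance_ratio_)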
Backward Feature Elimination

The backward feature elimination technique is mainly used while developing linear regression or logistic regression models. The following steps are performed in this technique to reduce the dimensionality or for feature selection:

o First, all n variables of the given dataset are taken to train the model.

o The performance of the model is checked.

o We then remove one feature at a time and train the model on the remaining n-1 features, n times, computing the performance of the model each time.

o We find the variable whose removal causes the smallest (or no) change in the performance of the model and drop that variable; after that, we are left with n-1 features.

o Repeat the complete process until no more features can be dropped.

In this technique, by choosing the optimum model performance and the maximum tolerable error rate, we can define the optimal number of features required for the machine learning algorithm (a combined code sketch for backward elimination and forward selection appears after the forward-selection steps below).

Forward Feature Selection

Forward feature selection is the inverse of the backward elimination process. In this technique, we don't eliminate features; instead, we look for the features that produce the highest increase in the performance of the model. The following steps are performed in this technique:

o We start with a single feature only and progressively add one feature at a time.

o Here we train the model on each candidate feature separately.

o The feature with the best performance is selected.

o The process is repeated until adding features no longer gives a significant increase in the performance of the model.
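
A combined sketch of forward selection and backward elimination, assuming scikit-learn's SequentialFeatureSelector (available from version 0.24); the estimator and feature count are illustrative.

# Forward selection adds the most helpful feature at each step; backward
# elimination starts from all features and removes the least helpful one.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)
forward = SequentialFeatureSelector(est, n_features_to_select=2, direction="forward").fit(X, y)
backward = SequentialFeatureSelector(est, n_features_to_select=2, direction="backward").fit(X, y)
print("Forward pick:", forward.get_support())
print("Backward pick:", backward.get_support())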

Missing Value Ratio

If a dataset has too many missing values, we drop those variables, as they do not carry much useful information. To perform this, we can set a threshold level, and if a variable has a fraction of missing values higher than that threshold, we drop that variable. The choice of threshold controls how aggressive the reduction is.
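
A small sketch of the missing value ratio idea, assuming pandas; the tiny data frame and the 0.5 threshold are illustrative.

# Missing value ratio: drop columns whose fraction of missing values
# exceeds the chosen threshold.
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, np.nan, np.nan], "b": [1, 2, 3, 4]})
threshold = 0.5
keep = df.columns[df.isna().mean() <= threshold]
print(df[keep].columns.tolist())                      # ['b']: column 'a' is 75% missing and dropped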

Low Variance Filter

As with the missing value ratio technique, data columns with very little variation in their values carry little information. Therefore, we calculate the variance of each variable, and all data columns with variance lower than a given threshold are dropped, because such low-variance features will have little effect on the target variable.
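
A low-variance-filter sketch, assuming scikit-learn; the toy matrix and the threshold are illustrative.

# Low variance filter: drop features whose variance falls below a threshold
# (the first column below is constant and is removed).
from sklearn.feature_selection import VarianceThreshold

X = [[0, 1, 0], [0, 2, 0], [0, 3, 1], [0, 4, 0]]
X_reduced = VarianceThreshold(threshold=0.1).fit_transform(X)
print(X_reduced)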
High Correlation Filter

High correlation refers to the case when two variables carry approximately the same information. This redundancy can degrade the performance of the model. The correlation between independent numerical variables is measured by the correlation coefficient; if this value is higher than a threshold value, we can remove one of the two variables from the dataset. We can keep those variables or features that show a high correlation with the target variable.
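
A high-correlation-filter sketch, assuming pandas; the toy data and the 0.9 threshold are illustrative.

# High correlation filter: if two independent variables are almost perfectly
# correlated, keep only one of them.
import pandas as pd

df = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [2, 4, 6, 8], "x3": [4, 1, 3, 2]})
corr = df.corr().abs()
to_drop = [c for i, c in enumerate(corr.columns)
           if any(corr.iloc[:i][c] > 0.9)]            # compare each column with earlier ones only
print("Drop:", to_drop)                               # x2 duplicates x1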

Random Forest

Random forest is a popular and very useful feature selection algorithm in machine learning. The algorithm has built-in feature importance scoring, so we do not need to program it separately. In this technique, we generate a large set of trees against the target variable and use the usage statistics of each attribute to find the most relevant subset of features.

The random forest algorithm takes only numerical variables, so categorical input data needs to be converted into numeric data using one-hot encoding.
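
A feature-importance sketch with a random forest, assuming scikit-learn; the dataset and forest size are illustrative.

# Random forest feature selection: the fitted forest exposes an importance
# score per feature, which can be used to keep only the strongest features.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Feature importances:", forest.feature_importances_)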

Factor Analysis

Factor analysis is a technique in which each variable is placed in a group according to its correlation with other variables; variables within a group can have a high correlation among themselves but a low correlation with variables of other groups.

We can understand it with an example: suppose we have two variables, income and spending. These two variables have a high correlation, which means people with high income spend more, and vice versa. Such variables are put into a group, and that group is known as a factor. The number of factors is small compared to the original dimensionality of the dataset.

Auto-encoders

One of the popular methods of dimensionality reduction is the auto-encoder, which is a type of artificial neural network (ANN) whose main aim is to copy its inputs to its outputs. The input is compressed into a latent-space representation, and the output is produced from this representation. It has two main parts:

o Encoder: The function of the encoder is to compress the input into the latent-space representation.

o Decoder: The function of the decoder is to recreate the output from the latent-space representation.
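
A minimal auto-encoder sketch, assuming TensorFlow/Keras is installed; the 8-dimensional random inputs, the 2-dimensional latent space, and the training settings are illustrative only.

# Auto-encoder: the encoder compresses the input into a latent-space
# representation and the decoder reconstructs the input from it.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(8,))
latent = layers.Dense(2, activation="relu", name="encoder")(inputs)    # encoder: 8 -> 2
outputs = layers.Dense(8, activation="linear", name="decoder")(latent) # decoder: 2 -> 8
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(256, 8).astype("float32")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)              # learn to copy inputs to outputs
print("Reconstruction error:", autoencoder.evaluate(X, X, verbose=0))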
Subset selection: A growing number of machine learning problems involve finding subsets of data points. Examples range from selecting subsets of labeled or unlabeled data points, to subsets of features or model parameters, to selecting subsets of pixels, key points, or sentences in image segmentation, correspondence, and summarization problems. The topic encompasses a wide variety of themes, ranging from theoretical aspects of subset selection (e.g. coresets, submodularity, determinantal point processes) to several practical applications (e.g. time- and energy-efficient learning, learning under resource constraints, active learning, human-assisted learning, feature selection, model compression, and feature induction). Subset selection is naturally emerging and has often been considered in isolation in many of the above applications; by connecting the theoretical foundations of subset selection (in areas such as coresets and submodularity) with the application domains (such as feature selection, active learning, data-efficient learning, model compression, and human-assisted machine learning), many technical innovations can be reused across these subareas and applications.
Shrinkage Methods in Linear Regression: This is where shrinkage methods (also known as regularization) come into play. These methods apply a penalty term to the loss function used in the model; minimizing the loss function is equivalent to maximizing the accuracy. To understand this better, we need to look at the loss function of linear regression. Linear regression uses least squares to calculate the minimum error between the actual values and the predicted values. The aim is to minimize the squared difference between the actual and predicted values in order to draw the best possible regression curve for the best prediction accuracy.
Shrinking the coefficient estimates significantly reduces their variance: when we perform shrinking, we essentially pull the coefficient estimates closer to 0. The need for shrinkage methods arises from the problems of underfitting or overfitting the data. When we want to minimize the mean error (the mean squared error, MSE, in the case of linear regression), we need to optimize the bias-variance trade-off.
The bias-variance trade-off indicates the level of underfitting or overfitting of the data with respect to the linear regression model applied to it. High bias with low variance means the model is underfitted, and low bias with high variance means the model is overfitted. We need to trade off between bias and variance to achieve the combination that gives the minimum mean squared error.

(In the figure accompanying this discussion, the green curve is the variance, the black curve is the squared bias, and the purple curve is the MSE; lambda is the regularization parameter, which is covered later.)
Use of shrinkage methods:

The best-known shrinkage methods are ridge regression and lasso regression, which are often used in place of plain linear regression.
Ridge regression, like linear regression, aims to minimize the residual sum of squares (RSS), but with a slight change: a penalty on the size of the coefficients is added.
We now know that there are better methods than simple linear regression, in the form of ridge regression and lasso regression, which account for the underfitting and overfitting of data.

Shrinkage/regularization is one of the most important concepts of machine learning. It is a technique to prevent the model from overfitting by adding extra information to it. Sometimes the machine learning model performs well on the training data but does not perform well on the test data: the model is not able to predict the output for unseen data because it has also fitted the noise in the training data, and such a model is called overfitted. This problem can be dealt with using a regularization technique. The technique allows us to keep all the variables or features in the model while reducing their magnitude, and hence it maintains accuracy as well as the generalization of the model. It mainly regularizes or reduces the coefficients of the features toward zero. In simple words, "in the regularization technique, we reduce the magnitude of the features while keeping the same number of features."

How does Regularization Work? Regularization works by adding a penalty or complexity term to the complex model. Let's consider the simple linear regression equation:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b

In the above equation, y represents the value to be predicted, x1, x2, …, xn are the features for y, β0, β1, …, βn are the weights or magnitudes attached to the features, and b represents the intercept (the bias of the model). Linear regression models try to optimize the coefficients and b to minimize the cost function. The cost function for the linear model is the residual sum of squares (RSS), i.e. the sum of the squared differences between the actual and predicted values of y.

Now we add a penalty to this loss function and optimize the parameters so that the model can predict the accurate value of y. The loss function for linear regression is called the RSS or residual sum of squares.

Techniques of Regularization

There are mainly two types of regularization techniques, which are given below:

o Ridge Regression

o Lasso Regression
Ridge Regression

o Ridge regression is one of the types of linear regression in which a small amount of bias is
introduced so that we can get better long-term predictions.

o Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.

o In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the ridge regression penalty. It is calculated by multiplying lambda by the squared weight of each individual feature.

o The cost function in ridge regression becomes the residual sum of squares plus the penalty: Cost = Σ(yi − ŷi)² + λ Σ βj².

o The penalty term regularizes the coefficients of the model, and hence ridge regression reduces the magnitudes of the coefficients, which decreases the complexity of the model.

o As we can see from the above equation, if λ tends to zero, the equation becomes the cost function of the plain linear regression model; hence, for very small λ, the model resembles the linear regression model.

o A general linear or polynomial regression will fail if there is high collinearity between the
independent variables, so to solve such problems, Ridge regression can be used.

o It helps to solve the problems if we have more parameters than samples.

Lasso Regression:

o Lasso regression is another regularization technique used to reduce the complexity of the model. It stands for Least Absolute Shrinkage and Selection Operator.

o It is similar to ridge regression except that the penalty term contains the absolute values of the weights instead of their squares.

o Since it takes absolute values, it can shrink a slope all the way to 0, whereas ridge regression can only shrink it close to 0.

o It is also called L1 regularization. The cost function of lasso regression is the residual sum of squares plus the L1 penalty: Cost = Σ(yi − ŷi)² + λ Σ |βj|.

o Some of the features are completely neglected (given zero coefficients) in this technique.

o Hence, lasso regression can help us to reduce overfitting in the model as well as perform feature selection.

Key Difference between Ridge Regression and Lasso Regression

o Ridge regression is mostly used to reduce the overfitting in the model, and it includes all the
features present in the model. It reduces the complexity of the model by shrinking the
coefficients.

o Lasso regression helps to reduce overfitting in the model and also performs feature selection.
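
A hedged sketch comparing the two penalties, assuming scikit-learn; the diabetes dataset and alpha=1.0 (the lambda above) are illustrative.

# Ridge vs Lasso: both shrink coefficients, but only Lasso drives some of
# them exactly to zero, which is why it also performs feature selection.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
print("Ridge zero coefficients:", sum(c == 0 for c in ridge.coef_))
print("Lasso zero coefficients:", sum(c == 0 for c in lasso.coef_))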

PARTIAL LEAST SQUARES REGRESSION (PLS)

Partial Least Squares regression (PLS) is often used when there are many explanatory variables, possibly correlated.

Partial Least Squares regression (PLS) is a quick, efficient, and optimal regression method based on covariance. It is recommended for regression cases where the number of explanatory variables is high and where it is likely that there is multicollinearity among the variables, i.e. the explanatory variables are correlated.

XLSTAT provides a complete PLS regression method to model and predict data in Excel. XLSTAT proposes several standard and advanced options that give a deep insight into the data:

o Choose several response variables in one analysis
o Use the leave-one-out (LOO) cross-validation option
o Automatically choose the number of components to be kept using one of multiple criteria, or choose this number manually
o Choose between the fast algorithm and the more precise one.

What is Partial Least Squares regression?

Partial Least Squares regression (PLS) reduces the variables used for prediction to a smaller set of predictors. These predictors are then used to perform a regression.

The idea behind PLS regression is to create, starting from a table with n observations described by p variables, a set of h components with the PLS 1 and PLS 2 algorithms.

Some programs differentiate PLS 1 from PLS 2. PLS 1 corresponds to the case where there is only one dependent variable; PLS 2 corresponds to the case where there are several dependent variables. The algorithms used by XLSTAT are such that PLS 1 is only a particular case of PLS 2.
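
Outside of XLSTAT, the same idea can be sketched with scikit-learn's PLSRegression; the dataset and the choice of three components are illustrative assumptions.

# PLS regression: compress many (possibly correlated) predictors into a few
# components chosen to explain the response Y, then regress on them.
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
pls = PLSRegression(n_components=3).fit(X, y)
print("R^2 on training data:", pls.score(X, y))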

Partial Least Squares regression model equations

In the case of the Ordinary Least Squares (OLS) and Principal Component Regression (PCR) methods, if models need to be computed for several dependent variables, the computation of the models is simply a loop over the columns of the dependent variables table Y. In the case of PLS regression, the covariance structure of Y also influences the computations.

The equation of the PLS regression model is written:

Y = T_h C'_h + E_h = X W*_h C'_h + E_h = X W_h (P'_h W_h)^(-1) C'_h + E_h

where Y is the matrix of the dependent variables and X is the matrix of the explanatory variables. T_h, C_h, W*_h, W_h and P_h are the matrices generated by the PLS algorithm, and E_h is the matrix of the residuals.

The matrix B of the regression coefficients of Y on X, with h components generated by the PLS regression algorithm, is given by:

B = W_h (P'_h W_h)^(-1) C'_h

Note: PLS regression leads to a linear model, as OLS and PCR do.

PLS regression results: correlation, observation charts and biplots

A great advantage of PLS regression over classic regression is the set of available charts that describe the data structure. Thanks to the correlation and loading plots, it is easy to study the relationships among the variables: relationships among the explanatory variables or dependent variables, as well as between explanatory and dependent variables. The score plot gives information about sample proximity and dataset structure. The biplot gathers all this information in one chart.

Prediction with Partial Least Squares regression

PLS regression is also used to build predictive models. XLSTAT enables you to predict new samples' values.

General remarks about PLS regression

The three methods, Partial Least Squares regression (PLS), Principal Component Regression (PCR), which is based on Principal Component Analysis (PCA), and Ordinary Least Squares regression (OLS), which is the regular linear regression, give the same results if the number of components obtained from the PCA in the PCR, or from the PLS regression, is equal to the number of explanatory variables.

What is the difference between PCR and PLS regression?

The components obtained from PLS regression, which is based on covariance, are built so that they explain Y as well as possible, while the components of PCR are built to describe X as well as possible. This explains why PLS regression outperforms PCR when the target is strongly correlated with a direction in the data that has low variance. The XLSTAT-PLS software partly compensates for this drawback of PCR by allowing the selection of the components that are most correlated with Y.
