
C. Abdul Hakeem College of Engineering & Technology


Department of Master of Computer Applications
MC4301 - Machine Learning
Unit 4
Parametric Machine Learning

Logistic Regression

What are the differences between supervised learning, unsupervised learning &
reinforcement learning?

Machine learning algorithms are broadly classified into three categories - supervised
learning, unsupervised learning, and reinforcement learning.

1. Supervised Learning - Learning where the data is labeled and the motivation is
to classify something or predict a value. Example: Detecting fraudulent
transactions from a list of credit card transactions.
2. Unsupervised Learning - Learning where the data is not labeled and the
motivation is to find patterns in the given data. In this case, you are asking the
machine learning model to process the data from which you can then draw
conclusions. Example: Customer segmentation based on spend data.
3. Reinforcement Learning - Learning by trial and error. This is the closest to
how humans learn. The motivation is to find the optimal policy for how to act in a
given environment. The machine learning model examines the possible actions,
makes a policy that maximizes benefit, and implements that policy (trial). If
there are errors from the initial policy, it applies reinforcements back into the
algorithm and repeats this until it reaches the optimal policy. Example:
Personalized recommendations on streaming platforms like YouTube.

What are the two types of supervised learning?

As supervised learning is used to classify something or predict a value, naturally there
are two types of algorithms for supervised learning - classification models and
regression models.

1. Classification model - In simple terms, a classification model predicts a class
label from a fixed set of possible outcomes. Example: Predicting if a transaction
is fraud or not.
2. Regression model - A regression model is used to predict a numerical value.
Example: Predicting the sale price of a house.

What is logistic regression?

Logistic regression is an example of supervised learning. It is used to calculate or
predict the probability of a binary (yes/no) event occurring. An example of logistic
regression could be applying machine learning to determine if a person is likely to be
infected with COVID-19 or not. Since we have two possible outcomes to this question
- yes they are infected, or no they are not infected - this is called binary classification.


In this imaginary example, the probability of a person being infected with COVID-19
could be based on the viral load and the symptoms and the presence of antibodies, etc.
Viral load, symptoms, and antibodies would be our factors (Independent Variables),
which would influence our outcome (Dependent Variable).

How is logistic regression different from linear regression?

In linear regression, the outcome is continuous and can be any possible value.
However, in the case of logistic regression, the predicted outcome is discrete and
restricted to a limited number of values.

For example, say we are trying to apply machine learning to the sale of a house. If we
are trying to predict the sale price based on the size, year built, and number of stories,
we would use linear regression, as linear regression can predict a sale price of any
possible value. If we are using those same factors to predict if the house sells or not,
we would use logistic regression, as the possible outcomes here are restricted to yes or no.

Hence, linear regression is an example of a regression model and logistic regression is
an example of a classification model.

Where to use logistic regression

Logistic regression is used to solve classification problems, and the most common use
case is binary logistic regression, where the outcome is binary (yes or no). In the real
world, you can see logistic regression applied across multiple areas and fields.

 In health care, logistic regression can be used to predict if a tumor is likely to
be benign or malignant.
 In the financial industry, logistic regression can be used to predict if a
transaction is fraudulent or not.
 In marketing, logistic regression can be used to predict if a targeted audience
will respond or not.

Are there other use cases for logistic regression aside from binary logistic regression?
Yes. There are two other types of logistic regression that depend on the number of
predicted outcomes.

The three types of logistic regression

1. Binary logistic regression - When we have two possible outcomes, like our
original example of whether a person is likely to be infected with COVID-19
or not.
2. Multinomial logistic regression - When we have multiple outcomes, say if
we build out our original example to predict whether someone may have the
flu, an allergy, a cold, or COVID-19.
3. Ordinal logistic regression - When the outcome is ordered, like if we build
out our original example to also help determine the severity of a COVID-19
infection, sorting it into mild, moderate, and severe cases.


Training data assumptions for logistic regression

Training data that satisfies the below assumptions is usually a good fit for logistic
regression.

 The predicted outcome is strictly binary or dichotomous. (This applies to
binary logistic regression.)
 The factors, or the independent variables, that influence the outcome are
independent of each other. In other words, there is little or no multicollinearity
among the independent variables.
 The independent variables can be linearly related to the log odds.
 The sample size is fairly large.

If your training data does not satisfy the above assumptions, logistic regression may
not work for your use case.

Mathematics behind logistic regression

Probability always ranges between 0 (does not happen) and 1 (happens). Using our
COVID-19 example, in the case of binary classification, the probability of testing
positive and not testing positive will sum up to 1. We use the logistic function, or
sigmoid function, to calculate probability in logistic regression. The logistic function
is a simple S-shaped curve used to convert data into a value between 0 and 1.
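
As a minimal illustrative sketch (not part of the original notes; the feature values and
coefficients below are made up), the logistic function and the resulting probability
model can be written in a few lines of Python:

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical COVID-19 example: probability of infection given
# viral load, symptom score, and antibody level (made-up numbers).
x = np.array([2.5, 1.0, 0.3])    # independent variables (features)
w = np.array([0.8, 0.6, -1.2])   # learned coefficients (illustrative)
b = -1.0                         # intercept

p = sigmoid(np.dot(w, x) + b)    # P(infected | x), a value between 0 and 1
print(f"Predicted probability of infection: {p:.3f}")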

Classification and representation

What is Supervised Learning?

In Supervised Learning, the model learns by example. Along with our input variable,
we also give our model the corresponding correct labels. While training, the model
gets to look at which label corresponds to our data and hence can find patterns
between our data and those labels.

Some examples of Supervised Learning include:

1. Spam detection, by teaching a model which mail is spam and which is not.
2. Speech recognition where you teach a machine to recognize your voice.


3. Object Recognition, by showing a machine what an object looks like and
having it pick that object from among other objects.

We can further divide Supervised Learning into the following:

Figure 1: Supervised Learning Subdivisions

What is Classification?

Classification is defined as the process of recognition, understanding, and grouping of
objects and ideas into preset categories, a.k.a. “sub-populations.” With the help of these
pre-categorized training datasets, classification programs in machine learning
leverage a wide range of algorithms to classify future datasets into respective and
relevant categories.

Classification algorithms used in machine learning utilize input training data for the
purpose of predicting the likelihood or probability that the data that follows will fall
into one of the predetermined categories. One of the most common applications of
classification is for filtering emails into “spam” or “non-spam”, as used by today’s top
email service providers.

In short, classification is a form of “pattern recognition”: classification
algorithms applied to the training data find the same pattern (similar number
sequences, words or sentiments, and the like) in future data sets.

We will explore classification algorithms in detail, and discover how text analysis
software can perform actions like sentiment analysis - used for categorizing
unstructured text by opinion polarity (positive, negative, neutral, and the like).

Figure 2: Classification of vegetables and groceries


What is a Classification Algorithm?

The Classification algorithm is a Supervised Learning technique used to categorize
new observations based on training data. In classification, a program uses the
dataset or observations provided to learn how to categorize new observations into
various classes or groups - for instance, 0 or 1, red or blue, yes or no, spam or not
spam, etc. Targets, labels, or categories can all be used to describe classes. The
Classification algorithm uses labeled input data because it is a supervised learning
technique that comprises input and output information. In the classification process,
a discrete output function (y) is mapped from the input variables (x).

In simple words, classification is a type of pattern recognition in which classification
algorithms are performed on training data to discover the same pattern in new data
sets.

Learners in Classification Problems

There are two types of learners.

Lazy Learners

It first stores the training dataset before waiting for the test dataset to arrive. When
using a lazy learner, the classification is carried out using the training dataset's most
appropriate data. Less time is spent on training, but more time is spent on predictions.
Some of the examples are case-based reasoning and the KNN algorithm.

Eager Learners

Before obtaining a test dataset, eager learners build a classification model using a
training dataset. They spend more time training and less time predicting. Some of the
examples are ANN, Naive Bayes, and Decision Trees.

4 Types Of Classification Tasks In Machine Learning

Before diving into the four types of Classification Tasks in Machine Learning, let us
first discuss Classification Predictive Modeling.

Classification Predictive Modeling

A classification problem in machine learning is one in which a class label is
predicted for a specific example of input data.

Examples of classification problems include the following:

 Given an example, indicate whether it is spam or not.
 Identify a handwritten character as one of the recognized characters.
 Determine whether to label the current user behavior as churn.

A training dataset with numerous examples of inputs and outputs is necessary for
classification from a modeling standpoint.


A model will determine the optimal way to map samples of input data to certain class
labels using the training dataset. The training dataset must therefore contain a large
number of samples of each class label and be suitably representative of the problem.

When providing class labels to a modeling algorithm, string values like "spam" or
"not spam" must first be converted to numeric values. Label encoding, which is
frequently used, assigns a distinct integer to every class label, such as "spam" = 0,
"not spam" = 1.
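
A minimal sketch of label encoding in Python (the data is illustrative; scikit-learn's
LabelEncoder provides this behavior):

from sklearn.preprocessing import LabelEncoder

labels = ["spam", "not spam", "not spam", "spam"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)   # array([1, 0, 0, 1])

# Each class label gets a distinct integer (assigned in sorted order here,
# so "not spam" = 0 and "spam" = 1).
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))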

There are numerous algorithms to choose from for classification predictive modeling.

It is typically advised that a practitioner undertake controlled tests to determine what
algorithm and algorithm configuration produces the greatest performance for a certain
classification task, because there is no strong theory on how to map algorithms onto
problem types.

Classification predictive modeling algorithms are assessed based on their output. A
common metric for assessing a model's performance based on predicted class labels
is classification accuracy. Although not perfect, classification accuracy is a reasonable
place to start for many classification tasks.

Some tasks may call for a class membership probability prediction for each example
rather than class labels. This expresses the uncertainty of the prediction, which a user
or application can subsequently interpret. The ROC curve is a popular diagnostic for
assessing predicted probabilities.

There are four different types of classification tasks in machine learning, and they
are the following -

 Binary Classification
 Multi-Class Classification
 Multi-Label Classification
 Imbalanced Classification

Binary Classification

Classification tasks with only two class labels are referred to as binary
classification.

Examples include -

 Prediction of conversion (buy or not).
 Churn forecast (churn or not).
 Detection of spam email (spam or not).

Binary classification problems often require two classes, one representing the normal
state and the other representing the aberrant state.


For instance, the normal condition is "not spam," while the abnormal state is "spam."
Another illustration is when a task involving a medical test has a normal condition of
"cancer not identified" and an abnormal state of "cancer detected."

Class label 0 is given to the class in the normal state, whereas class label 1 is given to
the class in the abnormal condition.

A model that forecasts a Bernoulli probability distribution for each case is frequently
used to represent a binary classification task.

The discrete probability distribution known as the Bernoulli distribution deals with
the situation where an event has a binary result of either 0 or 1. In terms of
classification, this indicates that the model forecasts the likelihood that an example
would fall within class 1, or the abnormal state.

The following are well-known binary classification algorithms:

 Logistic Regression
 Support Vector Machines
 Naive Bayes
 Decision Trees

Some algorithms, such as Support Vector Machines and Logistic Regression, were
created expressly for binary classification and do not by default support more than
two classes.
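
As a brief sketch (synthetic data, not from the notes) of fitting a binary logistic
regression classifier with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary dataset: class 0 = "normal", class 1 = "abnormal".
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# The model forecasts a Bernoulli probability for each example:
# the probability of belonging to class 1 (the abnormal state).
print(clf.predict_proba(X_test[:3])[:, 1])
print("accuracy:", clf.score(X_test, y_test))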

Multi-Class Classification

Classification tasks with more than two class labels are referred to as multi-class
classification.

Examples include -

 Categorization of faces.
 Classifying plant species.
 Optical character recognition.

The multi-class classification does not have the idea of normal and abnormal
outcomes, in contrast to binary classification. Instead, instances are grouped into one
of several well-known classes.

In some cases, the number of class labels could be rather high. In a facial recognition
system, for instance, a model might predict that a shot belongs to one of thousands or
tens of thousands of faces.

Text translation models and other problems involving word prediction could be
categorized as a particular case of multi-class classification. Each word in the
sequence of words to be predicted requires a multi-class classification, where the
vocabulary size determines the number of possible classes that may be predicted and
may range from tens of thousands to hundreds of thousands of words.


Multiclass classification tasks are frequently modeled using a model that forecasts a
Multinoulli probability distribution for each example.

The Multinoulli distribution is a discrete probability distribution that covers an event
with a categorical outcome k in {1, 2, 3, ..., K}. In terms of classification, this implies
that the model forecasts the likelihood that a given example will belong to a certain
class label.

For multi-class classification, many binary classification techniques are applicable.

The following well-known algorithms can be used for multi-class classification:

 Gradient Boosting
 Decision Trees
 k-Nearest Neighbors
 Random Forest
 Naive Bayes

Multi-class problems can be solved using algorithms created for binary classification.

To do this, one of two strategies is used, each of which involves fitting multiple
binary classification models:

 One-vs-Rest: Fit a single binary classification model for each class versus all
other classes.
 One-vs-One: Fit a single binary classification model for each pair of classes.

The following binary classification algorithms can apply these multi-class
classification techniques (as shown in the sketch below):

 Logistic Regression
 Support Vector Machine
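
A minimal sketch (using scikit-learn's built-in wrappers on the classic iris dataset)
of both strategies:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # 3 classes of plant species

# One-vs-Rest: one binary model per class (3 models here).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-One: one binary model per pair of classes (3 pairs here).
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))   # 3 3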

Multi-Label Classification

Multi-label classification problems are those that feature two or more class labels and
allow for the prediction of one or more class labels for each example.

Think about the photo classification example. Here a model can predict the existence
of many known things in a photo, such as “person”, “apple”, “bicycle”, etc. A
particular photo may have multiple objects in the scene.

This greatly contrasts with multi-class classification and binary classification, which
anticipate a single class label for each occurrence.


Multi-label classification problems are frequently modeled using a model that
forecasts many outcomes, with each outcome being forecast as a Bernoulli probability
distribution. In essence, this approach predicts several binary classifications for each
example.

Classification algorithms designed for binary or multi-class classification cannot be
directly applied to multi-label classification. The so-called multi-label versions of the
algorithms, which are specialized versions of the conventional classification
algorithms, include:

 Multi-label Gradient Boosting
 Multi-label Random Forests
 Multi-label Decision Trees

Another strategy is to use a separate classification algorithm to forecast each class
label, as the sketch below shows.
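
An illustrative sketch of that strategy (synthetic data; scikit-learn's
MultiOutputClassifier fits one binary classifier per label):

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multi-label data: each example may carry several of 3 labels,
# e.g. a photo containing "person", "apple", and/or "bicycle".
X, Y = make_multilabel_classification(n_samples=200, n_classes=3,
                                      random_state=0)

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# Each output is a separate Bernoulli (yes/no) prediction per label.
print(clf.predict(X[:2]))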

Imbalanced Classification

The term "imbalanced classification" describes classification tasks where the
distribution of examples within each class is not equal.

Imbalanced classification tasks are, in general, binary classification tasks in which a
majority of the training dataset's instances belong to the normal class and a minority
belong to the abnormal class.

Examples include -

 Clinical diagnostic procedures
 Detection of outliers
 Fraud investigation

Although they may require specialized methods, these problems are modeled as binary
classification tasks.

By oversampling the minority class or undersampling the majority class, specialized
strategies can be employed to alter the sample composition in the training dataset.

Examples include -

 SMOTE Oversampling
 Random Undersampling

It is possible to utilize specialized modeling techniques, such as cost-sensitive
machine learning algorithms, that give the minority class more consideration when
fitting the model to the training dataset.

Examples include:

 Cost-sensitive Support Vector Machines


 Cost-sensitive Decision Trees
 Cost-sensitive Logistic Regression
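
A minimal sketch of cost-sensitive logistic regression in scikit-learn (the weights
below are illustrative; the class_weight parameter gives the minority class more
consideration when fitting):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced data: roughly 95% normal (class 0), 5% abnormal (class 1).
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)

# Errors on the minority class are penalized more heavily during fitting;
# class_weight="balanced" derives similar weights from class frequencies.
clf = LogisticRegression(class_weight={0: 1, 1: 19}).fit(X, y)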

Since reporting the classification accuracy may be deceptive, alternate performance
indicators may be necessary.

Examples include -

 F-Measure
 Recall
 Precision
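
For reference, these metrics are defined in terms of the confusion-matrix counts used
later in this unit (TP = true positives, FP = false positives, FN = false negatives):

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-Measure = 2 * (Precision * Recall) / (Precision + Recall)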

Types of Classification Algorithms

You can apply many different classification methods based on the dataset you are
working with, because the study of classification in statistics is extensive. The most
commonly used machine learning classification algorithms are listed below.

1. Logistic Regression

It is a supervised learning classification technique that forecasts the likelihood of a
target variable. There will only be a choice between two classes. Data can be coded as
either 1 or yes, representing success, or as 0 or no, representing failure. The
dependent variable can be predicted most effectively using logistic regression. When
the forecast is categorical, such as true or false, yes or no, or a 0 or 1, you can use it.
A logistic regression technique can be used to determine whether or not an email is
spam.

2. Naive Bayes

Naive Bayes determines whether a data point falls into a particular category. It can be
used to classify phrases or words in text analysis as either falling within a
predetermined classification or not.

Text                                       Tag
“A great game”                             Sports
“The election is over”                     Not Sports
“What a great score”                       Sports
“A clean and unforgettable game”           Sports
“The spelling bee winner was a surprise”   Not Sports
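
A brief sketch (illustrative) of training a Naive Bayes text classifier on the table
above with scikit-learn:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["A great game", "The election is over", "What a great score",
         "A clean and unforgettable game",
         "The spelling bee winner was a surprise"]
tags = ["Sports", "Not Sports", "Sports", "Sports", "Not Sports"]

# Bag-of-words features + Multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, tags)

print(model.predict(["What a close game"]))   # likely ['Sports']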

3. K-Nearest Neighbors

It calculates the likelihood that a data point will join the groups based on which group
the data points closest to it are a part of. When using k-NN for classification, you
determine how to classify the data according to its nearest neighbor.


4. Decision Tree

A decision tree is an example of supervised learning. Although it can solve regression
and classification problems, it excels in classification problems. Similar to a flow
chart, it divides data points into two similar groups at a time, starting with the "tree
trunk" and moving through the "branches" and "leaves" until the categories are more
closely related to one another.

5. Random Forest Algorithm

The random forest algorithm is an extension of the Decision Tree algorithm: you first
create a number of decision trees from the training data - the 'random forest' - and
then classify new data by combining the predictions of those trees, typically by
majority vote. These models are great for mitigating the decision tree's problem of
forcing data points unnecessarily within a category.

6. Support Vector Machine

Support Vector Machine is a popular supervised machine learning technique for
classification and regression problems. It classifies data by finding the hyperplane
that best separates the classes with the largest possible margin.

Types of ML Classification Algorithms

1. Supervised Learning Approach

The supervised learning approach explicitly trains algorithms under close human
supervision. Both the input and the output data are first provided to the algorithm. The
algorithm then develops rules that map the input to the output. The training procedure
is repeated until the highest level of performance is attained.

The two types of supervised learning approaches are:

 Regression
 Classification

2. Unsupervised Learning

This approach is applied to examine the data's inherent structure and derive insightful
information from it. It looks for patterns in unlabeled data to find insights that can
produce better results.

There are two types of unsupervised learning:

 Clustering
 Dimensionality reduction


3. Semi-supervised Learning

Semi-supervised learning lies on the spectrum between unsupervised and supervised
learning. It combines the most significant aspects of both worlds to provide a unique
set of algorithms.

4. Reinforcement Learning

The goal of reinforcement learning is to create autonomous, self-improving
algorithms. The algorithm's goal is to improve itself through a continual cycle of trials
and errors based on the interactions and combinations between the incoming and
labeled data.

Classification Models

 Naive Bayes: Naive Bayes is a classification algorithm that assumes that
predictors in a dataset are independent. This means that it assumes the features
are unrelated to each other. For example, if given a banana, the classifier will
see that the fruit is of yellow color, oblong-shaped and long and tapered. All of
these features will contribute independently to the probability of it being a
banana and are not dependent on each other.

 Decision Trees: A Decision Tree is an algorithm that is used to visually
represent decision-making. A Decision Tree can be made by asking a yes/no
question and splitting the answer to lead to another decision. The question is at
the node and it places the resulting decisions below at the leaves. The tree
depicted below is used to decide if we can play tennis.

Figure 4: Decision Tree

In the above figure, depending on the weather conditions and the humidity and wind,
we can systematically decide if we should play tennis or not. In decision trees, all the
False statements lie on the left of the tree and the True statements branch off to the
right. Knowing this, we can make a tree which has the features at the nodes and the
resulting classes at the leaves.

 K-Nearest Neighbors: K-Nearest Neighbor is a classification and prediction
algorithm that is used to divide data into classes based on the distance between
the data points. K-Nearest Neighbor assumes that data points which are close
to one another must be similar and hence, the data point to be classified will be
grouped with the closest cluster.

Figure 5: Data to be classified

Figure 6: Classification using K-Nearest Neighbours

Evaluating a Classification Model

After our model is finished, whether it is a regression or classification model, we
must assess its performance. We have the following options for assessing a
classification model:

1. Confusion Matrix

 The confusion matrix describes the model's performance and gives us a matrix
or table as an output.
 The error matrix is another name for it.
 The matrix summarizes the results of the predictions, together with the total
number of correct and incorrect predictions.

The matrix appears in the following table:

                     Actual Positive   Actual Negative
Predicted Positive   True Positive     False Positive
Predicted Negative   False Negative    True Negative

Accuracy = (TP+TN)/Total Population
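
A brief sketch (labels are illustrative) of computing the confusion matrix and
accuracy with scikit-learn:

from sklearn.metrics import accuracy_score, confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

# Rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print(tp, tn, fp, fn)                          # 3 3 1 1
print(accuracy_score(y_actual, y_predicted))   # (TP+TN)/Total = 6/8 = 0.75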

2. Log Loss or Cross-Entropy Loss

 It is used to assess a classifier's performance, and the output is a probability
value between 0 and 1.
 A successful binary classification model should have a log loss value that is
close to 0.
 If the predicted value differs from the actual value, the value of log loss
rises.
 A lower log loss indicates the model's higher accuracy.

Cross-entropy for binary classification can be calculated as:

Log Loss = -(y log(p) + (1 - y) log(1 - p))

Where p = predicted probability of the positive class and y = actual output (0 or 1).
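
A minimal numeric sketch of the formula above (the predictions are made up):

import numpy as np

def log_loss(y, p, eps=1e-15):
    # Binary cross-entropy, averaged over examples.
    p = np.clip(p, eps, 1 - eps)            # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])                  # actual outputs
p = np.array([0.9, 0.1, 0.8, 0.4])          # predicted probabilities

print(log_loss(y, p))   # small value -> predictions close to actual labels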

3. AUC-ROC Curve

 AUC stands for Area Under the Curve, and ROC refers to the Receiver
Operating Characteristic curve.
 It is a graph that displays the classification model's performance at various
thresholds.
 The AUC-ROC curve is used to show how well a binary classification model
separates the classes (it can be extended to multi-class problems, e.g. via
one-vs-rest).
 The TPR and FPR are used to draw the ROC curve, with the True Positive
Rate (TPR) on the Y-axis and the FPR (False Positive Rate) on the X-axis.
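
A brief sketch (made-up scores) of computing the ROC curve and AUC with
scikit-learn:

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]                  # actual labels
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)   # FPR on X, TPR on Y
print("AUC:", roc_auc_score(y_true, y_scores))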

Use Cases Of Classification Algorithms

There are many applications for classification algorithms. Here are a few of them:

 Speech Recognition
 Detecting Spam Emails
 Categorization of Drugs
 Cancer Tumor Cell Identification
 Biometric Authentication, etc.

Representation

A machine learning model can't directly see, hear, or sense input examples. Instead,
you must create a representation of the data to provide the model with a useful
vantage point into the data's key qualities. That is, in order to train a model, you must
choose the set of features that best represent the data.

The choice of representation has an enormous effect on the performance of machine
learning algorithms.


In the context of neural networks, Chollet says that layers extract representations.
The core building block of neural networks is the layer, a data-processing module that
you can think of as a filter for data. Some data goes in, and it comes out in a more
useful form. Specifically, layers extract representations out of the data fed into
them--hopefully, representations that are more meaningful for the problem at hand.
Most of deep learning consists of chaining together simple layers that will implement
a form of progressive data distillation. A deep-learning model is like a sieve for data
processing, made of a succession of increasingly refined data filters--the layers.

That makes me think that representations are the form that the training/test data takes
as it is progressively transformed. e.g. words could initially be represented as dense or
sparse (one-hot encoded) vectors. And then their representation changes one or more
times as they are fed into a model.
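
An illustrative sketch of the sparse (one-hot) representation mentioned above, with a
tiny made-up vocabulary:

import numpy as np

vocab = ["cat", "dog", "fish"]

# One-hot: one dimension per vocabulary word, a single 1 per vector.
one_hot = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}
print(one_hot["dog"])   # [0. 1. 0.]

# A dense representation would instead be a short learned vector,
# e.g. dog -> [0.21, -0.53, 0.08] (values made up here).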

Mitchell says that we need to choose a representation for the target function.

Now that we have specified the ideal target function V, we must
choose a representation that the learning program will use to describe
the function V̂ that it will learn.

This makes me think that the 'representation' could be described as the architecture of
the model, or maybe a mathematical description of the model. With this definition, we
don't know the true representation (equation) of the target function (if we did we
would have nothing to learn). So it is our task to decide what equation we want to use
to best approximate the target function.

Cost function

Machine Learning models require a high level of accuracy to work in the actual world.
But how do you calculate how wrong or right your model is? This is where the cost
function comes into the picture. A cost function is a machine learning parameter used
to judge the model; understanding cost functions is important for knowing how well
the model has estimated the relationship between your input and output parameters.

What Is Cost Function in Machine Learning?

After training your model, you need to see how well your model is performing. While
accuracy functions tell you how well the model is performing, they do not provide
you with insight into how to improve it. Hence, you need a correctional function that
can help you compute when the model is the most accurate, as you need to hit that
small spot between an undertrained model and an overtrained model.

A Cost Function is used to measure just how wrong the model is in finding a relation
between the input and output. It tells you how badly your model is
behaving/predicting.

Consider a robot trained to stack boxes in a factory. The robot might have to consider
certain changeable parameters, called Variables, which influence how it performs.


Let’s say the robot comes across an obstacle, like a rock. The robot might bump into
the rock and realize that it is not the correct action.

It will learn from this, and next time it will learn to avoid rocks. Hence, your machine
uses variables to better fit the data. The outcome of all these obstacles will further
optimize the robot and help it perform better. It will generalize and learn to avoid
obstacles in general, say like a fire that might have broken out. The outcome acts as a
cost function, which helps you optimize the variable, to get the best variables and fit
for the model.

Figure 1: Robot learning to avoid obstacles

What Is Gradient Descent?

Gradient Descent is an algorithm that is used to optimize the cost function or the error
of the model. It is used to find the minimum value of error possible in your model.

Gradient Descent can be thought of as the direction you have to take to reach the least
possible error. The error in your model can be different at different points, and you
have to find the quickest way to minimize it, to prevent resource wastage.

Gradient Descent can be visualized as a ball rolling down a hill. Here, the ball will
roll to the lowest point on the hill. This point can be taken as the point where the
error is least, since for any model the error will be minimum at one point and will
increase again after that.

In gradient descent, you find the error in your model for different values of input
variables. This is repeated, and soon you see that the error values keep getting smaller
and smaller. Soon you’ll arrive at the values for variables when the error is the least,
and the cost function is optimized.

Figure 2: Gradient Descent


What Is the Cost Function For Linear Regression?

A Linear Regression model uses a straight line to fit the model. This is done using the
equation for a straight line, as shown:

Figure 3: Linear regression function

In the equation (a straight line of the form y = a + bx), two entities can have
changeable values (variables): a, which is the point at which the line intercepts the
y-axis, and b, which is how steep the line will be, or slope.

At first, if the variables are not properly optimized, you get a line that might not
properly fit the model. As you optimize the values of the variables, you will get the
perfect fit. The perfect fit will be a straight line running through most of the data
points while ignoring the noise and outliers. A properly fit Linear Regression model
looks as shown below:

Figure 4: Linear regression graph

For the Linear Regression model, the cost function is based on the Root Mean
Squared Error of the model, obtained from the differences between the predicted
values and the actual values. The goal is to find the variable values that minimize
this error.

Figure 5: Linear regression cost function

By the definition of gradient descent, you have to find the direction in which the error
decreases. This is done by differentiating the cost function: at each step, the gradient
(scaled by a learning rate) is subtracted from the current values of the variables to
move down the slope, as sketched below.
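
A minimal sketch (synthetic data, made-up learning rate) of gradient descent
minimizing the squared-error cost for the straight-line model y = a + b*x:

import numpy as np

# Synthetic data roughly following y = 1 + 2x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 100)

a, b = 0.0, 0.0   # intercept and slope, initially unoptimized
lr = 0.01         # learning rate (step size down the slope)

for _ in range(2000):
    y_pred = a + b * x
    error = y_pred - y
    # Gradients of the mean-squared-error cost with respect to a and b.
    grad_a = 2 * error.mean()
    grad_b = 2 * (error * x).mean()
    # Step in the direction that decreases the error.
    a -= lr * grad_a
    b -= lr * grad_b

print(f"a = {a:.2f}, b = {b:.2f}")   # should approach 1 and 2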

