MC4301 - ML Unit 4 (Parametric Machine Learning)
Logistic Regression
What are the differences between supervised learning, unsupervised learning &
reinforcement learning?
Machine learning algorithms are broadly classified into three categories - supervised
learning, unsupervised learning, and reinforcement learning.
In this imaginary example, the probability of a person being infected with COVID-19
could be based on factors such as viral load, symptoms, and the presence of antibodies.
Viral load, symptoms, and antibodies would be our factors (independent variables),
which would influence our outcome (dependent variable).
In linear regression, the outcome is continuous and can take any possible value.
In logistic regression, however, the predicted outcome is discrete and restricted
to a limited set of values.
For example, say we are trying to apply machine learning to the sale of a house. If we
are trying to predict the sale price based on the size, year built, and number of stories
we would use linear regression, as linear regression can predict a sale price of any
possible value. If we are using those same factors to predict if the house sells or not,
we would use logistic regression, as the possible outcomes here are restricted to yes or no.
Logistic regression is used to solve classification problems, and the most common use
case is binary logistic regression, where the outcome is binary (yes or no). In the real
world, you can see logistic regression applied across multiple areas and fields.
Are there other use cases for logistic regression aside from binary logistic regression?
Yes. There are two other types of logistic regression that depend on the number of
predicted outcomes.
1. Binary logistic regression - When we have two possible outcomes, like our
original example of whether a person is likely to be infected with COVID-19
or not.
2. Multinomial logistic regression - When we have multiple outcomes, say if
we build out our original example to predict whether someone may have the
flu, an allergy, a cold, or COVID-19.
3. Ordinal logistic regression - When the outcome is ordered, like if we build
out our original example to also help determine the severity of a COVID-19
infection, sorting it into mild, moderate, and severe cases.
Training data that satisfies the following assumptions is usually a good fit for logistic
regression:
The outcome is binary (or, for the other variants, multinomial or ordinal).
The observations are independent of one another.
There is little or no multicollinearity among the independent variables.
The independent variables are linearly related to the log odds of the outcome.
If your training data does not satisfy these assumptions, logistic regression may
not work for your use case.
Probability always ranges between 0 (the event does not happen) and 1 (the event
happens). Using our COVID-19 example, in the case of binary classification, the
probabilities of testing positive and not testing positive will sum to 1. We use the
logistic function, or sigmoid function, to calculate probability in logistic regression.
The logistic function is a simple S-shaped curve that converts any input value into a
value between 0 and 1.
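As a minimal sketch in plain Python (the test values are arbitrary), the sigmoid maps
any real-valued score to a probability between 0 and 1:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A large negative score gives a probability near 0,
# zero gives exactly 0.5, and a large positive score gives near 1.
print(sigmoid(-6))  # ~0.0025
print(sigmoid(0))   # 0.5
print(sigmoid(6))   # ~0.9975
```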
In Supervised Learning, the model learns by example. Along with our input variable,
we also give our model the corresponding correct labels. While training, the model
gets to look at which label corresponds to our data and hence can find patterns
between our data and those labels.
1. Spam detection, where we teach a model which emails are spam and which are not.
2. Speech recognition, where we teach a machine to recognize your voice.
What is Classification?
Classification algorithms used in machine learning utilize input training data to
predict the likelihood or probability that subsequent data will fall into one of the
predetermined categories. One of the most common applications of classification is
filtering emails into "spam" or "non-spam", as done by today's top email service
providers.
We will explore classification algorithms in detail, and discover how text analysis
software can perform actions like sentiment analysis, used for categorizing
unstructured text by opinion polarity (positive, negative, neutral, and the like).
Lazy Learners
A lazy learner first stores the training dataset and waits for the test dataset to
arrive. Classification is then carried out using the most closely related data in the
stored training set. Less time is spent on training, but more time is spent on
predictions. Examples include case-based reasoning and the k-NN algorithm.
Eager Learners
Eager learners build a classification model from the training dataset before
receiving a test dataset. They spend more time on training and less time on
prediction. Examples include artificial neural networks (ANN), Naive Bayes, and
decision trees.
Before diving into the four types of Classification Tasks in Machine Learning, let us
first discuss Classification Predictive Modeling.
From a modeling standpoint, classification requires a training dataset with many
examples of inputs and their corresponding outputs.
A model will determine the optimal way to map samples of input data to certain class
labels using the training dataset. The training dataset must therefore contain a large
number of samples of each class label and be suitably representative of the problem.
When providing class labels to a modeling algorithm, string values like "spam" or
"not spam" must first be converted to numeric values. Label encoding, which is
frequently used, assigns a distinct integer to each class label, for example
"spam" = 0, "not spam" = 1.
Some tasks may call for a class membership probability prediction for each example
rather than class labels. This provides a measure of uncertainty in the prediction,
which a user or application can then interpret. The ROC curve is a widely used
diagnostic for evaluating predicted probabilities.
There are four different types of Classification Tasks in Machine Learning, as
follows:
Binary Classification
Multi-Class Classification
Multi-Label Classification
Imbalanced Classification
Binary Classification
Classification tasks with only two class labels are referred to as binary
classification.
Typical examples, elaborated below, include email spam detection (spam or not) and
medical test results (cancer detected or not).
Binary classification problems often require two classes, one representing the normal
state and the other representing the abnormal state.
For instance, the normal condition is "not spam," while the abnormal state is "spam."
Another illustration is when a task involving a medical test has a normal condition of
"cancer not identified" and an abnormal state of "cancer detected."
Class label 0 is given to the class in the normal state, whereas class label 1 is given to
the class in the abnormal condition.
A model that forecasts a Bernoulli probability distribution for each case is frequently
used to represent a binary classification task.
The Bernoulli distribution is a discrete probability distribution covering the
situation where an event has a binary outcome of either 0 or 1. In terms of
classification, this means the model forecasts the probability that an example
falls within class 1, the abnormal state.
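In symbols, with p denoting the model's predicted probability of the abnormal state
(class 1), the Bernoulli distribution can be written as:

```latex
P(y) = p^{y}\,(1 - p)^{1 - y}, \qquad y \in \{0, 1\}
% hence P(y = 1) = p and P(y = 0) = 1 - p
```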
Popular algorithms that can be used for binary classification include:
Logistic Regression
Support Vector Machines
Naive Bayes
Decision Trees
Some algorithms, such as Support Vector Machines and Logistic Regression, were
created expressly for binary classification and do not by default support more than
two classes.
Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than
two class labels.
Examples include:
Face classification.
Plant species classification.
Optical character recognition.
Unlike binary classification, multi-class classification does not have the notion of
normal and abnormal outcomes. Instead, instances are classified as belonging to one
of a range of known classes.
In some cases, the number of class labels could be rather high. In a facial recognition
system, for instance, a model might predict that a shot belongs to one of thousands or
tens of thousands of faces.
Text translation models and other problems involving word prediction may be treated
as a special case of multi-class classification. Each word in the sequence of words
to be predicted involves a multi-class classification where the number of possible
classes is the size of the vocabulary, which may range from tens of thousands to
hundreds of thousands of words.
Multi-class classification tasks are frequently modeled using a model that forecasts
a Multinoulli (categorical) probability distribution for each example.
Popular algorithms that can be used for multi-class classification include:
Gradient Boosting
Decision Trees
k-Nearest Neighbors
Random Forest
Naive Bayes
Multi-class problems can also be solved using algorithms created for binary
classification. To do this, the problem is split into multiple binary classification
problems using one of two strategies:
One-vs-One: For each pair of classes, fit a single binary classification model.
One-vs-Rest: Fit a single binary classification model for each class versus all
other classes.
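As a hedged sketch of both strategies with scikit-learn (OneVsRestClassifier and
OneVsOneClassifier are the library's standard wrappers; the toy data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Toy 3-class problem; logistic regression is natively binary,
# so we wrap it in a multi-class strategy.
X, y = make_classification(n_samples=300, n_features=5,
                           n_informative=3, n_classes=3,
                           random_state=0)

ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)  # one model per class
ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)   # one model per pair

print(ovr.predict(X[:5]))
print(ovo.predict(X[:5]))
```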
Multi-Label Classification
Multi-label classification problems are those that feature two or more class labels and
allow for the prediction of one or more class labels for each example.
Think about the photo classification example. Here a model can predict the existence
of many known things in a photo, such as "person", "apple", "bicycle", etc. A
particular photo may have multiple objects in the scene.
This greatly contrasts with multi-class classification and binary classification,
which predict a single class label for each example.
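A minimal multi-label sketch, assuming scikit-learn is available; the synthetic data
stands in for the photo-tagging example, where each example's label is a binary
indicator vector and several labels can be on at once:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic data: each row can carry any subset of 3 labels
# (think "person", "apple", "bicycle" present in the same photo).
X, Y = make_multilabel_classification(n_samples=200, n_features=10,
                                      n_classes=3, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(clf.predict(X[:3]))  # e.g. [[1 0 1] ...] - one column per label
```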
Imbalanced Classification
Imbalanced classification tasks are generally binary classification tasks in which
the majority of the training dataset's examples belong to the normal class and a
minority belong to the abnormal class.
Examples include fraud detection, outlier detection, and medical diagnostic tests,
where the event of interest is rare.
Although they may require specialized techniques, these problems are modeled as
binary classification tasks.
Specialized sampling techniques can be used to change the composition of the
training dataset, for example:
SMOTE Oversampling
Random Undersampling
Specialized modeling algorithms that pay more attention to the minority class, such
as cost-sensitive classifiers, may also be used.
Performance metrics that focus on the minority class may also be more appropriate,
for example:
F-Measure
Recall
Precision
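A short sketch of these metrics with scikit-learn on a made-up imbalanced prediction;
accuracy looks fine here even though the minority class is handled poorly, which is
why precision, recall, and the F-measure matter:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 10 examples, only 2 positives (the minority class); the model finds 1 of them.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # 0.9 - misleadingly high
print("precision:", precision_score(y_true, y_pred))  # 1.0
print("recall   :", recall_score(y_true, y_pred))     # 0.5 - half the positives missed
print("f1       :", f1_score(y_true, y_pred))         # ~0.67
```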
You can apply many different classification methods to the dataset you are working
with, because the study of classification in statistics is extensive. Some of the
most popular classification algorithms are listed below.
1. Logistic Regression
Logistic regression predicts the probability that an input belongs to a class using
the logistic (sigmoid) function, as discussed earlier in this unit.
2. Naive Bayes
Naive Bayes calculates the probability of whether a data point belongs to a
particular category. In text analysis, it can be used to classify words or phrases
as belonging to a preset tag or not, as in the example below.
Text                                       Tag
"A great game"                             Sports
"The election is over"                     Not Sports
"What a great score"                       Sports
"A clean and unforgettable game"           Sports
"The spelling bee winner was a surprise"   Not Sports
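A hedged sketch of training a Naive Bayes text classifier on the tiny table above,
using scikit-learn's standard bag-of-words and Naive Bayes tools (a real model would
need far more data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["A great game",
         "The election is over",
         "What a great score",
         "A clean and unforgettable game",
         "The spelling bee winner was a surprise"]
tags = ["Sports", "Not Sports", "Sports", "Sports", "Not Sports"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, tags)

print(model.predict(["A very close game"]))  # likely ['Sports']
```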
3. K-Nearest Neighbors
It calculates the likelihood that a data point belongs to a group based on which
groups the data points closest to it belong to. When using k-NN for classification,
you classify the data according to its nearest neighbors.
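A minimal sketch with scikit-learn; the 2-D points below are made up, and
classification is simply a majority vote among the k = 3 closest training points:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two made-up groups: small coordinates = class 0, large = class 1.
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [9, 9]]))  # [0 1] - each point joins its nearest group
```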
4. Decision Tree and Random Forest
A decision tree classifies data by splitting it on feature values, with the features
at the internal nodes and the resulting classes at the leaves. The random forest
algorithm is an extension of the decision tree algorithm: you first create a number
of decision trees from the training data, and new data is then classified by
aggregating the predictions of these trees. Random forests are great for mitigating
the decision tree's problem of forcing data points unnecessarily within a category
(overfitting).
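A brief hedged sketch of a random forest with scikit-learn on synthetic data
(n_estimators sets the number of trees; all names are the library's standard API):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 decision trees, each grown on a bootstrap sample of the training
# data; their votes are aggregated into the final prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
```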
1. Supervised Learning
The supervised learning approach explicitly trains algorithms under close human
supervision. Both the input and the output data are first provided to the algorithm.
The algorithm then develops rules that map the input to the output. The training
procedure is repeated until the desired level of performance is attained.
Regression
Classification
2. Unsupervised Learning
This approach is applied to examine data's inherent structure and derive insightful
information from it, by looking for patterns in unlabeled data that can produce
better results.
Clustering
Dimensionality reduction
3. Semi-supervised Learning
This approach combines a small amount of labeled data with a large amount of
unlabeled data during training.
4. Reinforcement Learning
Here, an agent learns by interacting with an environment through trial and error,
receiving rewards or penalties for its actions.
Classification Models
Consider a decision tree for deciding whether to play tennis. Depending on the
weather conditions, the humidity, and the wind, we can systematically decide if we
should play tennis or not. In decision trees, all the False statements branch off to
the left of the tree and the True statements branch off to the right. Knowing this,
we can make a tree which has the features at the nodes and the resulting classes at
the leaves.
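A hedged sketch of such a tree with scikit-learn; the play-tennis data and its 0/1
encoding below are made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up encoding: [sunny, humid, windy] as 0/1 flags.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
y = ["No", "Yes", "No", "Yes", "No", "Yes"]  # play tennis?

tree = DecisionTreeClassifier().fit(X, y)

# Print the learned splits: features at the nodes, classes at the leaves.
print(export_text(tree, feature_names=["sunny", "humid", "windy"]))
```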
K-Nearest Neighbors assumes that data points which are close to one another must be
similar; hence, a data point to be classified is grouped with the points closest
to it.
After our model is finished, we must assess its performance, whether it is a
regression or classification model. For a classification model, we have the
following options:
1. Confusion Matrix
The confusion matrix describes the model performance and gives us a matrix
or table as an output.
The error matrix is another name for it.
The matrix consists of the predictions' results in a summarized form, together with
the total number of correct and incorrect predictions.
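A minimal sketch with scikit-learn's confusion_matrix; the labels below are made up
(rows are actual classes, columns are predicted classes):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Layout for binary labels [0, 1]:
# [[true negatives, false positives],
#  [false negatives, true positives]]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```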
2. Log Loss (Cross-Entropy Loss)
Log loss measures the quality of a classifier's predicted probabilities rather
than its hard labels.
For a single binary example with true label y and predicted probability p, the
loss is -(y log(p) + (1 - y) log(1 - p)), averaged over all examples; lower
values indicate better-calibrated predictions.
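A small sketch computing this by hand and cross-checking against scikit-learn's
log_loss (the labels and probabilities below are made up):

```python
import math
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]
p_pred = [0.9, 0.2, 0.7, 0.6]  # predicted probabilities of class 1

# Average of -(y*log(p) + (1-y)*log(1-p)) over the examples.
manual = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for y, p in zip(y_true, p_pred)) / len(y_true)

print(manual)                    # ~0.30
print(log_loss(y_true, p_pred))  # same value
```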
3. AUC-ROC Curve
AUC stands for Area Under the Curve, and ROC stands for Receiver Operating
Characteristic curve.
It is a graph that displays the classification model's performance at various
thresholds.
The AUC-ROC curve is used to visualize how well the classification model performs;
for multi-class problems it is typically applied in a one-vs-rest fashion.
The TPR and FPR are used to draw the ROC curve, with the True Positive
Rate (TPR) on the Y-axis and the FPR (False Positive Rate) on the X-axis.
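A short ROC/AUC sketch with scikit-learn (roc_curve returns the FPR/TPR pairs across
thresholds and roc_auc_score summarizes them as a single area; the example labels
and scores below are made up):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # predicted probabilities of class 1

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_scores))  # 1.0 would be a perfect ranking
```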
There are many applications for classification algorithms. Here are a few of them:
Speech Recognition
Detecting Spam Emails
Categorization of Drugs
Cancer Tumor Cell Identification
Biometric Authentication, etc.
Representation
A machine learning model can't directly see, hear, or sense input examples. Instead,
you must create a representation of the data to provide the model with a useful
vantage point into the data's key qualities. That is, in order to train a model, you must
choose the set of features that best represent the data.
In the context of neural networks, Chollet says that layers extract representations.
The core building block of neural networks is the layer, a data-processing module that
you can think of as a filter for data. Some data goes in, and it comes out in a more
useful form. Specifically, layers extract representations out of the data fed into
them: hopefully, representations that are more meaningful for the problem at hand.
Most of deep learning consists of chaining together simple layers that will implement
a form of progressive data distillation. A deep-learning model is like a sieve for
data processing, made of a succession of increasingly refined data filters: the layers.
That makes me think that representations are the form that the training/test data
takes as it is progressively transformed. For example, words could initially be
represented as dense or sparse (one-hot encoded) vectors, and then their
representation changes one or more times as they are fed into a model.
Mitchell says that we need to choose a representation for the target function.
This makes me think that the 'representation' could be described as the architecture of
the model, or maybe a mathematical description of the model. With this definition, we
don't know the true representation (equation) of the target function (if we did we
would have nothing to learn). So it is our task to decide what equation we want to use
to best approximate the target function.
Cost function
Machine learning models require a high level of accuracy to work in the real world.
But how do you calculate how wrong or right your model is? This is where the cost
function comes into the picture. The cost function is important to understand
because it measures how well the model has estimated the relationship between your
input and output parameters.
After training your model, you need to see how well it is performing. While accuracy
metrics tell you how well the model is performing, they do not provide insight into
how to improve it. Hence, you need a corrective function that can help you find
where the model is most accurate, as you need to hit that sweet spot between an
undertrained model and an overtrained model.
A cost function is used to measure just how wrong the model is in finding a relation
between the input and output. It tells you how badly your model is behaving or
predicting.
Consider a robot trained to stack boxes in a factory. The robot might have to consider
certain changeable parameters, called Variables, which influence how it performs.
Let's say the robot comes across an obstacle, like a rock. The robot might bump into
the rock and realize that it was not the correct action.
Learning from this, it will avoid rocks next time. Hence, your machine uses
variables to better fit the data. The outcome of each obstacle further optimizes the
robot and helps it perform better, so that it generalizes and learns to avoid
obstacles in general, say a fire that might have broken out. The outcome acts as a
cost function, which helps you optimize the variables, to get the best variables and
fit for the model.
Gradient Descent is an algorithm that is used to optimize the cost function or the error
of the model. It is used to find the minimum value of error possible in your model.
Gradient Descent can be thought of as the direction you have to take to reach the least
possible error. The error in your model can be different at different points, and you
have to find the quickest way to minimize it, to prevent resource wastage.
Gradient descent can be visualized as a ball rolling down a hill: the ball will roll
to the lowest point on the hill. We can take this as the point where the error is
least, since for such a model the error is minimal at one point and increases again
on either side of it.
In gradient descent, you compute the error in your model for different values of the
variables. Repeating this, you see the error values get smaller and smaller, and
soon you arrive at the values of the variables for which the error is least and the
cost function is optimized.
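A minimal sketch of this "ball rolling downhill" idea in plain Python, minimizing
the made-up error curve f(x) = x², whose slope at x is 2x:

```python
# Gradient descent on f(x) = x**2, whose minimum is at x = 0.
x = 5.0             # arbitrary starting point
learning_rate = 0.1

for step in range(25):
    gradient = 2 * x                   # derivative of x**2
    x = x - learning_rate * gradient   # move against the slope

print(x)  # very close to 0, the point of least error
```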
A Linear Regression model uses a straight line to fit the data. This is done using
the equation for a straight line:
y = a + b·x
In the equation, two entities can have changeable values (variables): a, the
intercept, which is the point at which the line crosses the y-axis, and b, the
slope, which determines how steep the line is.
At first, if the variables are not properly optimized, you get a line that does not
fit the data well. As you optimize the values of the variables, you will eventually
get the best fit: a straight line running through most of the data points while
ignoring the noise and outliers.
For the Linear Regression model, the cost function is the Mean Squared Error (or its
square root, the Root Mean Squared Error): the average of the squared differences
between the predicted values and the actual values. Training aims to find the
variable values that minimize this cost:
MSE = (1/n) · Σ (y_actual - y_predicted)²
By the definition of gradient descent, you have to move in the direction in which
the error decreases. This is done by differentiating the cost function with respect
to each variable and subtracting a small fraction of that gradient, scaled by the
learning rate, from the variable's current value, moving it down the slope.
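Putting the pieces together, a hedged sketch in plain Python that fits the line
y = a + b·x to made-up data points by gradient descent on the MSE cost (the learning
rate and iteration count are chosen by hand):

```python
# Fit y = a + b*x by gradient descent on the mean squared error.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]   # roughly y = 1 + 2x with noise

a, b = 0.0, 0.0   # initial intercept and slope
lr = 0.01         # learning rate (step size)
n = len(xs)

for _ in range(5000):
    # Gradients of MSE = (1/n) * sum((a + b*x - y)**2) w.r.t. a and b.
    grad_a = (2 / n) * sum((a + b * x - y) for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((a + b * x - y) * x for x, y in zip(xs, ys))
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # should approach roughly 1 and 2
```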