Introduction to ML Unit-1 PPT
Artificial Intelligence:
Algorithms and systems that exhibit human-like intelligence.
Machine Learning:
Subset of AI that can learn to perform a task with extracted data and/or models.
Deep Learning:
Subset of machine learning that imitates the functioning of the human brain to
solve problems.
Definition of machine learning
First, what does learning mean?
Learning is the ability to improve one’s behavior with experience.
Machine learning explores algorithms that learn from data and build models
from data; these models can then be used for different tasks.
For example, a model can be used for prediction, decision making, or solving
tasks.
Machine learning is about extracting knowledge from data.
Definition of machine learning
Machine learning can also be defined as the process of solving a practical problem
by
1) gathering a dataset, and
2) algorithmically building a statistical model based on that dataset.
That statistical model is assumed to be used somehow to solve the practical
problem.
ML Vs Classical/Traditional Algorithms
Traditional Programming
Traditional programming is a manual process, meaning a programmer creates the program.
i.e. input + program = output
Machine Learning
Unlike traditional programming, machine learning is an automated process.
The ML algorithm automatically formulates the rules from the data.
i.e. Input (Features) + Output (Class Label) = Program (Rules)
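The contrast above can be illustrated with a small sketch. The temperature data, the function names, and the idea of learning a single threshold are assumptions made for this example, not part of the slides; the point is only that the hand-written rule and the learned rule have the same form, but the second one is derived from input/output pairs.

```python
# Traditional programming: a programmer writes the rule by hand.
def is_hot_manual(temperature_c):
    return temperature_c >= 30  # threshold chosen by the programmer


# Machine learning: the rule (here, one threshold) is derived from labeled data.
def learn_threshold(features, labels):
    """Return the threshold that misclassifies the fewest training examples."""
    best_threshold, best_errors = None, len(features) + 1
    for candidate in sorted(set(features)):
        # Count examples where the rule "x >= candidate" disagrees with the label.
        errors = sum((x >= candidate) != y for x, y in zip(features, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


temperatures = [12, 18, 22, 27, 31, 35]            # input (features), hypothetical
is_hot = [False, False, False, True, True, True]   # output (class labels), hypothetical
threshold = learn_threshold(temperatures, is_hot)  # the learned "program" (rule)
print(threshold)
```

With this toy data the learned rule is "temperature >= 27", which no programmer typed in explicitly.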
The standard deviation is the square root of the variance and is a measure of
uncertainty or volatility.
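A quick numeric check of this relationship, using Python's standard `statistics` module. The sample values are made up for illustration.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations

variance = statistics.pvariance(values)  # population variance
std_dev = statistics.pstdev(values)      # population standard deviation

# The standard deviation is the square root of the variance.
print(variance, std_dev, math.isclose(std_dev, math.sqrt(variance)))
```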
Types of Machine Learning
Supervised (inductive) learning
– Training data includes desired / correct outputs or labels
Unsupervised learning
– Training data does not include desired outputs
Semi-supervised learning (various forms)
– Training data includes a few desired outputs
– Training data has desired outputs, but for a different (related) task
Reinforcement learning
– Rewards from sequence of actions
Supervised vs. Unsupervised vs. Reinforcement Learning
[Comparison table not recovered. Criteria compared: data types, training, aim, approach, output feedback, popular algorithms, applications.]
Recap
Supervised Learning
The machine learns from training data that is labeled.
Supervised learning is the method in which we teach the machine by using labeled data.
In supervised learning, a model learns to make predictions with the help of a labeled dataset.
Supervised learning often requires human effort to build the training set, but afterward
automates and often speeds up an otherwise laborious or infeasible task.
Supervised learning is used whenever we want to predict a certain outcome from a given input,
and we have examples of input/output pairs.
Supervised machine learning algorithms can apply what has been learned in the past to new
data using labeled examples to predict future events.
If we have too many features, the learned hypothesis may fit the training set
very well but fail to generalize to new examples.
Sources of noise and error
• While learning a target function using a training set
• Two sources of noise
– Some training points may not come exactly from the target function:
stochastic noise
– The target function may be too complex to capture using the chosen
hypothesis set: deterministic noise
• Generalization error: the model tries to fit the noise in the training data, and the
fitted noise then shows up as error on the test set
Ways to handle noise
• Validation
– Check performance on data other than training data, and tune model accordingly
• Regularization
– Constrain the model so that the noise cannot be learned too well
Validation
• Divide given data into train set and test set
– E.g., 80% train and 20% test
– Better to select randomly
• Learn parameters using training set
• Check performance (validate the model) on test set, using measures such as
accuracy, misclassification rate, etc.
• Trade-off: more data for training vs. validation
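The split described above can be sketched in a few lines of plain Python. The function name `train_test_split`, the fixed seed, and the toy data are assumptions for this example (the name mirrors common library usage, but this is a hand-written helper, not a library call).

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data, then split it into train and test lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(data)
    rng.shuffle(shuffled)          # random selection, as the slide recommends
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))          # hypothetical dataset of 10 examples
train_set, test_set = train_test_split(samples)
print(len(train_set), len(test_set))  # 8 2
```

Parameters would be learned on `train_set` only, and accuracy or misclassification rate measured on `test_set`.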
Popular methods of evaluating a classifier
• Holdout method
– Split data into train and test set (usually 2/3 for train and 1/3 for test). Learn model using
train set and measure performance over test set
– Usually used when there is sufficiently large data, since the train and test sets are each
only a part of it
• Repeated Holdout method
– Repeat the Holdout method multiple times with different subsets used for train/test
– In each iteration, a certain portion of data is randomly selected for training, rest for testing
– The error rates on the different iterations are averaged to yield an overall error rate
– More reliable than simple Holdout
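A minimal sketch of repeated holdout follows. To keep it self-contained, the "model" is a stand-in that always predicts the majority label of the training part; that baseline, the function names, and the toy labels are assumptions for illustration only.

```python
import random


def majority_label(labels):
    """Stand-in 'model': predict the most frequent training label."""
    return max(set(labels), key=labels.count)


def repeated_holdout_error(labels, repeats=5, train_fraction=2 / 3, seed=0):
    """Average the test error rate over several random train/test splits."""
    rng = random.Random(seed)
    error_rates = []
    for _ in range(repeats):
        shuffled = labels[:]
        rng.shuffle(shuffled)                       # a different split each time
        n_train = round(len(shuffled) * train_fraction)
        train, test = shuffled[:n_train], shuffled[n_train:]
        prediction = majority_label(train)
        error_rates.append(sum(y != prediction for y in test) / len(test))
    return sum(error_rates) / len(error_rates)      # overall error rate


labels = [0] * 8 + [1] * 4                          # hypothetical class labels
overall_error = repeated_holdout_error(labels)
print(overall_error)
```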
Popular methods of evaluating a classifier
• k-fold cross-validation
– First step: data is split into k subsets of equal size;
– Second step: each subset in turn is used for testing and the remainder for
training
– Performance measures averaged over all folds
• Popular choice for k: 10 or 5
• Advantage: all available data points are used to train as well as to test the
model
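The two steps of k-fold cross-validation can be sketched as an index generator. The function name `k_fold_indices` is hypothetical, and for brevity this version does not shuffle the data before splitting, which one would usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # First step: k subsets of (nearly) equal size.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        # Second step: each subset in turn is the test set, the rest is training.
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print(test_idx)
```

A performance measure would be computed once per fold and then averaged over all k folds.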
k-Nearest Neighbors
K-nearest neighbors is a machine learning algorithm used for classification and regression tasks.
The k-NN algorithm is arguably the simplest machine learning algorithm. Building the model
consists only of storing the training dataset.
To make a prediction for a new data point, the algorithm finds the closest data points in the
training dataset, its “nearest neighbors.”
In the K-NN algorithm, the "K" refers to the number of nearest neighbors that are considered
when making a prediction or classification for a new data point.
The algorithm identifies the K closest data points in the training set based on a distance metric
(such as Euclidean distance) and assigns the most common class label (for classification) or
calculates the average value (for regression) of those K neighbors to make a prediction.
K-nearest neighbors is a relatively simple and interpretable algorithm, but it can
be computationally expensive, especially for large datasets, as it requires
comparing the new observation to all training examples.
k-Nearest Neighbors
At the training phase, the KNN algorithm just stores the dataset; when it gets new data,
it classifies that data into the category it is most similar to.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the
KNN algorithm, as it works on a similarity measure.
Our KNN model will compare the features of the new image with those of the cat and dog
images and, based on the most similar features, put it in either the cat or the dog category.
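A minimal k-NN classifier in plain Python, as a sketch of the procedure described above. The two-dimensional feature values and the function name `knn_predict` are assumptions for illustration; real image features would be far higher-dimensional.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storing the data; all work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)   # Euclidean distance
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-feature descriptions of cat and dog images.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

prediction = knn_predict(points, labels, query=(2.0, 2.0), k=3)
print(prediction)  # cat
```

Note that every prediction compares the query against all stored points, which is why k-NN becomes expensive on large datasets.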
Basic k-nearest neighbor classification
Naive Bayes (NB)
NB models are efficient.
The reason is that they learn parameters by looking at each feature individually
and collect simple per-class statistics from each feature.
The NB classifier is a classical demonstration of how generative assumptions
and parameter estimations simplify the learning process.
Consider the problem of predicting a label $y \in \{0,1\}$ on the basis of a vector of
features $x = (x_1, \dots, x_d)$, where we assume that each $x_i$ is in $\{0,1\}$.
Recall that the Bayes optimal classifier is
$h_{\text{Bayes}}(x) = \operatorname{argmax}_{y \in \{0,1\}} P[Y = y \mid X = x]$.
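A minimal sketch of a Naive Bayes classifier for this binary setting is given below. It assumes the usual naive independence of features given the label and adds Laplace (add-one) smoothing; the smoothing, the function names, and the toy data are assumptions of this example rather than something stated in the slides.

```python
import math


def fit_naive_bayes(X, y):
    """Estimate class priors and per-feature P[x_i = 1 | y] for y in {0, 1}."""
    model = {}
    for label in (0, 1):
        rows = [x for x, target in zip(X, y) if target == label]
        prior = len(rows) / len(X)
        # One simple statistic per feature: smoothed fraction of rows with x_i = 1.
        feature_probs = [
            (sum(row[i] for row in rows) + 1) / (len(rows) + 2)
            for i in range(len(X[0]))
        ]
        model[label] = (prior, feature_probs)
    return model


def predict_naive_bayes(model, x):
    """Return the label with the larger log posterior (up to a constant)."""
    def log_posterior(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if xi == 1 else 1 - p) for xi, p in zip(x, probs)
        )
    return max((0, 1), key=log_posterior)


X = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]  # hypothetical binary features
y = [1, 1, 1, 0, 0, 0]
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, (1, 1)), predict_naive_bayes(model, (0, 0)))
```

Because each feature is handled independently, fitting needs only one pass over the data per class, which is the source of the efficiency mentioned above.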
Logistic Regression
First, what does regression mean?
Regression analysis is a predictive modeling technique.
It estimates the relationship between a dependent variable (target) and one or more independent
variables (predictors).
Logistic Regression
Logistic regression produces results in a binary format, which is used to
predict the outcome of a categorical dependent variable.
So the outcome should be discrete/categorical, such as Yes or No, 0 or 1, True or False.
Logistic Regression
• Logistic regression is one of the most popular Machine Learning algorithms,
which comes under the Supervised Learning technique.
• It is used for predicting the categorical dependent variable using a given set of
independent variables.
• Therefore the outcome must be a categorical or discrete value. It can be either
Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value 0
or 1, it gives probabilistic values which lie between 0 and 1.
• In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, whose output is bounded by the two limiting values 0 and 1.
• Logistic Regression is a significant machine learning algorithm because it has
the ability to provide probabilities and classify new data using continuous and
discrete datasets.
Logistic Regression
The image below shows the logistic function:
Logistic regression uses the same predictive-modeling concept as regression,
which is why it is called logistic regression, but it is used to classify samples.
Therefore, it falls under the classification algorithms.
Logistic Regression Equation
• The Logistic regression equation can be obtained from the Linear Regression
equation. The mathematical steps to get the Logistic Regression equation are given
below:
• We know the equation of a straight line can be written as:
$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
• In Logistic Regression y can be between 0 and 1 only, so for this let's divide y
by (1-y), which gives 0 for y = 0 and tends to infinity as y approaches 1:
$\frac{y}{1 - y}$
• But we need a range from $-\infty$ to $+\infty$, so we take the logarithm of this
ratio, and the equation becomes:
$\log\left(\frac{y}{1 - y}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
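As a numeric sanity check of these steps, the sketch below evaluates the S-shaped logistic (sigmoid) function and confirms that taking the logarithm of y/(1-y) recovers the linear expression. The coefficient values `b0` and `b1` and the input `x` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(y):
    """Log of the ratio y / (1 - y); the inverse of the sigmoid."""
    return math.log(y / (1.0 - y))


b0, b1 = -4.0, 2.0           # hypothetical coefficients
x = 3.0                      # hypothetical input
z = b0 + b1 * x              # linear part: straight-line equation
probability = sigmoid(z)     # value between 0 and 1

print(probability, logit(probability))
```

A probability above 0.5 would typically be mapped to class 1 and anything below to class 0, though the cut-off is a modeling choice.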
Linear VS Logistic Regression
Support Vector Machine Algorithm
SVM is one of the most popular Supervised Learning algorithms, which is
used for Classification as well as Regression problems.
However, primarily, it is used for Classification problems in ML.
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future.
This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed Support Vector Machine.
Support Vector Machine Algorithm
Consider the below diagram in which there are two different categories that
are classified using a decision boundary or hyperplane:
Support Vector Machine Algorithm
• Example: SVM can be understood with the example that we have used in the KNN classifier.
Suppose we see a strange cat that also has some features of dogs, and we want a model that
can accurately identify whether it is a cat or a dog. Such a model can be created by using the
SVM algorithm.
• We will first train our model with lots of images of cats and dogs so that it can learn about
the different features of cats and dogs, and then we test it with this strange creature. Because
the SVM creates a decision boundary between these two classes (cat and dog) and chooses the
extreme cases (support vectors), it will look at the extreme cases of cat and dog. On the basis
of the support vectors, it will classify it as a cat. Consider the below diagram:
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane:
• There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data
points. This best boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features in the dataset, which
means if there are 2 features (as shown in image), then the hyperplane will be a straight line.
And if there are 3 features, then the hyperplane will be a 2-dimensional plane.
• We always create the hyperplane that has the maximum margin, which means the maximum
distance between the hyperplane and the nearest data points of each class.
Support Vectors:
• The data points or vectors that are closest to the hyperplane and which affect the position
of the hyperplane are termed Support Vectors. Since these vectors support the hyperplane,
they are called support vectors.
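The notions of margin and support vectors can be made concrete with a small calculation. This sketch does not train an SVM: it takes a hyperplane as given (the weights `w`, offset `b`, and the data points are hypothetical) and finds which points lie closest to it.

```python
import math


def signed_distance(w, b, point):
    """Signed distance from `point` to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


w, b = (1.0, 1.0), -6.0   # hypothetical hyperplane: x1 + x2 - 6 = 0
points = [(1.0, 2.0), (2.0, 2.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0), (5.0, 5.0)]

distances = [signed_distance(w, b, p) for p in points]
margin = min(abs(d) for d in distances)   # distance to the nearest point(s)

# Support vectors: the points that sit exactly at the margin.
support_vectors = [
    p for p, d in zip(points, distances) if math.isclose(abs(d), margin)
]
print(margin, support_vectors)
```

The sign of each distance tells which side of the hyperplane a point falls on, which is how a new point would be assigned to a class.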
How does SVM work?
• Linear SVM:
• The working of the SVM algorithm can be understood by using an example.
Suppose we have a dataset that has two tags (green and blue), and the dataset has
two features x1 and x2. We want a classifier that can classify the pair(x1, x2) of
coordinates in either green or blue. Consider the below image:
How does SVM work?
• Non-Linear SVM:
• If data is linearly arranged, then we can separate it by using a straight line, but for
non-linear data, we cannot draw a single straight line. Consider the below image:
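One common way to handle such data, offered here as an illustration rather than as the method shown on the original slide, is to add a dimension in which the classes become linearly separable. The sketch below assumes points arranged as an inner cluster and an outer ring, and adds a third coordinate z = x1² + x2²; the data, the function names, and the threshold are hypothetical.

```python
# Hypothetical data: "inner" points near the origin, "outer" points around them.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.5, 0.5)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]


def lift(point):
    """Map (x1, x2) to (x1, x2, z) with z = x1^2 + x2^2."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)


def classify(point, z_threshold=2.0):
    """In the lifted space, the flat plane z = z_threshold separates the classes."""
    return "outer" if lift(point)[2] > z_threshold else "inner"


print([classify(p) for p in inner + outer])
```

No straight line in the (x1, x2) plane separates these two groups, but a plane at a fixed z does once the extra coordinate is added.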
Decision Tree algorithm
• It is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems.
• Graphical representation of all possible solutions to a decision
• It is a tree-structured classifier, where internal nodes represent the features of a
dataset, branches represent the decision rules and each leaf node represents the
outcome.
• In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node.
• Decision nodes are used to make a decision and have multiple branches, whereas
Leaf nodes are the output of those decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of features of the given dataset.
• Decisions are based on some conditions.
• The decision made can be easily explained.
Decision Tree algorithm
• It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
• A decision tree simply asks a question, and based on the answer (Yes/No), it
further splits the tree into subtrees.
Decision Tree algorithm
• Below diagram explains the general structure of a decision tree:
Decision Tree
A decision tree is a graphical representation of all the possible solutions to
a decision based on certain conditions.
Why Use Decision Tree
• There are various algorithms in Machine learning, so choosing the best
algorithm for the given dataset and problem is the main point to remember
while creating a machine learning model.
• Below are the two reasons for using the Decision tree:
• Decision Trees usually mimic human thinking ability while making a decision,
so it is easy to understand.
• The logic behind the decision tree can be easily understood because it shows a
tree-like structure.
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the
tree.
• Parent/Child node: A node that is split into sub-nodes is called the parent of those
sub-nodes, and the sub-nodes are its child nodes. The root node has no parent.
How does the Decision Tree algorithm Work?
• In a decision tree, for predicting the class of the given dataset, the algorithm
starts from the root node of the tree.
• The algorithm compares the value of the root attribute with the corresponding attribute of
the record (real dataset) and, based on the comparison, follows the branch and jumps to
the next node.
• At the next node, the algorithm again compares the attribute value with the
sub-nodes and moves further.
• It continues the process until it reaches a leaf node of the tree.
How does the Decision Tree algorithm Work? Steps
The complete process can be better understood using the below algorithm:
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
• Step-3: Divide S into subsets that contain the possible values of the best
attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where you
cannot further classify the nodes; such a final node is called a leaf node.
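Steps 2 and 3 can be sketched with Gini impurity as the attribute selection measure. Gini is the measure commonly associated with CART, but the slides do not name a specific ASM, so treat that choice, the function names, and the toy weather data as assumptions of this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, labels, attribute):
    """Impurity after dividing the data into subsets by the attribute's values."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())


def best_attribute(rows, labels):
    """Step-2: pick the attribute whose split leaves the least impurity."""
    return min(rows[0], key=lambda attribute: weighted_gini(rows, labels, attribute))


rows = [  # hypothetical dataset S
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

chosen = best_attribute(rows, labels)
print(chosen)  # outlook
```

A full tree builder would apply `best_attribute` recursively to each subset until the subsets are pure or no attributes remain.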
How does the Decision Tree algorithm Work?
• Example: Suppose there is a candidate who has a job offer and wants to
decide whether to accept the offer or not.
• So, to solve this problem, the decision tree starts with the root node (the Salary
attribute, chosen by ASM).
• The root node splits further into the next decision node (distance from the
office) and one leaf node based on the corresponding labels.
• The next decision node further gets split into one decision node (Cab facility)
and one leaf node.
• Finally, the decision node splits into two leaf nodes (Accepted offer and
Declined offer).
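Once built, such a tree is just a chain of nested questions. The sketch below follows the node order of the example (salary, then distance, then cab facility), but the numeric thresholds and the labels on the two intermediate leaf nodes are hypothetical, since the text does not give them.

```python
def decide_offer(salary, distance_km, cab_facility):
    """Walk the job-offer tree from the root to a leaf and return the leaf label."""
    # Root node: salary (threshold is a hypothetical value).
    if salary < 50_000:
        return "Declined"                  # leaf node
    # Decision node: distance from the office (threshold is hypothetical).
    if distance_km <= 10:
        return "Accepted"                  # leaf node
    # Decision node: cab facility, which splits into the two final leaf nodes.
    return "Accepted" if cab_facility else "Declined"


print(decide_offer(salary=60_000, distance_km=25, cab_facility=True))
```

Each path from the root to a leaf reads as a plain rule, which is why decisions made by a tree are easy to explain.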
How does the Decision Tree algorithm Work?
• Consider the below diagram:
Random Forest Algorithm
• Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML.
• It is based on the concept of ensemble learning, which is a process of
combining multiple classifiers to solve a complex problem and to improve
the performance of the model.
• As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and takes
the average to improve the predictive accuracy of that dataset."
• Instead of relying on one decision tree, the random forest takes the
prediction from each tree and, based on the majority vote of those predictions,
predicts the final output.
Random Forest Algorithm
• A greater number of trees in the forest generally leads to higher accuracy and reduces
the risk of overfitting.
• The below diagram explains the working of the Random Forest algorithm:
Why Use Random Forest Algorithm?
• Below are some points that explain why we should use the Random Forest
algorithm:
• It takes less training time as compared to other algorithms.
• It predicts output with high accuracy, and it runs efficiently even on large
datasets.
• It can also maintain accuracy when a large proportion of data is missing.
How Does Random Forest Algorithm Work
• Random Forest works in two phases: the first is to create the random forest by
combining N decision trees, and the second is to make predictions with each tree
created in the first phase.
• The working process can be explained in the below steps and diagram:
• Step-1: Select random K data points from the training set.
• Step-2: Build the decision tree associated with the selected data points (subset).
• Step-3: Choose the number N of decision trees that you want to build.
• Step-4: Repeat Step-1 and Step-2 until N trees have been built.
• Step-5: For new data points, find the predictions of each decision tree, and assign the
new data points to the category that wins the majority votes.
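The sampling and voting parts of this procedure can be sketched as follows. To stay short, each "tree" is a stand-in with a single split on one numeric feature, and the data points are drawn with replacement; both choices, along with all names and data, are assumptions of this example rather than details given in the slides.

```python
import random
from collections import Counter


def bootstrap_sample(rows, k, rng):
    """Step 1: select K random data points (here: with replacement)."""
    return [rng.choice(rows) for _ in range(k)]


def train_stump(sample):
    """Stand-in for Step 2: a one-split 'tree' on a single numeric feature."""
    zeros = [x for x, label in sample if label == 0]
    ones = [x for x, label in sample if label == 1]
    if not zeros or not ones:              # the sample contained only one class
        only = 0 if zeros else 1
        return lambda x: only
    threshold = (max(zeros) + min(ones)) / 2
    return lambda x: int(x > threshold)


def forest_predict(trees, x):
    """Step 5: collect each tree's prediction and return the majority vote."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]  # (feature, label)
n_trees = 5                                # Step 3: the number N of trees
trees = [train_stump(bootstrap_sample(data, k=6, rng=rng)) for _ in range(n_trees)]

print(forest_predict(trees, 2.5), forest_predict(trees, 8.5))
```

Using an odd number of trees avoids tied votes in a two-class problem.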
How Does Random Forest Algorithm Work
• Example: Suppose there is a dataset that contains multiple fruit images. This
dataset is given to the Random Forest classifier. The dataset is divided into
subsets, and one subset is given to each decision tree. During the training phase, each
decision tree produces a prediction result, and when a new data point arrives,
the Random Forest classifier predicts the final decision based on the majority of
those results. Consider the below image: