
III B.Tech I Sem R22 CSE(AI & ML) and IT

UNIT – III (QA)

1. Explain the Decision Tree algorithm. What are its advantages and disadvantages?

Decision Tree : A decision tree is a type of supervised learning algorithm that is commonly used
in machine learning to model and predict outcomes based on input data.
 It is a tree-like structure where each internal node tests an attribute, each branch
corresponds to an attribute value, and each leaf node represents the final decision or
prediction.
 The decision tree algorithm is used to solve both regression and classification
problems.
 A decision tree in machine learning is a versatile, interpretable algorithm used for
predictive modelling.
 It structures decisions based on input data, making it suitable for both classification and
regression tasks.

Decision Tree Example

 Root Node: The highest node of the tree, representing the original choice or feature from
which the tree branches.
 Internal Nodes (Decision Nodes): Nodes in the tree whose branching is determined by
the values of particular attributes.



 Leaf Nodes (Terminal Nodes): The ends of the branches, where final decisions or
predictions are made. Leaf nodes have no further branches.
 Branches (Edges): Links between nodes that show how decisions are made in response
to particular circumstances.
 Splitting: The process of dividing a node into two or more sub-nodes based on a
decision criterion.
 Parent Node: A node that is split into child nodes.
 Child Node: Nodes created as a result of a split from a parent node.
 Pruning: The process of removing branches or nodes from a decision tree to improve its
generalisation and prevent overfitting.

Algorithm for Constructing a Decision Tree :


Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be
classified further; such a final node is called a leaf node.

Attribute Selection Measures : A technique used to select the best attribute for the root node
and for sub-nodes is called an Attribute Selection Measure (ASM).
There are three ASM techniques:

(i) Entropy : Measures the amount of uncertainty or impurity in the dataset.


If $p_i$ is the probability of an instance being classified into class $i$, then
$$Entropy(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
(ii) Gini Impurity: Measures the likelihood of an incorrect classification of a new instance if it
was randomly classified according to the distribution of classes in the dataset.
If $p_i$ is the probability of an instance being classified into class $i$, then
$$Gini(S) = 1 - \sum_{i=1}^{C} p_i^{2}$$

(iii) Information Gain: Measures the reduction in entropy or Gini impurity after a dataset is
split on an attribute.

Suppose $S$ is a set of instances, $A$ is an attribute, and $S_{f_i}$ is the subset of $S$ for which
attribute $A$ has value $f_i$. The entropy of the partition is calculated by weighting the entropy
of each subset by its size relative to the original set:
$$Information\ Gain(S, A) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_{f_i}|}{|S|}\, Entropy(S_{f_i})$$
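As an illustration, here is a minimal Python sketch (NumPy only; the helper names are ours, not from any library) that computes entropy, Gini impurity, and information gain, assuming the standard 14-example "play tennis" data whose class counts match the CART example later in this unit:

```python
import numpy as np

def entropy(labels):
    # Entropy(S) = -sum_i p_i log2 p_i over the class distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(S) = 1 - sum_i p_i^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, attribute_values):
    # Entropy(S) minus the size-weighted entropy of each subset S_fi
    total = len(labels)
    weighted = 0.0
    for v in np.unique(attribute_values):
        subset = labels[attribute_values == v]
        weighted += (len(subset) / total) * entropy(subset)
    return entropy(labels) - weighted

# Assumed example data: 14 play-tennis outcomes split on Outlook
play = np.array(["no","no","yes","yes","yes","no","yes",
                 "no","yes","yes","yes","yes","yes","no"])
outlook = np.array(["sunny","sunny","overcast","rain","rain","rain","overcast",
                    "sunny","sunny","rain","sunny","overcast","overcast","rain"])
print(information_gain(play, outlook))   # about 0.247
```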
Advantages of Decision Tree :
1. Easy to understand and interpret, making them accessible to non-experts.
2. Handle both numerical and categorical data without requiring extensive preprocessing.
3. Provides insights into feature importance for decision-making.
4. Handle missing values and outliers without significant impact.
5. Applicable to both classification and regression tasks.

Disadvantages of Decision Tree :


1. It tends to overfit the data, especially if the tree is allowed to grow too deep.
2. Sensitive to small changes in the data; limited generalization if the training data is not
representative.
3. Potential bias in the presence of imbalanced data.

2. Write ID3 algorithm

Decision Tree : A decision tree is a supervised learning algorithm commonly used in machine
learning for predictive modelling.

ID3 algorithm : In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm
invented by Ross Quinlan, used to generate a decision tree from a dataset.

 Build a decision tree by iteratively selecting the best attribute to split the data based on
information gain.

Entropy : Measures the amount of uncertainty or impurity in the dataset.


If $p_i$ is the probability of an instance being classified into class $i$, then
$$Entropy(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

Information Gain: Measures the reduction in entropy or Gini impurity after a dataset is split
on an attribute.
Suppose $S$ is a set of instances, $A$ is an attribute, and $S_{f_i}$ is the subset of $S$ for which
attribute $A$ has value $f_i$. The entropy of the partition is calculated by weighting the entropy
of each subset by its size relative to the original set:
$$Information\ Gain(S, A) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_{f_i}|}{|S|}\, Entropy(S_{f_i})$$

Steps involved in ID3 algorithm :


1. Determine the entropy of the overall dataset using the class distribution.



2. For each feature:
 Calculate the entropy for its categorical values.
 Assess the information gain obtained by splitting on each unique categorical value of the feature.
3. Choose the feature that generates the highest information gain.
4. Iteratively apply the above steps to the resulting subsets to build the decision tree structure.

ID3 algorithm :
If all examples have the same label:
– return a leaf with that label
Else if there are no features left to test:
– return a leaf with the most common label
Else:
– choose the feature F that maximises the information gain of S to be the next node, using
$$Information\ Gain(S, F) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_{f_i}|}{|S|}\, Entropy(S_{f_i})$$
– add a branch from the node for each possible value f of F
– for each branch:
∗ calculate $S_f$ by removing F from the set of features
∗ recursively call the algorithm with $S_f$ to compute the gain relative to the current set of examples
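A compact, self-contained Python sketch of this recursion (the helper names are ours and the tree is stored as nested dictionaries; this is an illustration, not a production implementation):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, column):
    total = len(labels)
    weighted = sum(
        (np.sum(column == v) / total) * entropy(labels[column == v])
        for v in np.unique(column)
    )
    return entropy(labels) - weighted

def id3(X, y, features):
    # X: dict of feature name -> np.array of values; y: np.array of labels
    if len(np.unique(y)) == 1:                 # all examples share one label
        return y[0]
    if not features:                           # no features left to test
        return Counter(y).most_common(1)[0][0]
    # choose the feature with the highest information gain
    best = max(features, key=lambda f: information_gain(y, X[f]))
    tree = {best: {}}
    remaining = [f for f in features if f != best]
    for v in np.unique(X[best]):               # one branch per value of the feature
        mask = X[best] == v
        subset_X = {f: X[f][mask] for f in remaining}
        tree[best][v] = id3(subset_X, y[mask], remaining)
    return tree

# Tiny assumed example
X = {"Outlook": np.array(["sunny", "sunny", "overcast", "rain"]),
     "Windy":   np.array(["false", "true", "false", "true"])}
y = np.array(["no", "no", "yes", "yes"])
print(id3(X, y, list(X)))   # {'Outlook': {'overcast': 'yes', 'rain': 'yes', 'sunny': 'no'}}
```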

Flow Chart :



3. Explain Classification and Regression Trees (CART) algorithm with an example

The CART (Classification and Regression Trees) Algorithm : CART is a type of supervised
learning algorithm and is a decision tree methodology used for predictive modeling tasks. It can
perform both classification (predicting discrete labels) and regression (predicting continuous
values).
CART was first introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone
in 1984.
Tree Structure
Root Node: The top node of the tree, representing the entire dataset.
Internal Nodes: Represent decision points. Each internal node splits the data into two
subsets (CART trees are binary).
Leaf Nodes (Terminal Nodes): Represent the outcome or prediction. For classification trees,
each leaf node represents a class label.
For regression trees, each leaf node represents a continuous value.

CART Algorithm :

1. Splitting Criteria : CART uses a greedy approach to split the data at each node. It evaluates
all possible splits and selects the one that best reduces the impurity of the resulting subsets.



CART builds a binary decision tree by repeatedly splitting the data on feature values that
maximize separation.

To determine the optimal split CART uses

(i) Gini Index (for Classification) : Measures the likelihood of an incorrect classification of a
new instance if it was randomly classified according to the distribution of classes in the dataset.
If $p_i$ is the probability of an instance being classified into class $i$, then
$$Gini(S) = 1 - \sum_{i=1}^{C} p_i^{2}$$

The lower the Gini impurity, the purer the subset is.

(ii) Mean Squared Error (for Regression) : For regression tasks, CART selects the split that gives
the largest reduction in residual error; the lower the residual error, the better the fit of the
model to the data.
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$
where $Y_i$ is the actual value of the $i^{th}$ observation and $\hat{Y}_i$ is the predicted value
of the $i^{th}$ observation.

2. Recursive Partitioning: After the initial split, each subset of data is recursively split again,
creating branches until reaching terminal nodes or meeting stopping criteria (e.g., maximum
depth or minimum samples per leaf).

3. Leaf Nodes and Prediction: Once no further splits are required, the leaves of the tree
represent the final predictions. For classification, the leaves represent class labels, while for
regression, they contain mean or median values of the target variable.

4. Pruning : The process of removing sections of the tree to reduce complexity and improve
generalization. Pruning removes nodes to prevent overfitting of the data.
Pruning techniques :
(i) Cost complexity pruning : Calculating the cost of each node and removing nodes that
have a negative cost.
(ii) Information gain pruning : Calculating the information gain of each node and removing
nodes that have a low information gain.
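As a concrete illustration of cost-complexity pruning, scikit-learn's CART implementation exposes a ccp_alpha parameter and a cost_complexity_pruning_path helper; a minimal sketch on synthetic data (the alpha value below is an arbitrary assumption and would normally be chosen by cross-validation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree typically overfits the training data
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Candidate alpha values along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
print("candidate alphas:", path.ccp_alphas[:5])

# Refit with a moderate alpha to prune weak branches
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

print("unpruned depth:", full.get_depth(), "test acc:", full.score(X_te, y_te))
print("pruned depth:", pruned.get_depth(), "test acc:", pruned.score(X_te, y_te))
```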

5. Evaluation Metrics
(i) Classification Accuracy: The proportion of correctly classified instances.
(ii) Confusion Matrix: A table used to evaluate the performance of a classification model.



True Positive (TP) False Positive (FP)
False Negative (FN) True Negative (TN)

$$Accuracy = \frac{TP + TN}{TP + FP + FN + TN}$$
(iii) R-squared: Used for regression tasks to measure the proportion of variance explained by
the model.

Advantages
▶ It is a simple and intuitive algorithm that is easy to understand and interpret.
▶ It can handle both numerical and categorical data.
▶ It can handle missing values by imputing them with surrogate splits.
▶ It can handle multi-class classification problems by using an extension called the multi-
class CART

Disadvantages
▶ It tends to overfit the data, especially if the tree is allowed to grow too deep.
▶ It is a greedy algorithm that may not find the optimal tree.
▶ It may be biased towards predictors with many categories or high cardinality.
▶ It may produce unstable results if the data is sensitive to small changes

Example of CART in Classification : Consider a binary classification problem, ENJOY game (YES or
NO), with 14 training examples given in the table below.



Step 1 : Splitting : Attribute : “Outlook” (Sunny, Overcast, Rain).
Subsets: Sunny, Overcast, and Rain.
Outlook Yes No No. of instances
Sunny 2 3 5
Overcast 4 0 4
Rain 3 2 5
Gini(Outlook=Sunny) = 1 – (2/5)^2 – (3/5)^2 = 1 – 0.16 – 0.36 = 0.48
Gini(Outlook=Overcast) = 1 – (4/4)^2 – (0/4)^2 = 0
Gini(Outlook=Rain) = 1 – (3/5)^2 – (2/5)^2 = 1 – 0.36 – 0.16 = 0.48
Then, calculate weighted sum of gini indexes for outlook feature.
 Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48 = 0.171 + 0 + 0.171 = 0.342
Similarly, calculate the Gini index of Temperature, Humidity, and Wind.

Decision :
Feature          Gini index
Outlook          0.342
Temperature      0.439
Humidity         0.367
Wind             0.428
Among the calculated Gini index values, the Outlook feature has the lowest cost, so the Outlook
decision is placed at the top of the tree (root node). The sub-dataset in the Overcast branch
contains only "Yes" decisions, which means the Overcast branch ends in a leaf.
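The weighted Gini calculation above can be reproduced in a few lines of Python (the class counts are taken from the Outlook table; the helper name is ours):

```python
def weighted_gini(groups):
    # groups: list of (yes_count, no_count) per attribute value
    total = sum(y + n for y, n in groups)
    score = 0.0
    for y, n in groups:
        size = y + n
        gini = 1.0 - (y / size) ** 2 - (n / size) ** 2
        score += (size / total) * gini
    return score

# Outlook: Sunny (2 yes, 3 no), Overcast (4, 0), Rain (3, 2)
print(round(weighted_gini([(2, 3), (4, 0), (3, 2)]), 3))   # 0.343 (the text rounds to 0.342)
```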



Step 2 : Recursive Splitting : Find the Gini index scores for the Temperature, Humidity, and
Wind features on the sub-dataset for the Sunny outlook.
 Divide the sunny subset according to temperature

 Gini(Outlook=Sunny and Temp.=Hot) = 1 – (0/2)^2 – (2/2)^2 = 0
 Gini(Outlook=Sunny and Temp.=Cool) = 1 – (1/1)^2 – (0/1)^2 = 0
 Gini(Outlook=Sunny and Temp.=Mild) = 1 – (1/2)^2 – (1/2)^2 = 1 – 0.25 – 0.25 = 0.5
 Gini(Outlook=Sunny and Temp.) = (2/5)×0 + (1/5)×0 + (2/5)×0.5 = 0.2

Step 3 : Construction of Decision Tree : Assign a leaf node to each subset that contains
instances that belong to the same class.



Recursively repeat steps 1-3 for each subset until all instances in a given subset belong to the
same class or no further splitting is possible.

Final Decision Tree :

Regression Tree : A Regression Tree is a type of decision tree that is used for predicting
continuous response variables.
It is a simple and fast algorithm, commonly used for predicting continuous or nonlinear
sample data.
 Calculate the standard deviation reduction for every candidate feature and select the
feature with the highest score as the split.
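A minimal scikit-learn sketch of a regression tree on synthetic data (note that scikit-learn's CART regressor splits on squared-error/variance reduction rather than standard deviation reduction; the data and depth are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# criterion="squared_error" splits on variance (MSE) reduction
reg = DecisionTreeRegressor(max_depth=3, criterion="squared_error")
reg.fit(X, y)
print(reg.predict([[2.5]]))   # piecewise-constant prediction for x = 2.5
```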

4. Explain Gaussian mixture models.


Gaussian Mixture Model :
 A Gaussian mixture model is a soft clustering technique used in unsupervised learning
to determine the probability that a given data point belongs to a cluster.
 It’s composed of several Gaussians, each identified by k ∈ {1,…, K}, where K is the
number of clusters in a data set.

Clustering is an unsupervised learning problem: finding clusters of points in our data set that
have common characteristics.



Gaussian Mixture Model : A Gaussian mixture is a function that is composed of several
Gaussians, each identified by k ∈ {1,…, K}, where K is the number of clusters of our data set.
Each Gaussian k in the mixture is comprised of the following parameters:
 A mean μ that defines its center.
 A covariance Σ that defines its width. This would be equivalent to the dimensions of an
ellipsoid in a multivariate scenario.
 A mixing probability π that defines how big or small the Gaussian function will be.

Let’s illustrate these parameters graphically

The Expectation-Maximization (EM) algorithm is used for parameter estimation in


Gaussian Mixture Models (GMMs).
GMMs are a probabilistic model that represents a mixture of multiple Gaussian
distributions, each with its own mean and covariance, and each representing a "cluster"
within the data.
The EM algorithm iteratively improves estimates of the parameters (means, covariances,
and weights) of each Gaussian component in the mixture.

Step 1: Initialization :
 Number of Components: Decide on the number of Gaussian components, K
 Initialize Parameters: Randomly initialize the parameters for each Gaussian component:
 Means μ𝐾
 Covariance matrices, Σk
 Mixing coefficients 𝛑𝑲

 Step 2: Expectation (E-step) In this step, we compute the posterior probabilities


(responsibilities) that each data point belongs to each component based on the current
parameter estimates.
 Step 3: Maximization (M-step) : In this step, we update the parameters of each
component using the responsibilities computed in the E-step:
 Update the means μk
 Update the covariances Σk
 Update the mixing coefficients πk

 Step 4: Convergence Repeat the E-step and M-step until convergence, typically
measured by a small change in the log-likelihood or in the parameter values between
iterations.

EM Algorithm :
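In practice this EM loop is available off the shelf in scikit-learn's GaussianMixture; a minimal sketch on synthetic two-cluster data (the data and n_components=2 are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic Gaussian "blobs"
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(150, 2)),
    rng.normal(loc=[3, 3], scale=0.8, size=(150, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)                        # EM: E-step responsibilities, M-step parameter updates

print(gmm.means_)                 # estimated component means mu_k
print(gmm.weights_)               # estimated mixing coefficients pi_k
print(gmm.predict_proba(X[:3]))   # soft cluster responsibilities for the first points
```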



5. Explain the types of Ensemble Learning
Ensemble Learning : Ensemble learning is a machine learning technique that combines the
predictions of multiple models to improve predictive performance. The idea is that by
combining models with different strengths and weaknesses, the ensemble can achieve better
results than any single model.
Ensemble learning is a machine learning technique that aggregates two or more learners (e.g.
regression models, neural networks) in order to produce better predictions.

Different ways to combine Classifiers :
1. Methods for Independently Constructing Ensembles
2. Methods for Coordinated Construction of Ensembles

Methods for Independently Constructing Ensembles


 Majority Vote
 Bagging and Random Forest
 Randomness Injection
 Feature-Selection Ensembles
 Error-Correcting Output Coding

Methods for Coordinated Construction of Ensembles :
 Boosting
 Stacking

1. Voting : Some classification systems will only produce an output where all the classifiers
agree, or more than half of them agree, whereas others simply take the most common output,
which is what we usually mean by majority voting

2. Bagging : Bagging is a supervised learning technique that can be used for both regression
and classification tasks. Here is an overview of the steps in the Bagging classifier algorithm:
Bootstrap Sampling : Divides the original training data into ‘N’ subsets by randomly sampling
rows with replacement. This step ensures that the base models are trained on diverse subsets
of the data.



Bagging is a supervised learning technique that can be used for both regression and
classification tasks.
Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision tree.
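A minimal scikit-learn sketch of bagging (synthetic data; BaggingClassifier's default base estimator is a decision tree, and n_estimators=50 is an arbitrary choice):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each base tree is trained on a bootstrap sample of the training set;
# predictions are combined by majority vote.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print("bagging test accuracy:", bag.score(X_te, y_te))
```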

3. The Basic Random Forest Training :


For each of N trees:
– create a new bootstrap sample of the training set
– use this bootstrap sample to train a decision tree
– at each node of the decision tree, randomly select m features, and compute the information
gain (or Gini impurity) only on that set of features, selecting the optimal one
– repeat until the tree is complete
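The same idea is packaged in scikit-learn's RandomForestClassifier, where n_estimators plays the role of N and max_features the role of m (values below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_estimators = N trees; max_features = m features considered at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))            # majority vote of the 100 trees
print(forest.feature_importances_[:5])  # importance of the first five features
```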

4. Boosting : Sequentially builds a group of decision trees, where each new tree corrects the
residual errors made by previous trees, enhancing predictive accuracy.
It trains each new weak learner to fit the residuals of the previous ensemble’s predictions
thus making it less sensitive to individual data points or outliers in the data.



Boosting is an ensemble technique that combines multiple weak learners to create a strong
learner

5. Stacking : Stacking, also known as stacked generalization, is a type of ensemble learning
that combines multiple different predictive models to improve the overall performance.
This approach involves training multiple first-level models, or base learners, on the same data
and then using a second-level model, or meta-learner, to synthesize their predictions.
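A minimal scikit-learn sketch of stacking, with arbitrarily chosen base learners and a logistic-regression meta-learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First-level (base) learners
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Second-level meta-learner combines the base learners' predictions
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.predict(X[:5]))
```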

6. Describe different ways to combine Classifiers.


Ensemble Learning : Ensemble learning combines the predictions of multiple models to
improve predictive performance. It aggregates two or more learners (e.g. regression models,
neural networks) in order to produce better predictions.

Different ways to combine Classifiers : A Mixture of Experts (MoE) :
 The Mixture of Experts consists of 2 main components:
 the “Experts”, which are individual models trained to make predictions,
 a “Gating network”, which determines how much weight to give to each expert’s
prediction when making the final prediction.
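A toy NumPy sketch of the MoE combination rule (the expert outputs and gating weights below are stand-ins, not trained models):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                      # one input example with 4 features

# "Experts": three stand-in models, each producing class probabilities for 2 classes
expert_outputs = np.array([
    [0.9, 0.1],   # expert 1
    [0.6, 0.4],   # expert 2
    [0.2, 0.8],   # expert 3
])

# "Gating network": a linear score per expert followed by a softmax,
# so the gate weights depend on the input x and sum to 1
W_gate = rng.normal(size=(3, 4))
scores = W_gate @ x
gate = np.exp(scores) / np.sum(np.exp(scores))

# Final prediction: gate-weighted combination of the experts' outputs
combined = gate @ expert_outputs
print("gate weights:", gate)
print("combined class probabilities:", combined)
```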



Different ways to combine Classifiers :
1. Methods for Independently Constructing Ensembles
2. Methods for Coordinated Construction of Ensembles

Methods for Independently Constructing Ensembles


 Majority Vote
 Bagging and Random Forest
 Randomness Injection
 Feature-Selection Ensembles
 Error-Correcting Output Coding
Methods for Coordinated Construction of Ensembles :
 Boosting
 Stacking
1. Voting : Some classification systems will only produce an output where all the
classifiers agree, or more than half of them agree, whereas others simply take the most
common output, which is what we usually mean by majority voting

2. Bagging : Bagging is a supervised learning technique that can be used for both
regression and classification tasks. Here is an overview of the steps in the Bagging
classifier algorithm:
Bootstrap Sampling : Divides the original training data into ‘N’ subsets by randomly
sampling rows with replacement. This step ensures that the base models are trained
on diverse subsets of the data.



Bagging is a supervised learning technique that can be used for both regression and
classification tasks.
Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision tree.

3. The Basic Random Forest Training :


The random forest algorithm is a machine learning technique that combines multiple decision
trees to make predictions: The algorithm generates many decision trees using random data
points and features. When asked to make a prediction, it outputs the most common answer
from all the trees.
For each of N trees:
– create a new bootstrap sample of the training set
– use this bootstrap sample to train a decision tree
– at each node of the decision tree, randomly select m features, and compute the
information gain (or Gini impurity) only on that set of features, selecting the optimal
one
– repeat until the tree is complete

4. Boosting : Sequentially builds a group of decision trees, where each new tree corrects
the residual errors made by previous trees, enhancing predictive accuracy.
It trains each new weak learner to fit the residuals of the previous ensemble’s
predictions thus making it less sensitive to individual data points or outliers in the data.



 Boosting is an ensemble technique that combines multiple weak learners to create a
strong learner

5. Stacking : Stacking, also known as stacked generalization, is a type of ensemble
learning that combines multiple different predictive models to improve the overall
performance.
This approach involves training multiple first-level models, or base learners, on the
same data and then using a second-level model, or meta-learner, to synthesize their
predictions.

7. Write AdaBoost algorithm


Boosting : Sequentially builds a group of decision trees, where each new tree corrects the
residual errors made by previous trees, enhancing predictive accuracy.
It trains each new weak learner to fit the residuals of the previous ensemble’s predictions
thus making it less sensitive to individual data points or outliers in the data.

Boosting is an ensemble technique that combines multiple weak learners to create a strong
learner

AdaBoost algorithm :
Step1 – Initialize the weights
For a dataset with N training instances, initialize a weight $w_i$ for each data point with
$$w_i = \frac{1}{N}$$
Step2 – Train weak classifiers



Train a weak classifier Mk where k is the current iteration
 The weak classifier we are training should have an accuracy greater than 0.5 which
means it should be performing better than a naive guess
Step3 – Calculate the error rate and importance of each weak model Mk :
 Calculate the error rate $Error_k$ for every weak classifier Mk on the training dataset
 Calculate the importance of each model $\alpha_k$ using the formula
$$\alpha_k = \frac{1}{2}\ln\left(\frac{1 - Error_k}{Error_k}\right)$$
Step4 – Update the weight of each data point
After applying the weak classifier model to the training data, we update the weight assigned to
each point using the accuracy of the model. The formula for updating the weights is
$$w_i = w_i \exp\left(-\alpha_k\, y_i\, M_k(x_i)\right)$$
where $y_i$ is the true output and $x_i$ is the corresponding input vector.
Step5 – Normalize the instance weights
 Normalize the instance weights so that they sum to 1, using the formula
$$w_i = \frac{w_i}{\mathrm{Sum}(W)}$$
Step6 – Repeat steps 2-5 for K iterations
We will train K classifiers and will calculate model importance and update the instance weights
using the above formula
 The final model M(X) will be an ensemble model which is obtained by combining these
weak models weighted by their model weights
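A minimal NumPy sketch of these steps, using depth-1 decision stumps as the weak classifiers (the stump search is our own simplification of a real weak learner; labels are assumed to be ±1):

```python
import numpy as np

def train_stump(X, y, w):
    # Weak learner: a depth-1 decision stump chosen to minimise the weighted error
    n, d = X.shape
    best = None
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best                                  # (weighted error, feature, threshold, sign)

def stump_predict(stump, X):
    _, j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, K=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                      # Step 1: w_i = 1/N
    models, alphas = [], []
    for _ in range(K):                           # Step 6: repeat for K iterations
        err, j, thr, sign = train_stump(X, y, w) # Step 2: train a weak classifier
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # Step 3: model importance
        pred = stump_predict((err, j, thr, sign), X)
        w = w * np.exp(-alpha * y * pred)        # Step 4: reweight the data points
        w = w / w.sum()                          # Step 5: normalise the weights
        models.append((err, j, thr, sign))
        alphas.append(alpha)
    return models, alphas

def predict(models, alphas, X):
    # Final ensemble: sign of the alpha-weighted sum of weak predictions
    agg = sum(a * stump_predict(m, X) for m, a in zip(models, alphas))
    return np.sign(agg)

# Tiny illustrative dataset (labels must be +1 / -1)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, 1, -1, -1, -1])
models, alphas = adaboost(X, y, K=5)
print(predict(models, alphas, X))   # expected: [ 1.  1.  1. -1. -1. -1.]
```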



8. Explain the K-Nearest Neighbor algorithm. (OR) Explain the Nearest Neighbour
Methods.
Nearest Neighbour Methods : The idea of nearest neighbour methods is to find a predefined
number of training samples closest in distance to the new point, and predict the label from these.
Application : Applied in pattern recognition, data mining, and intrusion detection

Evelyn Fix and Joseph Hodges developed this algorithm in 1951, which was subsequently
expanded by Thomas Cover.

Euclidean Distance : The Cartesian distance between two points in the plane/hyperplane:
$$d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$
In two dimensions, $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.


Manhattan Distance : The total distance travelled by the object instead of the displacement:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Minkowski Distance : Generalises both the Euclidean and the Manhattan distances:
$$d(x, y) = \left[\sum_{i=1}^{n} |x_i - y_i|^p\right]^{1/p}$$
If $p = 2$ it is the Euclidean distance; if $p = 1$ it is the Manhattan distance.
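The three distance metrics in a few lines of NumPy (our own helper functions; equivalent ready-made versions exist in scipy.spatial.distance):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, p):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b))        # 5.0
print(manhattan(a, b))        # 7.0
print(minkowski(a, b, 2))     # 5.0, same as Euclidean when p = 2
```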

 Despite its simplicity, nearest neighbors has been successful in a large number of
classification and regression problems, including handwritten digits and satellite image
scenes.
 Being a non-parametric method, it is often successful in classification situations where
the decision boundary is very irregular.

 The classes in sklearn.neighbors can handle either NumPy arrays or scipy.sparse


matrices as input. For dense matrices, a large number of possible distance metrics are
supported. For sparse matrices, arbitrary Minkowski metrics are supported for searches.
There are many learning routines which rely on nearest neighbors at their core. One example is
kernel density estimation, discussed in the density estimation section



The K-Nearest Neighbor (KNN) algorithm is a classification algorithm in machine learning. It
belongs to the supervised learning domain.

K-Nearest Neighbor (KNN) algorithm :


Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the
neighbor is maximum.

Step 1: Selecting the optimal value of K


K represents the number of nearest neighbors that needs to be considered while making
prediction.
Step 2: Calculating distance
To measure the similarity between target and training data points, Euclidean distance is
used. Distance is calculated between each of the data points in the dataset and target
point.
Step 3: Finding Nearest Neighbors
The k data points with the smallest distances to the target point are the nearest
neighbors.
Step 4: Voting for Classification or Taking Average for Regression
In the classification problem, the class labels of K-nearest neighbors are determined by
performing majority voting. The class with the most occurrences among the neighbors
becomes the predicted class for the target data point.

 In the regression problem, the class label is calculated by taking average of the target
values of K nearest neighbors. The calculated average value becomes the predicted
output for the target data point.
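A minimal scikit-learn sketch of KNN classification (k = 3 and the synthetic data are arbitrary choices; p = 2 makes the Minkowski metric the Euclidean distance):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: choose K; Steps 2-4 (distance computation, neighbour search,
# majority voting) happen inside predict()
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2)  # p=2 -> Euclidean
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
print("predicted classes:", knn.predict(X_te[:5]))
```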

Advantages of KNN Algorithm :


 It is simple to implement.
 It is robust to the noisy training data
 It can be more effective if the training data is large.



Disadvantages of KNN Algorithm :
 Always needs to determine the value of K which may be complex some time.
 The computation cost is high because of calculating the distance between the data
points for all the training samples.

9. Explain the K-Means Clustering algorithm in detail


K-Means Clustering algorithm : K-Means Clustering is an Unsupervised Machine Learning
algorithm, which groups the unlabeled dataset into different clusters.
Here K defines the number of pre-defined clusters.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a
way that each data point belongs to only one group of points with similar properties (based on
feature similarity).

K-Means is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between the data points and their
corresponding cluster centroids.

The term "k-means" was first used by James MacQueen in 1967, [2]
K- Meanss algorithm is also referred to as the Lloyd–Forgy algorithm
K-means clustering is a method of vector quantization, originally from signal processing,

It starts by randomly placing the cluster centroids in the space. Then each data point is assigned
to one of the clusters based on its distance from the cluster centroid. After assigning each point
to a cluster, new cluster centroids are computed. This process runs iteratively until good clusters
are found. In this analysis, we assume the number of clusters is given in advance.

Algorithm :
Step-1: Select the number K to decide the number of clusters.
Step-2: Select random K points or centroids. (It can be other from the input dataset).
Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.
$$d_i = \min_{j}\, d(x_i, \mu_j)$$



where the mean (centroid) of cluster $j$ is
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} x_i$$
and the Euclidean distance is
$$d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$
(in two dimensions, $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$).


Manhattan Distance : The total distance travelled by the object instead of the displacement:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Minkowski Distance : Generalises both the Euclidean and the Manhattan distances:
$$d(x, y) = \left[\sum_{i=1}^{n} |x_i - y_i|^p\right]^{1/p}$$
If $p = 2$ it is the Euclidean distance; if $p = 1$ it is the Manhattan distance.
Step-4: Calculate the variance and place a new centroid in each cluster.
Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of
its cluster.
Step-6: If any reassignment occurred, go to Step-4; otherwise FINISH.
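A minimal NumPy sketch of this loop (the data and K = 2 are illustrative; scikit-learn's KMeans offers a production implementation of the same steps):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(K)])
        # Steps 5-6: stop when no assignment (and hence no centroid) changes
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two obvious groups of 2-D points
X = np.array([[1, 1], [1.5, 2], [1, 0], [8, 8], [9, 9], [8, 9]], dtype=float)
centroids, labels = kmeans(X, K=2)
print("centroids:\n", centroids)
print("labels:", labels)
```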



Flowchart :

Advantages :
 Easy to implement; computationally fast and efficient even with a large number of
variables.
 Works well on data sets with distinct cluster boundaries.
Disadvantages :
 Difficulty in predicting the exact k-value for unknown data set.
 Initial seeds have a strong influence on the final resulting cluster.

VERY SHORT ANSWER QUESTIONS


1. What is Decision Tree ?
Decision Tree : A decision tree is a supervised learning algorithm commonly used in machine
learning for predictive modelling.
 It is a tree-like structure where each internal node tests an attribute, each branch
corresponds to an attribute value, and each leaf node represents the final decision or
prediction.
 The decision tree algorithm is used to solve both regression and classification
problems.

2. What is entropy in a decision tree?


Entropy : Measures the amount of uncertainty or impurity in the dataset.
If $p_i$ is the probability of an instance being classified into class $i$, then
$$Entropy(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

3. What is the Gini Impurity


Gini Impurity: Measures the likelihood of an incorrect classification of a new instance if it was
randomly classified according to the distribution of classes in the dataset.
If $p_i$ is the probability of an instance being classified into class $i$, then
$$Gini(S) = 1 - \sum_{i=1}^{C} p_i^{2}$$

4. What is Information Gain?


Information Gain: Measures the reduction in entropy or Gini impurity after a dataset is split
on an attribute.
Suppose $S$ is a set of instances, $A$ is an attribute, and $S_{f_i}$ is the subset of $S$ for which
attribute $A$ has value $f_i$. The entropy of the partition is calculated by weighting the entropy
of each subset by its size relative to the original set:
$$Information\ Gain(S, A) = Entropy(S) - \sum_{i=1}^{n} \frac{|S_{f_i}|}{|S|}\, Entropy(S_{f_i})$$

5. Write applications of the CART Algorithm


The CART model is utilized to discover relationships between attributes. In data mining, decision
trees are frequently used to build models that forecast the value of a target based on the values
of numerous input variables (or independent variables).

6. What is Ensemble Learning?


Ensemble Learning : Ensemble learning is a machine learning technique that combines the
predictions of multiple models to improve predictive performance. The idea is that by
combining models with different strengths and weaknesses, the ensemble can achieve better
results than any single model.
Ensemble learning is a machine learning technique that aggregates two or more learners (e.g.
regression models, neural networks) in order to produce better predictions.

7. What is Boosting?
Boosting is a machine learning technique that improves the accuracy of predictive data analysis
by training multiple models sequentially. The goal is to create a strong learner from a collection
of weak learners, which are models that perform only slightly better than random guessing.
Boosting algorithms work by iteratively adjusting the weights of training instances, giving more
importance to misclassified instances. The final prediction is the weighted average of all the
predictions from the weak learners.



8. What is Bagging?
Bagging, or bootstrap aggregating, is a machine learning technique that uses multiple models to
improve the accuracy and stability of predictions. It's a type of ensemble learning, which is the
idea that a group of models working together can make better predictions than a single model.

9. What is the objective function of k-means?


The main objective of k-means clustering is to partition your data into a specific number (k) of
groups, where data points within each group are similar and dissimilar to points in other
groups. It achieves this by minimizing the distance between data points and their assigned
cluster's center, called the centroid

10. What is k-means algorithm in unsupervised learning?


K-means clustering is an unsupervised learning algorithm used for data clustering, which groups
unlabeled data points into groups or clusters
The K-means algorithm is an unsupervised machine learning algorithm that groups data points
into clusters based on their similarities.

11. What is the difference between classification and regression trees?


The main difference between classification and regression trees is the type of variable they
predict:
Classification trees : Predict categorical variables, such as whether someone will buy something
or not.
Regression trees : Predict continuous variables, such as the price of a property or the
temperature of a day.

12. What is the random forest algorithm?


The random forest algorithm is a machine learning technique that combines multiple decision
trees to make predictions: The algorithm generates many decision trees using random data
points and features. When asked to make a prediction, it outputs the most common answer
from all the trees.
