ML Unit 3
Pruning: The process of removing branches or nodes
from a decision tree to improve its generalisation
and prevent overfitting.
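As an illustration, one common strategy, reduced-error pruning, can be sketched with a toy dictionary-based tree. The node layout and attribute names here are hypothetical, and for simplicity each subtree is scored against the full validation set rather than only the examples that reach it:

```python
def predict(node, x):
    # Leaves are plain labels; internal nodes route on an attribute value,
    # falling back to the node's majority class for unseen values.
    while isinstance(node, dict):
        node = node["branches"].get(x[node["attr"]], node["majority"])
    return node

def prune(node, val_set):
    """Reduced-error pruning: collapse a subtree to its majority-class leaf
    whenever that does not hurt accuracy on the validation set."""
    if not isinstance(node, dict):
        return node
    # Prune bottom-up: children first, then reconsider this node.
    for v, child in node["branches"].items():
        node["branches"][v] = prune(child, val_set)
    err_tree = sum(predict(node, x) != y for x, y in val_set)
    err_leaf = sum(node["majority"] != y for x, y in val_set)
    return node["majority"] if err_leaf <= err_tree else node
```

Here a node whose majority-class leaf does at least as well on held-out data replaces its whole subtree, which is exactly the generalisation-driven removal described above.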
R22 – JNTUH-Machine Learning UNIT -III Notes
For a dataset with P positive and N negative examples, the expected information is:

I(P, N) = −(P/(P+N)) log₂(P/(P+N)) − (N/(P+N)) log₂(N/(P+N))

For each attribute A, find the entropy E(A), where:

E(A) = Σᵢ ((Pᵢ + Nᵢ)/(P + N)) · I(Pᵢ, Nᵢ)

That is, the entropy of an attribute is the information I(Pᵢ, Nᵢ) of each subset weighted by the probability of that subset. The information gain of A is then Gain(A) = I(P, N) − E(A).
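The attribute-selection computation above can be sketched in Python. This assumes a binary "yes"/"no" target; the record layout and attribute names are hypothetical:

```python
import math

def info(p, n):
    """I(P, N): expected information for p positive and n negative examples."""
    total = p + n
    result = 0.0
    for c in (p, n):
        if c:  # skip empty classes to avoid log2(0)
            result -= (c / total) * math.log2(c / total)
    return result

def gain(examples, attr, target="play"):
    """Information gain of splitting `examples` (a list of dicts) on `attr`."""
    p = sum(1 for e in examples if e[target] == "yes")
    n = len(examples) - p
    # E(A): I(P_i, N_i) of each attribute value, weighted by subset size.
    e_a = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == v]
        p_i = sum(1 for e in subset if e[target] == "yes")
        n_i = len(subset) - p_i
        e_a += (len(subset) / len(examples)) * info(p_i, n_i)
    return info(p, n) - e_a
```

ID3 computes this gain for every candidate attribute and splits on the one with the largest value.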
• For odd number of data points: Median = the ((n + 1)/2)-th value of the sorted data.
• For even number of data points: Median = the average of the (n/2)-th and ((n/2) + 1)-th values of the sorted data.
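A minimal implementation of both cases:

```python
def median(values):
    """Median: middle value for odd n, mean of the two middle values for even n."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd: the ((n + 1)/2)-th value (1-based)
    return (s[mid - 1] + s[mid]) / 2     # even: average of the two middle values
```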
Measures of Dispersion
• Variance: The average squared deviation from the mean, representing data spread.
• Standard Deviation: The square root of variance, indicating data spread relative to the mean.
• Interquartile Range: The range between the first and third quartiles, measuring data
spread around the median.
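The three dispersion measures above can be computed with Python's standard `statistics` module (note that `quantiles` uses its own default quartile convention, so the IQR may differ slightly from hand methods):

```python
import statistics

def dispersion(data):
    """Return sample variance, standard deviation, and interquartile range."""
    var = statistics.variance(data)              # average squared deviation
    std = statistics.stdev(data)                 # square root of the variance
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return var, std, q3 - q1                     # IQR = Q3 - Q1
```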
Measures of Shape
Inferential Statistics
Inferential statistics uses a random sample of data drawn from a population to describe the population and make inferences about it. Any group of data that includes all the data you are interested in is known as the population. Inferential statistics allows you to make predictions from a small sample instead of working on the whole population.
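The idea can be shown with a hypothetical population of 10,000 exam scores, where a random sample of 100 is used to estimate the population mean:

```python
import random
import statistics

random.seed(42)
# Hypothetical population: 10,000 exam scores (not from the notes).
population = [random.gauss(70, 10) for _ in range(10_000)]

# Inference: estimate the population mean from a sample of only 100 points.
sample = random.sample(population, 100)
estimate = statistics.mean(sample)
```

The sample estimate lands close to the true population mean while examining only 1% of the data.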
Estimation
Hypothesis Testing
• Null and Alternative Hypotheses: The null hypothesis assumes no effect or relationship,
while the alternative suggests otherwise.
• Type I and Type II Errors: Type I error is rejecting a true null hypothesis, while Type II is
failing to reject a false null hypothesis.
• p-Values: Measure the probability of obtaining the observed results under the
null hypothesis.
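As a sketch of the p-value idea, a two-sided p-value for a simple one-sample z-test (assuming a known population standard deviation, using the standard normal CDF via `math.erf`):

```python
import math

def z_test_p_value(sample_mean, pop_mean, pop_std, n):
    """Two-sided p-value for a one-sample z-test with known population std."""
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)  # P(|Z| >= |z|) under the null hypothesis
```

A small p-value means the observed sample mean would be unlikely if the null hypothesis were true, which is grounds to reject it.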
Chi-Square Tests:
Correlation
Regression Analysis
Bayesian Statistics
Bayesian statistics incorporate prior knowledge with current evidence to update beliefs.
Bayes' Theorem is a fundamental concept in probability theory that relates conditional probabilities.
It is named after the Reverend Thomas Bayes, who first introduced the theorem. Bayes' Theorem is
a mathematical formula that provides a way to update probabilities based on new evidence. The
formula is as follows:
P(A∣B) = P(B∣A) · P(A) / P(B), where
• P(A∣B): The probability of event A given that event B has occurred (posterior probability).
• P(B∣A): The probability of event B given that event A has occurred (likelihood).
• P(A): The probability of event A before observing B (prior probability).
• P(B): The probability of event B (evidence).
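A small numeric sketch of the update, using hypothetical numbers for a screening test (1% prevalence, 95% sensitivity, 5% false-positive rate — illustrative values, not from the notes):

```python
def posterior(prior, likelihood, evidence):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

p_disease = 0.01             # prior P(A): prevalence of the disease
p_pos_given_disease = 0.95   # likelihood P(B|A): test sensitivity
# Evidence P(B) by total probability: positives among sick plus false positives.
p_pos = p_pos_given_disease * p_disease + 0.05 * (1 - p_disease)
p_disease_given_pos = posterior(p_disease, p_pos_given_disease, p_pos)
```

Even with a sensitive test, the posterior is only about 16%, because the low prior dominates: most positives come from the large healthy group.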
Statistics is a key component of machine learning, with broad applicability in various fields.
• Anomaly detection and quality control benefit from statistics by identifying deviations
from norms, aiding in the detection of defects in manufacturing processes.
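A minimal sketch of such statistics-based anomaly detection, flagging hypothetical sensor readings that lie far from the mean in standard-deviation units:

```python
import statistics

def anomalies(data, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]
```

In a quality-control setting, the flagged values would correspond to parts or measurements worth inspecting for defects.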
EM Algorithm Flowchart
1. Initialization: A set of initial values of the parameters is chosen. A set of incomplete observed data is given to the system, with the assumption that the observed data comes from a specific model.
2. E-Step (Expectation Step): The observed data is used to estimate or guess the values of the missing or incomplete data; this step updates the latent variables.
• Compute the posterior probability, or responsibility, of each latent variable given the observed data and current parameter estimates.
• Estimate the missing or incomplete data values using the current parameter estimates.
• Compute the log-likelihood of the observed data based on the current parameter estimates and estimated missing data.
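The loop can be sketched for a two-component one-dimensional Gaussian mixture, a common EM application. This is a hypothetical example: it also includes the standard M-step, which re-estimates the means, variances, and mixing weight from the E-step responsibilities:

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture; returns the two means."""
    # Initialization: crude guesses from the data range.
    mu1, mu2 = min(data), max(data)
    var1 = var2 = 1.0
    pi1 = 0.5  # mixing weight of component 1
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        resp = []
        for x in data:
            p1 = pi1 * gauss_pdf(x, mu1, var1)
            p2 = (1 - pi1) * gauss_pdf(x, mu2, var2)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate parameters from the responsibilities.
        n1 = sum(resp)
        n2 = len(data) - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1
        var2 = sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2
        pi1 = n1 / len(data)
    return mu1, mu2
```

Each iteration alternates the two steps above until the parameter estimates stop changing appreciably.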
Disadvantages of EM algorithm
• It has slow convergence.
• It converges to a local optimum only.
• It requires both forward and backward probabilities (numerical optimization requires only the forward probability).
Let X be the training dataset with n data points, where each data point is represented by a d-dimensional feature vector Xᵢ, and Y be the