Introduction To Statistical Learning
Introduction
1. An overview of statistical learning
a. Statistical learning refers to a vast set of tools for understanding data
b. Statistical learning tools are classified as below
i. Supervised - building a statistical model for predicting or estimating an output based on one or more inputs
1. Regression - continuous or quantitative output (ex: wage data: predicting wage based on age, year and education)
2. Classification - categorical or qualitative output (ex: stock market data: predicting whether the market will increase or decrease on a given day)
ii. Unsupervised - there are inputs but no supervising output
1. Clustering - we only observe inputs, without predicting any output (ex: gene expression data: grouping cancer cell lines by their gene expression measurements; see the code sketch after this list)
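As a concrete illustration of the three task types, below is a minimal Python sketch using scikit-learn on synthetic data; the synthetic predictors and responses merely stand in for the book's wage, stock market, and gene expression datasets.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 observations, 3 predictors

# Regression: quantitative response (e.g. wage)
y_quant = 2.0 * X[:, 0] + rng.normal(size=100)
reg = LinearRegression().fit(X, y_quant)

# Classification: qualitative response (e.g. market Up/Down)
y_class = (X[:, 0] + rng.normal(size=100) > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)

# Clustering: inputs only, no output to supervise the fit
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)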
2. Notations:
a. n → number of distinct data points or observations
b. p → number of variables available in making predictions
c. x_ij → the value of the j-th variable for the i-th observation
d. Vectors of length n → bold font
e. Vectors of length p → normal font
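To make the notation concrete, a small sketch; the numbers are arbitrary, and the indexing is 0-based as in Python rather than the book's 1-based convention.

import numpy as np

n, p = 5, 3                                       # n observations, p predictors
X = np.arange(n * p, dtype=float).reshape(n, p)   # the n x p data matrix

i, j = 1, 2
x_ij = X[i, j]      # value of the j-th variable for the i-th observation

row_i = X[i, :]     # one observation: a vector of length p (normal font)
col_j = X[:, j]     # one variable over all observations: length n (bold font)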
3. Datasets used:
a. ISLR package at https://cran.r-project.org/web/packages/ISLR/index.html
b. Below are the datasets used in this book
Chapter 2: Statistical Learning
1. What is statistical learning?
a. Input variables are also known as predictors, independent variables, features, or sometimes just variables. Typically denoted by X.
b. Output variables are also known as the response or dependent variable. Typically denoted by Y.
c. Suppose that we observe a quantitative response Y and p different predictors X = (X1, X2, ..., Xp). We assume that there is some relationship between Y and X, which can be written as
Y = f(X) + ε
where f is some fixed but unknown function of X1, X2, ..., Xp, and ε is a random error term, which is independent of X and has mean zero.
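A minimal simulation of this model; the particular f and the error distribution (normal with standard deviation 0.3) are assumptions chosen for illustration.

import numpy as np

rng = np.random.default_rng(1)
n = 200

def f(X):                                    # the fixed but unknown function
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]

X = rng.uniform(-3, 3, size=(n, 2))
eps = rng.normal(scale=0.3, size=n)          # mean-zero error, independent of X
Y = f(X) + eps                               # Y = f(X) + eps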
d. Statistical learning refers to a set of approaches to estimate f.
e. Why estimate f: Two reasons
i. Prediction
1. In many situations a set of inputs X is available, but the output Y cannot be easily obtained. Since the error term averages to zero, we can predict Y using
Ŷ = f̂(X)
where f̂ represents our estimate of f and Ŷ the resulting prediction for Y. For prediction, f̂ can be treated as a black box: we care about accurate predictions, not about the exact form of f̂.
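A sketch of prediction in this black-box sense; the choice of a random forest as f̂ is arbitrary, and any flexible learner would do.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
Y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)     # Y = f(X) + eps

f_hat = RandomForestRegressor(random_state=0).fit(X, Y)   # our estimate of f
X_new = np.array([[0.5], [1.5]])                          # inputs with unknown Y
Y_hat = f_hat.predict(X_new)                              # Y^ = f^(X)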
2. The accuracy of Ŷ as a prediction for Y depends on two quantities, called the reducible error and the irreducible error.
a. Reducible error: the part of the error that can be reduced by using the most appropriate statistical learning technique to estimate f
b. Irreducible error: caused by ε, since ε may contain variables that are useful for predicting Y but are not included in X
c. We can say
E(Y − Ŷ)² = [f(X) − f̂(X)]² + Var(ε)
where E(Y − Ŷ)² represents the average, or expected value, of the squared difference between the predicted and actual value of Y; [f(X) − f̂(X)]² is the reducible error; and Var(ε), the variance of the error term, is the irreducible error.
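The decomposition can be checked numerically. In the sketch below the true f and the error variance are assumptions, f̂ is held fixed after fitting, and the expectation is taken over fresh draws of Y at a single point x0.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
f = lambda x: 3.0 * x          # true f, known only because this is a simulation
sigma = 0.5                    # sd of eps, so Var(eps) = 0.25

# Fit f^ on a small training sample
x_tr = rng.uniform(-1, 1, size=(50, 1))
y_tr = f(x_tr[:, 0]) + rng.normal(scale=sigma, size=50)
f_hat = LinearRegression().fit(x_tr, y_tr)

# At a fixed point x0, decompose the expected squared prediction error
x0 = np.array([[0.8]])
y_hat = f_hat.predict(x0)[0]
reducible = (f(0.8) - y_hat) ** 2          # [f(X) - f^(X)]^2
irreducible = sigma ** 2                   # Var(eps)

# Monte Carlo check: E(Y - Y^)^2 over fresh draws of Y at x0
y_draws = f(0.8) + rng.normal(scale=sigma, size=100_000)
mse = np.mean((y_draws - y_hat) ** 2)
print(mse, reducible + irreducible)        # approximately equal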
3. Example of prediction: suppose we run a marketing campaign and have a set of demographic variables for each individual. If the client wants to know who will respond positively to the campaign, that is prediction: we do not go into the details of how each demographic variable affects the response.
ii. Inference:
1. In inference the goal is not to predict Y, but to understand how Y varies as X1, X2, ..., Xp vary.
2. Here f̂ cannot be treated as a black box, because we need to know its exact form.
3. We look for:
a. Which predictors are associated with the response?
b. What is the relationship between the response and each predictor?
c. Is the relationship between Y and each predictor linear, or is it more complicated?
4. Example: if the client wants to know how sales are affected by certain variables (predictors) such as price, store location, competitor price, etc., this is inference, as we are interested in the relationship between sales and each of the predictors.
5. Another simple example contrasting the two: in a real estate business, if the builder wants to know whether having a school nearby will increase or decrease a house's value, that is inference; if instead he wants to put a price on a particular house given that it has a school nearby, that is prediction. (A code sketch of inference follows.)
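A minimal inference sketch: the predictor names and coefficient values are hypothetical, and a real analysis would also examine standard errors and p-values (e.g. with statsmodels). The point is that we read off the fitted form of f̂ rather than only calling predict.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 500
# Hypothetical predictors standing in for price, competitor price, store traffic
X = rng.normal(size=(n, 3))
sales = 10 - 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

model = LinearRegression().fit(X, sales)
# For inference we open the black box and inspect the fitted form of f^:
# the sign and size of each coefficient describe how sales move with each
# predictor (here the third predictor has no true effect on sales).
for name, coef in zip(["price", "competitor_price", "traffic"], model.coef_):
    print(f"{name}: {coef:+.2f}")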
f. How do we estimate f?
1. Most methods for estimating f share certain common characteristics.
2. We use a set of data points, or observations, called the training data, because we use these data to train, or teach, our model to estimate f.
3. Corresponding to each input x_i there is an output y_i. The set of such input-output pairs forms the training data:
{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} → training data
4. Statistical methods to find f can be categorised into parametric
methods and nonparametric methods based on how we find f.
a. Parametric methods:
i. In parametric methods we first make an assumption about the functional form, or shape, of f, and we then need a procedure that uses the training data to fit or train the model.
ii. Example: we assume that the function is linear, as below:
f(X) = β0 + β1 X1 + β2 X2 + ... + βp Xp
Now we only have to estimate the p + 1 parameters (β0, β1, β2, ..., βp), which greatly simplifies the problem.
iii. The most common approach to fitting the linear model above is (ordinary) least squares. In the simple one-predictor case the estimates are
β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² , β̂0 = ȳ − β̂1 x̄ (sums over i = 1, ..., n)
where x̄ and ȳ are the means (averages) of the n observed inputs and the corresponding outputs.
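These formulas are easy to verify on simulated data; in the sketch below the true intercept (1) and slope (2) are assumed for illustration.

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

x_bar, y_bar = x.mean(), y.mean()    # means of the inputs and outputs

# Least-squares estimates for the one-predictor linear model
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
print(beta0_hat, beta1_hat)          # close to the true values 1 and 2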