Introduction To Regression: George Boorman
Regression
SUPERVISED LEARNING WITH SCIKIT-LEARN
George Boorman
Core Curriculum Manager, DataCamp
Predicting blood glucose levels
import pandas as pd
diabetes_df = pd.read_csv("diabetes.csv")
print(diabetes_df.head())

# Select BMI as a single feature and blood glucose as the target
# (column names assumed from the slide's context)
X_bmi = diabetes_df["bmi"].values
y = diabetes_df["glucose"].values
print(y.shape, X_bmi.shape)

(752,) (752,)

scikit-learn expects features as a 2-D array, so reshape:

X_bmi = X_bmi.reshape(-1, 1)
print(X_bmi.shape)

(752, 1)
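Once the feature is a 2-D array of shape (752, 1), it can be passed to a scikit-learn model. A minimal sketch, using synthetic stand-in data since diabetes.csv is not included here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the BMI feature and glucose target (illustrative only)
rng = np.random.default_rng(42)
X_bmi = rng.uniform(18, 40, size=752).reshape(-1, 1)  # shape (752, 1)
y = 2.5 * X_bmi.ravel() + rng.normal(0, 10, size=752)

reg = LinearRegression()
reg.fit(X_bmi, y)          # fit expects a 2-D feature array
predictions = reg.predict(X_bmi)
print(predictions.shape)   # → (752,)
```

The reshape step above is what makes this work: passing a 1-D array to fit raises an error.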
Regression mechanics
y = ax + b
Simple linear regression uses one feature
y = target
x = single feature
a, b = parameters/coefficients of the model (slope and intercept)
In higher dimensions:
Known as multiple regression
Must specify a coefficient for each feature, plus the intercept b
y = a1 x1 + a2 x2 + a3 x3 + ... + an xn + b
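The multiple regression equation is a dot product of coefficients and features plus an intercept. A small NumPy illustration with made-up values for a, b, and x:

```python
import numpy as np

# Hypothetical coefficients a1..a3 and intercept b (values are illustrative)
a = np.array([0.5, -1.2, 3.0])
b = 2.0
x = np.array([4.0, 1.0, 0.5])  # one observation with three features

y = a @ x + b  # y = a1*x1 + a2*x2 + a3*x3 + b
print(y)       # 0.5*4.0 + (-1.2)*1.0 + 3.0*0.5 + 2.0 ≈ 4.3
```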
R² quantifies the amount of variance in the target variable explained by the features
Values range from 0 (worst) to 1 (best)
[Scatter plots contrasting a high-R² fit with a low-R² fit]
R² on the test set:
0.356302876407827
RMSE = √MSE
Measured in the same units as the target variable
24.028109426907236
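A quick numeric check of the RMSE formula, using toy true values and predictions rather than the course's diabetes data:

```python
import numpy as np

# Toy true values and predictions (illustrative only)
y_true = np.array([100.0, 120.0, 140.0])
y_pred = np.array([110.0, 115.0, 150.0])

mse = np.mean((y_true - y_pred) ** 2)  # (100 + 25 + 100) / 3 = 75.0
rmse = np.sqrt(mse)                    # RMSE = √MSE, same units as the target
print(rmse)
```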
Cross-validation motivation
Model performance depends on how we split up the data
Solution: Cross-validation!
10 folds = 10-fold CV
k folds = k-fold CV
print(np.mean(cv_results), np.std(cv_results))

0.7418682216666667 0.023330243960652888

print(np.quantile(cv_results, [0.025, 0.975]))  # 95% confidence interval

array([0.7054865, 0.76874702])
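The per-fold scores above come from scikit-learn's cross_val_score. A runnable sketch on synthetic data (6 folds here as an example; the data and coefficients are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the diabetes features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, size=200)

kf = KFold(n_splits=6, shuffle=True, random_state=42)  # 6 folds = 6-fold CV
reg = LinearRegression()
cv_results = cross_val_score(reg, X, y, cv=kf)  # one R² score per fold
print(np.mean(cv_results), np.std(cv_results))
```

Shuffling before splitting (shuffle=True) avoids folds that reflect any ordering in the rows.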
Why regularize?
Recall: linear regression minimizes a loss function
It chooses a coefficient, a, for each feature, plus the intercept, b
Large coefficients can lead to overfitting
Regularization: penalize large coefficients
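One common regularized regression is ridge, which adds a penalty of alpha times the sum of squared coefficients to the loss. A minimal sketch on synthetic data (alpha chosen arbitrarily) showing the shrinkage effect:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data; the true coefficients are made up for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(0, 0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls penalty strength

# The penalty shrinks the coefficient vector toward zero relative to OLS
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

Larger alpha means stronger shrinkage; alpha=0 recovers ordinary least squares.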