Introduction To Regression: George Boorman
Regression
SUPERVISED LEARNING WITH SCIKIT-LEARN
George Boorman
Core Curriculum Manager, DataCamp
Predicting blood glucose levels
import pandas as pd
diabetes_df = pd.read_csv("diabetes.csv")
print(diabetes_df.head())

# Select BMI as a single feature and blood glucose as the target
# (column names assumed from the slide's context)
X_bmi = diabetes_df["bmi"].values
y = diabetes_df["glucose"].values
print(y.shape, X_bmi.shape)

(752,) (752,)

scikit-learn expects features as a 2-D array, so reshape:

X_bmi = X_bmi.reshape(-1, 1)
print(X_bmi.shape)

(752, 1)
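Once the feature is a 2-D array of shape (752, 1), it can be passed to a scikit-learn model. A minimal sketch, using synthetic stand-in data since diabetes.csv is not included here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the BMI feature and glucose target (illustrative only)
rng = np.random.default_rng(42)
X_bmi = rng.uniform(18, 40, size=752).reshape(-1, 1)  # shape (752, 1)
y = 2.5 * X_bmi.ravel() + rng.normal(0, 10, size=752)

reg = LinearRegression()
reg.fit(X_bmi, y)          # fit expects a 2-D feature array
predictions = reg.predict(X_bmi)
print(predictions.shape)   # → (752,)
```

The reshape step above is what makes this work: passing a 1-D array to fit raises an error.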
Regression mechanics
y = ax + b
Simple linear regression uses one feature
y = target
x = single feature
a, b = parameters/coefficients of the model (slope and intercept)
In higher dimensions:
Known as multiple regression
Must specify a coefficient for each feature, plus the intercept b
y = a1 x1 + a2 x2 + a3 x3 + ... + an xn + b
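The multiple regression equation is a dot product of coefficients and features plus an intercept. A small NumPy illustration with made-up values for a, b, and x:

```python
import numpy as np

# Hypothetical coefficients a1..a3 and intercept b (values are illustrative)
a = np.array([0.5, -1.2, 3.0])
b = 2.0
x = np.array([4.0, 1.0, 0.5])  # one observation with three features

y = a @ x + b  # y = a1*x1 + a2*x2 + a3*x3 + b
print(y)       # 0.5*4.0 + (-1.2)*1.0 + 3.0*0.5 + 2.0 ≈ 4.3
```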
R² quantifies the amount of variance in the target variable explained by the features
Values range from 0 (worst) to 1 (best)
[Scatter plots contrasting a high-R² fit with a low-R² fit]
R² on the test set:
0.356302876407827
RMSE = √MSE
Measured in the same units as the target variable
24.028109426907236
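A quick numeric check of the RMSE formula, using toy true values and predictions rather than the course's diabetes data:

```python
import numpy as np

# Toy true values and predictions (illustrative only)
y_true = np.array([100.0, 120.0, 140.0])
y_pred = np.array([110.0, 115.0, 150.0])

mse = np.mean((y_true - y_pred) ** 2)  # (100 + 25 + 100) / 3 = 75.0
rmse = np.sqrt(mse)                    # RMSE = √MSE, same units as the target
print(rmse)
```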
Cross-validation motivation
Model performance depends on how we split up the data
Solution: Cross-validation!
10 folds = 10-fold CV
k folds = k-fold CV
print(np.mean(cv_results), np.std(cv_results))

0.7418682216666667 0.023330243960652888

print(np.quantile(cv_results, [0.025, 0.975]))  # 95% confidence interval

array([0.7054865, 0.76874702])
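The per-fold scores above come from scikit-learn's cross_val_score. A runnable sketch on synthetic data (6 folds here as an example; the data and coefficients are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for the diabetes features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, size=200)

kf = KFold(n_splits=6, shuffle=True, random_state=42)  # 6 folds = 6-fold CV
reg = LinearRegression()
cv_results = cross_val_score(reg, X, y, cv=kf)  # one R² score per fold
print(np.mean(cv_results), np.std(cv_results))
```

Shuffling before splitting (shuffle=True) avoids folds that reflect any ordering in the rows.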
Why regularize?
Recall: linear regression minimizes a loss function
It chooses a coefficient, a, for each feature, plus the intercept, b
Large coefficients can lead to overfitting
Regularization: penalize large coefficients
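One common regularized regression is ridge, which adds a penalty of alpha times the sum of squared coefficients to the loss. A minimal sketch on synthetic data (alpha chosen arbitrarily) showing the shrinkage effect:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data; the true coefficients are made up for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(0, 0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls penalty strength

# The penalty shrinks the coefficient vector toward zero relative to OLS
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
```

Larger alpha means stronger shrinkage; alpha=0 recovers ordinary least squares.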