
Lecture 10
Application of the Python language for Data Science and Artificial Intelligence

Regression
(self-study lecture)

Supervised learning example: Simple linear regression


Regression analysis is one of the most important fields in statistics and machine learning.
There are many regression methods available.
Linear regression is one of them.

What Is Regression?
Regression searches for relationships among variables.
In regression analysis, you consider some phenomenon of interest and have a number of
observations.
Each observation has two or more features.
Following the assumption that at least one of the features depends on the others, you try to
establish a relation among them.
You need to find a function that maps some features or variables to others sufficiently well.
The dependent features are called the dependent variables, outputs, or responses.
The independent features are called the independent variables, inputs, regressors, or
predictors.
Regression problems usually have one continuous and unbounded dependent variable.
The inputs, however, can be continuous, discrete, or even categorical.
It’s a common practice to denote the outputs with 𝑦 and the inputs with 𝑥.
If there are two or more independent variables, then they can be represented as the vector 𝐱 =
(𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of inputs.

When Do You Need Regression?


Typically, you need regression to answer whether and how some phenomenon influences another or how several variables are related.
Regression is also useful when you want to forecast a response using a new set of predictors.
Regression is used in many different fields, including economics, computer science, and the
social sciences.

Linear Regression
Linear regression is probably one of the most important and widely used regression
techniques.
It’s among the simplest regression methods.
One of its main advantages is the ease of interpreting results.

Problem Formulation
When implementing linear regression of some dependent variable 𝑦 on the set of independent
variables 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of predictors, you assume a linear relationship
between 𝑦 and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀.
This equation is the regression equation. 𝛽₀, 𝛽₁, …, 𝛽ᵣ are the regression coefficients, and 𝜀 is
the random error.
Linear regression calculates the estimators of the regression coefficients or simply the
predicted weights, denoted with 𝑏₀, 𝑏₁, …, 𝑏ᵣ.


These estimators define the estimated regression function 𝑓(𝐱) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ.
This function should capture the dependencies between the inputs and output sufficiently
well.
The estimated or predicted response, 𝑓(𝐱ᵢ), for each observation 𝑖 = 1, …, 𝑛, should be as
close as possible to the corresponding actual response 𝑦ᵢ.
The differences 𝑦ᵢ – 𝑓(𝐱ᵢ) for all observations 𝑖 = 1, …, 𝑛, are called the residuals.
Regression is about determining the best predicted weights – that is, the weights
corresponding to the smallest residuals.
To get the best weights, you usually minimize the sum of squared residuals (SSR) for all
observations 𝑖 = 1, …, 𝑛:
SSR = Σᵢ(𝑦ᵢ - 𝑓(𝐱ᵢ))².

This approach is called the method of ordinary least squares.
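As a minimal illustration of this formula (with made-up data and candidate weights, not the optimal ones), the SSR can be computed directly:

import numpy as np

# toy data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.2, 5.9, 8.1])

# candidate weights b0 and b1 (hypothetical values)
b0, b1 = 0.0, 2.0

residuals = y - (b0 + b1*x)      # y_i - f(x_i)
ssr = np.sum(residuals**2)       # sum of squared residuals
print(ssr)

Ordinary least squares chooses the values of b0 and b1 that make this quantity as small as possible.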

Regression Performance
The variation of actual responses 𝑦ᵢ, 𝑖 = 1, …, 𝑛, occurs partly due to the dependence on the
predictors 𝐱ᵢ.
However, there’s also an additional inherent variance of the output.
The coefficient of determination, denoted as 𝑅², tells you how much of the variation in 𝑦 can
be explained by the dependence on 𝐱, using the particular regression model.
A larger 𝑅² indicates a better fit and means that the model can better explain the variation of
the output with different inputs.
The value 𝑅² = 1 corresponds to SSR = 0.
That's the perfect fit, since the predicted responses agree exactly with the actual responses.
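For reference, 𝑅² is commonly computed as 1 − SSR/SST, where SST = Σᵢ(𝑦ᵢ − ȳ)² is the total sum of squares. A minimal sketch with made-up values:

import numpy as np

# made-up actual and predicted responses
y = np.array([2.1, 4.2, 5.9, 8.1])
f = np.array([2.0, 4.0, 6.0, 8.0])    # predicted responses f(x_i)

ssr = np.sum((y - f)**2)              # sum of squared residuals
sst = np.sum((y - y.mean())**2)       # total sum of squares
r2 = 1 - ssr/sst                      # coefficient of determination
print(r2)

With scikit-learn, the same quantity is returned by the model's score() method, as used in the tasks below.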

Simple Linear Regression


Simple or single-variate linear regression is the simplest case of linear regression, as it has a
single independent variable, 𝐱 = 𝑥.
The following figure illustrates simple linear regression:

The estimated regression function, represented by the black line, has the equation:
𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥.

Your goal is to calculate the optimal values of the predicted weights 𝑏₀ and 𝑏₁ that minimize
SSR and determine the estimated regression function.
The value of 𝑏₀, also called the intercept, shows the point where the estimated regression line
crosses the 𝑦 axis. It’s the value of the estimated response 𝑓(𝑥) for 𝑥 = 0.

The value of 𝑏₁ determines the slope of the estimated regression line.


The predicted responses, shown as red squares, are the points on the regression line that
correspond to the input values.


The vertical dashed gray lines represent the residuals, which can be calculated as:
𝑦ᵢ - 𝑓(𝐱ᵢ) = 𝑦ᵢ - 𝑏₀ - 𝑏₁𝑥ᵢ for 𝑖 = 1, …, 𝑛.

They’re the distances between the green circles and the red squares.
When you implement linear regression, you’re actually trying to minimize these distances and
make the red squares as close as possible to the green circles, which represent the actual observations.

Task_01
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# generate synthetic data: y depends linearly on x plus a random factor
np.random.seed(0)
x = 10*np.random.rand(50)
y = 2*x - 1 + np.random.rand(50)*6
plt.scatter(x, y, color='b')

# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X = x[:, np.newaxis]
print(X.shape)

# fit the model; fit_intercept=True is the default, so b0 is estimated
model = LinearRegression().fit(X, y)

print('b0= ', model.intercept_)
print('b1= ', model.coef_)
print('R2= ', model.score(X, y))

# draw the estimated regression line over a dense grid of x values
xfit = np.linspace(-1, 11)
yfit = model.predict(xfit[:, np.newaxis])
plt.plot(xfit, yfit)

# reseed without an argument (nondeterministic) and predict responses for new inputs
np.random.seed()
xnew = 10*np.random.rand(10); print(xnew.shape)
ypred = model.predict(xnew.reshape(-1, 1))
plt.scatter(xnew, ypred, color="r")

plt.savefig('fig_01.png')
plt.show()

Task_02
Repeat building the linear regression model for the data below.


Data (as np.array):
x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]
xnew = np.arange(5)
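One possible sketch of a solution, following the same steps as Task_01 (the plotting details are only illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

X = x.reshape(-1, 1)                 # 2D feature array for scikit-learn
model = LinearRegression().fit(X, y)
print('b0= ', model.intercept_)
print('b1= ', model.coef_)
print('R2= ', model.score(X, y))

# predict responses for the new inputs
xnew = np.arange(5)
ypred = model.predict(xnew.reshape(-1, 1))

plt.scatter(x, y, color='b')
plt.plot(x, model.predict(X))        # regression line through the data range
plt.scatter(xnew, ypred, color='r')
plt.show()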

Task_03
Generate a set of observations (x,y) with 127 elements.
x values should be in the range [-10,10].
y values should depend linearly on x plus a random factor.
• Draw a scatter plot for these observations.
• Draw a linear regression line for these observations.
• Print the parameters of the regression equation.
• Generate 23 new values of the independent variable x.
• For these values, calculate the predicted value of y.
• On the graph, mark the new observations x and the predicted y values for them.
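A possible sketch for this task, assuming uniformly distributed x values and an arbitrary linear relation with added noise (the generating coefficients are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

np.random.seed(0)
x = 20*np.random.rand(127) - 10               # 127 values in [-10, 10]
y = 3*x + 2 + np.random.randn(127)*2          # linear dependence plus a random factor

X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)
print('b0= ', model.intercept_)
print('b1= ', model.coef_)

plt.scatter(x, y, color='b')
xfit = np.linspace(-10, 10)
plt.plot(xfit, model.predict(xfit.reshape(-1, 1)))

# 23 new values of x and their predicted responses
xnew = 20*np.random.rand(23) - 10
ypred = model.predict(xnew.reshape(-1, 1))
plt.scatter(xnew, ypred, color='r')
plt.show()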

Multiple Linear Regression


Multiple or multivariate linear regression is a case of linear regression with two or more
independent variables.
If there are just two independent variables, then the estimated regression function is:
𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂.
It represents a regression plane in a three-dimensional space.
The goal of regression is to determine the values of the weights 𝑏₀, 𝑏₁, and 𝑏₂ such that this
plane is as close as possible to the actual responses, while yielding the minimal SSR.
The case of more than two independent variables is similar.
The estimated regression function is:
𝑓(𝑥₁, …, 𝑥ᵣ) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ +𝑏ᵣ𝑥ᵣ

Task_04
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# generate synthetic data: y depends linearly on x1 and x2 plus a random factor
np.random.seed(0)
x1 = 10*np.random.rand(100)
x2 = 10*np.random.rand(100)
y = 2*x1 + 3*x2 - 3 + np.random.rand(100)*10

# 3D scatter plot of the observations
ax = plt.axes(projection='3d')
ax.scatter3D(x1, x2, y, c=y, cmap='PuBu')

# stack the two predictors into one (n_samples, 2) feature array
x = np.hstack((x1.reshape(-1,1), x2.reshape(-1,1)))
model = LinearRegression().fit(x, y)

# grid of (x1, x2) values on which to evaluate the fitted plane
X1 = np.linspace(0, 10, 100)
X2 = np.linspace(0, 10, 100)
XX1, XX2 = np.meshgrid(X1, X2)

# evaluate the fitted plane f(x1, x2) = b0 + b1*x1 + b2*x2 on the grid
Z = model.coef_[0]*XX1 + model.coef_[1]*XX2 + model.intercept_

ax.plot_surface(XX1, XX2, Z, cmap='viridis', alpha=0.1)
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_zlabel('Y')
ax.set_title('Linear regression model: y = b0 + b1x1 + b2x2')
plt.show()

Task_05
Repeat building the linear regression model for the data below.
x1 = [0, 5, 15, 25, 35, 45, 55, 60]
x2 = [1, 1, 2, 5, 11, 15, 34, 35]
y = [4, 5, 20, 14, 32, 22, 38, 43]
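A minimal sketch for these data, stacking the two predictors into one feature array:

import numpy as np
from sklearn.linear_model import LinearRegression

x1 = np.array([0, 5, 15, 25, 35, 45, 55, 60])
x2 = np.array([1, 1, 2, 5, 11, 15, 34, 35])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])

# combine the predictors into a (n_samples, 2) array
X = np.column_stack((x1, x2))
model = LinearRegression().fit(X, y)
print('b0= ', model.intercept_)
print('b1, b2= ', model.coef_)
print('R2= ', model.score(X, y))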

Task_06
Generate a set of observations (x, y) with 134 elements. x = (x1, x2)
x values should be in the range [-10,10].
y values should depend linearly on x plus a random factor.
• Draw a scatter plot for these observations.
• Draw the regression surface for these observations.
• Print the parameters of the regression equation.
• Generate 32 new values of the independent variable x.
• For these values, calculate the predicted value of y.
• On the graph, mark the new observations x and the predicted y values for them.
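A possible starting point, assuming uniformly distributed predictors in [-10, 10] and an arbitrary linear relation with noise; the 3D plotting can follow the Task_04 pattern:

import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)
x1 = 20*np.random.rand(134) - 10
x2 = 20*np.random.rand(134) - 10
y = 2*x1 - x2 + 5 + np.random.randn(134)*3    # illustrative coefficients

X = np.column_stack((x1, x2))
model = LinearRegression().fit(X, y)
print('b0= ', model.intercept_)
print('b1, b2= ', model.coef_)

# 32 new observations and their predicted responses
x1new = 20*np.random.rand(32) - 10
x2new = 20*np.random.rand(32) - 10
ypred = model.predict(np.column_stack((x1new, x2new)))
print(ypred)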

Polynomial Regression
You can regard polynomial regression as a generalized case of linear regression.
You assume the polynomial dependence between the output and inputs and, consequently, the
polynomial estimated regression function.
In addition to linear terms like 𝑏₁𝑥₁, regression function 𝑓 can include nonlinear terms such as
𝑏₂𝑥₁², 𝑏₃𝑥₁³, 𝑏₄𝑥₁𝑥₂, 𝑏₅𝑥₁²𝑥₂, ...

The simplest example of polynomial regression has a single independent variable, and the
estimated regression function is a polynomial of degree two:
𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥 + 𝑏₂𝑥².

Because the regression function remains linear in the unknown coefficients, you can solve the
polynomial regression problem as a linear problem, with the term 𝑥² regarded as an additional input variable.


In the case of two variables and the polynomial of degree two, the regression function has this
form:
𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + 𝑏₃𝑥₁² + 𝑏₄𝑥₁𝑥₂ + 𝑏₅𝑥₂².

The procedure for solving the problem is identical to the previous case.
You apply linear regression for five inputs: 𝑥₁, 𝑥₂, 𝑥₁², 𝑥₁𝑥₂, and 𝑥₂².
As the result of regression, you get the values of six weights that minimize SSR: 𝑏₀, 𝑏₁, 𝑏₂, 𝑏₃,
𝑏₄, and 𝑏₅.
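A minimal sketch of this idea with scikit-learn's PolynomialFeatures (the data values are made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# made-up observations with two predictors
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([5.0, 4.0, 20.0, 18.0, 51.0, 48.0])

# expand (x1, x2) into the five inputs (x1, x2, x1^2, x1*x2, x2^2)
X_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X_.shape)                       # (6, 5)

model = LinearRegression().fit(X_, y)
print('b0= ', model.intercept_)
print('b1..b5= ', model.coef_)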

Task_07
Nonlinear (polynomial) regression with one explanatory variable and a second-degree polynomial.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

# expand x into the polynomial features (x, x^2); the bias column is omitted
# because LinearRegression fits the intercept itself
transformer = PolynomialFeatures(degree=2, include_bias=False)
transformer.fit(x)
x_ = transformer.transform(x)
# equivalently, in a single step:
x_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

model = LinearRegression().fit(x_, y)
print(f"coefficient of determination: {model.score(x_, y)}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")

y_pred = model.predict(x_)
print(f"predicted response:\n{y_pred}")
print(f"y=\n{y}")
plt.scatter(x, y, color='b')
plt.scatter(x, y_pred, color='r')

# draw the fitted parabola over a dense grid of x values
xLine = np.linspace(4, 60)
xfit = xLine.reshape((-1, 1))
xfit_ = transformer.transform(xfit)
yfit_ = model.predict(xfit_)
plt.plot(xLine, yfit_)
plt.show()

Task_08
A variant of Task_07:
Generate a set of observations (x,y) with 33 elements.
• Draw a scatter plot for these observations.
• Build nonlinear (polynomial) regression.
• Print the parameters of the regression equation.
• Draw a plot of this nonlinear regression.
• Generate 11 new values of the independent variable x.
• For these values, calculate the predicted value of y.
• On the graph, mark the new observations x and the predicted y values for them.
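One possible sketch, assuming an arbitrary quadratic relation with added noise (all generating values are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
x = 10*np.random.rand(33)
y = 0.5*x**2 - 2*x + 3 + np.random.randn(33)*2     # quadratic dependence plus noise

poly = PolynomialFeatures(degree=2, include_bias=False)
X_ = poly.fit_transform(x.reshape(-1, 1))
model = LinearRegression().fit(X_, y)
print('intercept:', model.intercept_)
print('coefficients:', model.coef_)

plt.scatter(x, y, color='b')

# draw the fitted curve over a dense grid
xfit = np.linspace(0, 10)
plt.plot(xfit, model.predict(poly.transform(xfit.reshape(-1, 1))))

# 11 new x values and their predicted responses
xnew = 10*np.random.rand(11)
ypred = model.predict(poly.transform(xnew.reshape(-1, 1)))
plt.scatter(xnew, ypred, color='r')
plt.show()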

