Data Science Unit-II

Instructor: Krishna Dutt, krishnadutt.rvs@gmail.com

March 23, 2024

Disclaimer: The views expressed in this presentation are those of the author. Much open-source content is referenced, with all authors duly acknowledged.
UNIT - II Syllabus

Multiple linear regression: estimation and testing of coefficients; R² and adjusted R²
Logistic regression: estimation and testing of coefficients; interpretation of coefficients
K-Nearest Neighbor classifier
Random forest
Classification errors
Ridge regression
Support Vector Machine
Analysis of Variance (ANOVA)
Linear Regression - Single Variable

Data / features / dimensions: represent the state of real-world phenomena
Representable in primitive formats like numbers, characters, strings
Composite formats: image, video, sound, text, etc.
Data can be in structured or non-structured (SQL / NoSQL) formats
Both the input and the corresponding output can be considered as data
In some cases there is no explicit output data, but a goal shall be considered
Usually an n-dimensional input and an m-dimensional output
Assume all data is converted to numerical format

X      Y      Y − Ỹ
x1     y1     (y1 − h(θ0, θ1, x1))
x2     y2     (y2 − h(θ0, θ1, x2))
···    ···    ···
xm     ym     (ym − h(θ0, θ1, xm))
Linear Regression - Single Variable

Assume all data is converted to numerical format. In linear regression, we assume a model

Ỹi = h(θ0, θ1, xi) = θ0 + θ1·xi   (1)

The model is described by two parameters, θ0 and θ1, which are unknown. The goal is to estimate these unknowns from the given data. In the graph, which line represents the best fit for the given data?
Matrix Calculus required for Linear Regression

We need matrix calculus to solve regression problems; accordingly, we provide the formulas for differentiating a scalar function of a vector variable. Consider a vector variable X and a constant vector A, with y a scalar function of X:

y = Xᵀ · A = Aᵀ · X;  X = (x1, x2, ···, xm)ᵀ;  A = (a1, a2, ···, am)ᵀ

∂(Aᵀ · X)/∂X = ∂(Xᵀ · A)/∂X = A   (2)

∂(Xᵀ · A · X)/∂X = 2 · A · X   (3)

(In (3), A is a symmetric matrix; for a general matrix the gradient is (A + Aᵀ) · X.)
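Identities (2) and (3) can be spot-checked numerically. Below is a minimal sketch (ours, not from the slides) comparing each analytic gradient with a central finite-difference estimate; the helper name num_grad is hypothetical:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    # Central finite-difference approximation of the gradient of scalar f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
m = 4
X = rng.normal(size=m)          # vector variable X
a = rng.normal(size=m)          # constant vector A in eq. (2)
S = rng.normal(size=(m, m))
S = S + S.T                     # symmetric matrix A in eq. (3)

# Eq. (2): d(A^T X)/dX = A
print(np.allclose(num_grad(lambda x: a @ x, X), a))              # True
# Eq. (3): d(X^T A X)/dX = 2 A X, for symmetric A
print(np.allclose(num_grad(lambda x: x @ S @ x, X), 2 * S @ X))  # True
```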
Linear Regression - Uni-variate

Consider the uni-variate case. The linear model is defined by

ỹi = h(θ0, θ1, xi) = θ0 + θ1·xi   (4)

where ỹi and xi are the i-th response and predictor values, respectively, from the data set. We consider a dataset of m values, and the corresponding vectors are

X = (x1, x2, ···, xm)ᵀ;  Y = (y1, y2, ···, ym)ᵀ;  θ = (θ0, θ1)ᵀ   (5)

Recast X as

X = [ 1, x1
      1, x2
      ···
      1, xm ]   (6)
Linear Regression - Ordinary Least-Squares Minimization

Ỹ = X · θ   (7)

The deviation between the actual and model-predicted values is given by

ϵ = (Y − Ỹ)   (8)
Linear Regression .. contd.

In an ordinary least-squares fit of the linear model we minimize J(θ); the cost function is convex, so an explicit solution is easy to obtain by equating the gradient of J(θ) to 0.

J(θ) = ϵᵀ·ϵ = (Y − Ỹ)ᵀ·(Y − Ỹ) = (Yᵀ − Ỹᵀ)(Y − Ỹ)
     = Yᵀ·Y − Yᵀ·Ỹ − Ỹᵀ·Y + Ỹᵀ·Ỹ
     = Yᵀ·Y − Yᵀ·(X·θ) − (X·θ)ᵀ·Y + (X·θ)ᵀ·(X·θ)
     = Yᵀ·Y − Yᵀ·X·θ − θᵀ·Xᵀ·Y + θᵀ·Xᵀ·X·θ   (9)

∂J(θ)/∂θ = −∂(Yᵀ·X·θ)/∂θ − ∂(θᵀ·Xᵀ·Y)/∂θ + ∂(θᵀ·Xᵀ·X·θ)/∂θ
         = −(Xᵀ·Y) − (Xᵀ·Y) + 2·Xᵀ·X·θ = −2·(Xᵀ·Y) + 2·(Xᵀ·X·θ)   (10)

∂J(θ)/∂θ = 0:  −2·(Xᵀ·Y) + 2·(Xᵀ·X·θ) = 0   (11)

(Xᵀ·Y) = (Xᵀ·X)·θ;  θ = (Xᵀ·X)⁻¹·(Xᵀ·Y)   (12)
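Equation (12) translates directly into code. A minimal NumPy sketch, assuming synthetic data and our own function name fit_ols:

```python
import numpy as np

def fit_ols(x, y):
    # Recast x as the design matrix of eq. (6): a column of ones, then x.
    X = np.column_stack((np.ones(len(x)), x))
    # Normal equations of eq. (12): (X^T X) theta = X^T Y.
    # Solving the linear system is numerically preferable to an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=50)  # true theta = (3, 2)
print(fit_ols(x, y))   # approximately [3., 2.]
```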
Linear Regression - Multivariate

Consider multivariate data with n features and m observations:

X = [ x11, x12, ···, x1n
      x21, x22, ···, x2n
      ···
      xm1, xm2, ···, xmn ];  Y = (y1, y2, ···, ym)ᵀ;  θ = (θ0, θ1, ···, θn)ᵀ   (13)

Recast X as

X = [ 1, x11, x12, ···, x1n
      1, x21, x22, ···, x2n
      ···
      1, xm1, xm2, ···, xmn ];  Ỹ = X · θ;  ϵ = Y − Ỹ   (14)

J(θ) = ϵᵀ · ϵ   (15)

The objective J(θ) is still a scalar function of the vector θ. The earlier solution for θ still holds!
Linear Regression - Example

Consider a uni-variate hypothetical example.

X = (1, 2, 4, 5)ᵀ;  Y = (2, 3, 5, 6)ᵀ;  θ = (θ0, θ1)ᵀ;  recast X as

X = [ 1, 1
      1, 2
      1, 4
      1, 5 ]   (16)

Xᵀ·X = [ 4, 12
         12, 46 ]   (17)

(Xᵀ·X)⁻¹ = (1/40) · [ 46, −12
                      −12,  4 ]   (18)

Xᵀ·Y = [ 16
         58 ]   (19)
Linear Regression - Example - contd..

θ = (Xᵀ·X)⁻¹·Xᵀ·Y = (1/40) · [[46, −12], [−12, 4]] · (16, 58)ᵀ = (1, 1)ᵀ   (20)
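The arithmetic in eqs. (17)-(20) can be verified with a few lines of NumPy (an illustrative sketch):

```python
import numpy as np

# Step-by-step check of eqs. (17)-(20) for the worked example.
X = np.array([[1, 1], [1, 2], [1, 4], [1, 5]], dtype=float)
Y = np.array([2, 3, 5, 6], dtype=float)

XtX = X.T @ X                  # [[4, 12], [12, 46]], eq. (17)
XtX_inv = np.linalg.inv(XtX)   # (1/40)[[46, -12], [-12, 4]], eq. (18)
XtY = X.T @ Y                  # [16, 58], eq. (19)
theta = XtX_inv @ XtY          # eq. (20)
print(theta)                   # [1. 1.]
```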
The following bi-variate problem can be solved using the above steps:

X = [ 1, 2
      2, 3
      4, 5
      5, 6 ];  Y = (3, 4, 5, 6)ᵀ   (21)

Verify that the answer for the above bi-variate problem is

θ = (1.7, 0, 0.7)ᵀ   (22)

(Note that here the second column equals the first plus one, so after adding the intercept column Xᵀ·X is singular; the least-squares solution is not unique, and the θ above is one valid minimizer.)
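Because of the collinearity noted above, a direct inverse fails here. A sketch using np.linalg.lstsq (which returns the minimum-norm least-squares solution) confirms that the stated θ attains the same minimal residual norm:

```python
import numpy as np

# With the intercept column, column3 = column2 + column1 below, so X^T X is
# singular and the least-squares solution is not unique. lstsq returns the
# minimum-norm solution; we check that the slide's theta = (1.7, 0, 0.7)
# attains the same (minimal) residual norm.
X = np.array([[1, 1, 2], [1, 2, 3], [1, 4, 5], [1, 5, 6]], dtype=float)
y = np.array([3, 4, 5, 6], dtype=float)

theta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)
theta_slide = np.array([1.7, 0.0, 0.7])
print(np.linalg.norm(y - X @ theta_min_norm))  # ~0.316
print(np.linalg.norm(y - X @ theta_slide))     # same residual norm, ~0.316
```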
Bivariate Least square - Example

Another bi-variate problem can be solved using the above steps: marks obtained in theory and lab, and the corresponding course grade.

Theory   Lab   Grade
60       70    A
70       75    A
40       55    B
30       60    F

X = [ 60, 70
      70, 75
      40, 55
      30, 60 ];  Y = (3, 3, 2, 0)ᵀ   (23)

In the above, the ordinal grade values are transformed into numerical values (A = 3, B = 2, F = 0) to facilitate linear regression.
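A hedged sketch of this fit, with the grade encoding A = 3, B = 2, F = 0 taken from the slide and everything else our own choice:

```python
import numpy as np

# Grades example: ordinal grades encoded as numbers (A=3, B=2, F=0, per slide).
X = np.array([[60, 70], [70, 75], [40, 55], [30, 60]], dtype=float)
Y = np.array([3, 3, 2, 0], dtype=float)

Xd = np.column_stack((np.ones(len(Y)), X))       # add intercept column
theta, *_ = np.linalg.lstsq(Xd, Y, rcond=None)   # least-squares fit
print(theta)                                     # fitted coefficients
print(Xd @ theta)                                # fitted (numeric) grades
```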
Ridge Regression

The least-squares regression above minimizes the deviations of the model-predicted values with respect to the actual data points, but the problem of overfitting cannot be avoided. An additional loss term consisting of the model parameters alone is added to the objective to penalize large parameters and control the overfitting.

J(θ) = ϵᵀ·ϵ + λ·θᵀ·θ   (24)

The least-squares fit is obtained, as earlier, by solving

∂J(θ)/∂θ = 0   (25)

The first part, ∂(ϵᵀ·ϵ)/∂θ, is as derived earlier; the penalty term contributes ∂(λ·θᵀ·θ)/∂θ = 2·λ·θ, so

−2·(Xᵀ·Y) + 2·(Xᵀ·X·θ) + 2·λ·θ = 0   (26)

Therefore the total solution with ridge regression is obtained as

θ = (Xᵀ·X + λ·I)⁻¹·(Xᵀ·Y)   (27)
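A minimal sketch of eq. (27); fit_ridge is our own name, and note that in practice the intercept is often left unpenalized, whereas this literal implementation penalizes it too:

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Eq. (27): theta = (X^T X + lam * I)^{-1} X^T Y, via a linear solve.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

X = np.array([[1, 1], [1, 2], [1, 4], [1, 5]], dtype=float)  # intercept column included
y = np.array([2, 3, 5, 6], dtype=float)
for lam in (0.0, 0.1, 1.0, 10.0):
    print(lam, fit_ridge(X, y, lam))  # coefficients shrink toward zero as lam grows
```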
Hypothesis Testing in Linear Regression: T-Test

Objective: Assess the significance of individual coefficients in linear regression.

Hypothesis Testing for Individual Coefficients:
Null Hypothesis (H0): the coefficient is equal to zero (βi = 0).
Alternative Hypothesis (H1): the coefficient is not equal to zero (βi ≠ 0).

T-Test Statistic for Linear Regression:

ti = β̂i / Standard Error(β̂i)

Degrees of Freedom:

df = n − p − 1

Decision Rule:
If |ti| is significantly different from zero, reject H0 in favor of H1.
Common significance levels include 0.05, 0.01, etc.

Interpretation:
A significant coefficient suggests that the corresponding predictor is associated with the response variable.
Numerical Example: Hypothesis Testing in Linear Regression

Scenario: We have a linear regression model with a single predictor X and the response variable Y.

Hypotheses:
Null Hypothesis (H0): β1 = 0 (no relationship between X and Y).
Alternative Hypothesis (H1): β1 ≠ 0 (there is a significant relationship).

Given:
Sample size (n) = 50
Estimated coefficient (β̂1) = 2.5
Standard error (SE(β̂1)) = 1.2

Test Statistic:

t = β̂1 / SE(β̂1) = 2.5 / 1.2 ≈ 2.083

Decision Rule:
If |t| > tα/2,df, reject H0.
For α = 0.05 and df = 48, tα/2,df ≈ 2.011.

Conclusion:
Since |t| > 2.011, we reject H0 in favor of H1. There is sufficient evidence to suggest a significant relationship between X and Y.
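The same computation with SciPy (a sketch; scipy.stats is our choice of tool):

```python
from scipy import stats

# Reproducing the worked example: t = beta1_hat / SE(beta1_hat),
# df = n - 2 for one predictor plus an intercept.
beta1_hat, se, n, alpha = 2.5, 1.2, 50, 0.05
t = beta1_hat / se                              # ~2.083
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)   # ~2.011
p_value = 2 * stats.t.sf(abs(t), df=n - 2)      # ~0.043
print(t, t_crit, p_value, abs(t) > t_crit)      # reject H0 at the 5% level
```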
Correlation Coefficient R

R = (X − 1µX)ᵀ·(Y − 1µY) / √[ (X − 1µX)ᵀ·(X − 1µX) · (Y − 1µY)ᵀ·(Y − 1µY) ]

Range of R:

−1 ≤ R ≤ 1

R measures the strength and direction of the linear relationship between X and Y. It doesn't directly tell the proportion of variance explained by the relationship.
Correlation Coefficient R and R² - another formulation

R = (Xcᵀ·Yc) / √[ (Xcᵀ·Xc)·(Ycᵀ·Yc) ];  Xc = (X − 1·µx);  Yc = (Y − 1·µy)   (28)

cov(X, Y) = (Xcᵀ·Yc) / n   (29)

var(X) = (X − 1µx)ᵀ(X − 1µx) / n;  var(Y) = (Y − 1µy)ᵀ(Y − 1µy) / n   (30)

R = cov(X, Y) / √(var(X)·var(Y))   (31)
Correlation Coefficient R²

0 ≤ R² ≤ 1

The difference between R and R²:

R (Pearson correlation coefficient): measures the strength and direction of the linear relationship between X and Y. Doesn't directly tell you the proportion of variance explained by the relationship.

R²: signifies the proportion of variance in Y explained by the linear relationship with X. Doesn't directly tell you the direction (positive or negative) of the relationship.

While mathematically R² is the square of R, their interpretations differ.
Coefficient of Determination (R-squared) and Adjusted R-squared

There is another way of defining R², shown below; the expression above and the one here are equivalent.

Coefficient of Determination (R-squared):
Measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
In vector notation, it is calculated as:

R² = 1 − SSR/SST

where SSR is the sum of squared residuals and SST is the total sum of squares.
Calculation of SSR and SST in Linear Regression

Sum of Squared Residuals (SSR):
SSR measures the sum of the squared differences between the predicted and actual values.
In vector notation, it is calculated as:

SSR = (y − Xθ)ᵀ(y − Xθ)

where y is the vector of actual values, X is the data matrix, and θ is the coefficient vector.

Total Sum of Squares (SST):
SST measures the total sum of squared differences between the actual values and their mean:

SST = (y − µy·1)ᵀ(y − µy·1)
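A short sketch computing SSR, SST, and R² = 1 − SSR/SST for the running example (the helper name r_squared is our own):

```python
import numpy as np

def r_squared(X, y, theta):
    resid = y - X @ theta      # y - X theta
    ssr = resid @ resid        # SSR = (y - X theta)^T (y - X theta)
    dev = y - y.mean()         # y - mu_y 1
    sst = dev @ dev            # SST = (y - mu_y 1)^T (y - mu_y 1)
    return 1.0 - ssr / sst

X = np.array([[1, 1], [1, 2], [1, 4], [1, 5]], dtype=float)
y = np.array([2, 3, 5, 6], dtype=float)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(r_squared(X, y, theta))   # 1.0 for this exactly-linear data
```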
Adjusted R-squared

R² (Coefficient of Determination): measures the percentage of variance in the dependent variable explained by the independent variables. It increases (or stays the same) whenever more independent variables are added, even if those variables don't actually explain any additional variation. This can increase model complexity and lead to fitting noise (overfitting).

Adjusted R²: adjusts R² to account for the number of independent variables in the model. It penalizes adding variables that don't improve the model's fit. Adjusted R² can increase or decrease when you add variables.

Adjusted R-squared:

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1)

where n is the number of observations and p is the number of predictors.
R² and Adjusted R²: Numerical Example

X = (1, 2, 4, 5)ᵀ;  Y = (2, 3, 5, 6)ᵀ

µx = (1/4)·(1 + 2 + 4 + 5) = 3   (32)

µy = (1/4)·(2 + 3 + 5 + 6) = 4   (33)

Xc = X − µx·1 = (−2, −1, 1, 2)ᵀ;  Yc = Y − µy·1 = (−2, −1, 1, 2)ᵀ

Xcᵀ·Yc = (−2, −1, 1, 2)·(−2, −1, 1, 2)ᵀ = 10   (34)

Xcᵀ·Xc = 10   (35)

Ycᵀ·Yc = 10   (36)

R = (Xcᵀ·Yc) / √[ (Xcᵀ·Xc)·(Ycᵀ·Yc) ] = 10 / √(10·10) = 1   (37)

R² = 1   (38)
Steps for Numerical Solution for R, R², and Adjusted R²

1. Find Means: calculate µx and µy, the means of X and Y, respectively.

2. Center Variables: obtain centered variables Xc = X − 1·µx and Yc = Y − 1·µy.

3. Calculate the Pearson Correlation Coefficient (R):

   R = (Xcᵀ·Yc) / √[ (Xcᵀ·Xc)·(Ycᵀ·Yc) ]

4. Calculate R²:  R² = R·R

5. Calculate Adjusted R²:

   Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1)
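The five steps map directly to NumPy. A sketch with our own helper name r_stats, reproducing the R = R² = 1 of the worked example (k = number of predictors):

```python
import numpy as np

def r_stats(x, y, k=1):
    xc, yc = x - x.mean(), y - y.mean()             # steps 1-2: center
    r = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))  # step 3: Pearson R
    r2 = r * r                                      # step 4
    n = len(x)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)      # step 5
    return r, r2, adj

x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 6.0])
print(r_stats(x, y))   # R = 1, R^2 = 1, adjusted R^2 = 1
```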
Multicollinearity in Multiple Linear Regression

Definition
Multicollinearity refers to the phenomenon in which two or more independent variables in a regression model are highly correlated with each other.

Issues Caused by Multicollinearity
Unstable estimation of regression coefficients
Inflated standard errors
Difficulty in determining the true relationship between independent variables and the dependent variable

Detection of Multicollinearity
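One common detector, offered here as an assumption since the slide does not name one, is the variance inflation factor VIFj = 1/(1 − Rj²), where Rj² comes from regressing predictor j on the remaining predictors; values above about 10 conventionally flag strong multicollinearity. A minimal sketch:

```python
import numpy as np

def vif(X):
    # Variance inflation factor for each column of X (no intercept column).
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack((np.ones(len(X)), others))    # regress x_j on the rest
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)    # nearly collinear with x1
print(vif(np.column_stack((x1, x2))))    # both VIFs very large
```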