Lab 6
Linear regression, also called linear fitting, is a linear model that attempts to find the relationship
between two variables by fitting a linear equation to the observed data. One variable is the explanatory
variable and the other is the dependent variable.
Equation:
𝑌 = 𝑎 + 𝑏𝑋
Where:
• Y = Dependent variable
• a = Intercept
• b = Slope of the line
• X = Explanatory Variable
MATLAB can easily determine the slope and y-intercept of the best-fit line through a set of data. Use the
command
>> C = polyfit(X, Y, 1)
which produces the vector C, in which the first value is the best fit for the slope and the second value
is the best fit for the y-intercept of the least-squares fit of the vector of data Y (on the vertical axis)
to the vector of data X (on the horizontal axis).
For example, to find the least-squares regression line for the data above, we could type:
>> x = [2 5 2 4 6];
>> y = [4 7 5 8 11];
>> C = polyfit(x, y, 1)
C = 1.4062 1.6562
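As a check on what polyfit returns, the slope and intercept can also be computed directly from the
closed-form least-squares formulas. The following is a minimal sketch using the same x and y; the
variable names n, b and a are chosen here for illustration only.

```matlab
x = [2 5 2 4 6];
y = [4 7 5 8 11];

n = numel(x);
% Closed-form least-squares slope (b) and intercept (a) for Y = a + bX
b = (n*sum(x.*y) - sum(x)*sum(y)) / (n*sum(x.^2) - sum(x)^2);
a = mean(y) - b*mean(x);

C = polyfit(x, y, 1);        % C(1) is the slope, C(2) is the intercept
fprintf('formulas: slope = %.4f, intercept = %.4f\n', b, a);
fprintf('polyfit:  slope = %.4f, intercept = %.4f\n', C(1), C(2));
```

Both lines of output should agree, which confirms the ordering of the values in C (slope first, intercept
second).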
6.2.1 Interpolation
Interpolation is a method of deriving a simple function from a given discrete data set such that the
function passes through the provided data points. This makes it possible to estimate values in between
the given data points. The method is needed whenever we want the value of a function at an
intermediate value of the independent variable. In short, interpolation is the process of determining
unknown values that lie in between known data points.
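Formula (linear interpolation between two known points (x1, y1) and (x2, y2)):
y = y1 + (x − x1)(y2 − y1) / (x2 − x1)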
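For linear interpolation, the value at a query point between two neighbouring known points lies on the
straight line joining them. The following is a minimal sketch using MATLAB's interp1 function; the data
values and the names x_known, y_known, xq and y_manual are made up for illustration and are not part of
the lab data.

```matlab
% Hypothetical known data points (not from the lab data)
x_known = [1 2 3 4];
y_known = [10 20 40 80];

xq = 2.5;                               % query point between x = 2 and x = 3
yq = interp1(x_known, y_known, xq);     % interp1 uses linear interpolation by default

% Same value from the straight line through (2, 20) and (3, 40)
y_manual = 20 + (xq - 2)*(40 - 20)/(3 - 2);
```

Here yq and y_manual are both 30, the value halfway between the neighbouring known points.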
6.2.2 Extrapolation
Extrapolation is the process of estimating a value beyond the observed range of a variable, based
on its relationship with another variable. It is an important concept not only in mathematics but also
in other disciplines such as psychology, sociology and statistics, including with some categorical data.
To find estimated y-values for x = 3 and x = -1 using the linear regression for the data above,
we could type the following into the command window:
yhat1 = 5.8750
yhat2 = 0.2500
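One way to obtain these estimates, assuming the coefficient vector C from the earlier polyfit call, is to
evaluate the fitted line with polyval. This is a minimal sketch; writing C(1)*x + C(2) directly would
give the same values.

```matlab
x = [2 5 2 4 6];
y = [4 7 5 8 11];
C = polyfit(x, y, 1);

yhat1 = polyval(C, 3)      % same as C(1)*3 + C(2)
yhat2 = polyval(C, -1)     % x = -1 lies outside the data range (2 to 6): an extrapolation
```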
As well as plotting data, MATLAB allows us to perform curve fitting to help identify relationships
between data. As we will see, this can be done in two ways: either by calling the built-in functions
polyfit and polyval from the command window, or by using the menu options on a figure window.
To illustrate the use of polyfit and polyval, consider the following example. When diagnosing heart
disease, cardiologists observe and measure the motion of the heart as it beats. Radial myocardial
displacements are measured for the left ventricle of a patient’s heart using a dynamic magnetic
resonance (MR) scan. The data for one segment of the ventricle are contained in a file called
radial.mat, which will be provided to you. This MAT file contains two variables: radial represents the
radial displacement measurements and t represents the time (in milliseconds) of each measurement.
The following code will load the data, plot radial against t, and then fit a cubic polynomial curve to the
data.
>> load('radial.mat');
Note that the polyfit function takes three arguments: the x and y data that we are fitting the curve to,
and the order of the polynomial (3 in this case for a cubic polynomial). The value returned (p) is an
array containing the cubic polynomial coefficients. This is then used as one of the inputs to the polyval
function. polyval computes the value of a polynomial curve at a given value. In the example code, we
compute the value of the fitted cubic polynomial (i.e. the predicted radial displacement) at t = 100
milliseconds.
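The polyfit and polyval calls described above can be sketched as follows. Because radial.mat is not
reproduced here, the sketch builds synthetic placeholder values for t and radial; those numbers, and the
result name r100, are assumptions for illustration only.

```matlab
% Synthetic stand-in for the contents of radial.mat (values are made up)
t = 0:20:200;                                    % time in milliseconds
radial = 1e-6*t.^3 - 4e-4*t.^2 + 0.04*t + 1;     % placeholder displacements

p = polyfit(t, radial, 3);     % order 3: four coefficients, highest power first
r100 = polyval(p, 100)         % predicted displacement at t = 100 ms
```

With the real data, the first two assignments would be replaced by load('radial.mat').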
Alternatively, the same curve fitting can be carried out using the menu options on a figure window. Try
displaying a plot of radial against t as in Fig. 6.1a, then select the Tools menu on the figure window,
followed by the Basic Fitting option. This opens a new window where we can choose which curve(s) to
fit and predict values using the fitted curve(s) (see Fig. 6.1b). Try experimenting with this
functionality.
Figure 6.1 (a) Radial myocardial displacement versus time. (b) The Basic Fitting tool.
6.3.1 Example
The code listing shown below is an extended version of a previous experiment’s example (plotting and
fitting a curve to radial displacements of the left ventricle of the heart). We can enter this code into a
MATLAB script m-file, save it with the file name lab6.m and then run the script as described above.
(Note that we do not include the MATLAB command prompt in this listing because we are entering
the code into the editor window not the command window.)
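load('radial.mat');
p = polyfit(t, radial, 3);
t_curve = linspace(min(t), max(t));   % assumed evaluation grid
radial_curve = polyval(p, t_curve);
plot(t, radial, 'o', t_curve, radial_curve, '-')
xlabel('Time (ms)')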
6.3.2 Example
The code has been extended to compute the difference between the maximum and minimum
displacements using the fitted curve. Note how we use the optional second return value of the max
and min functions to get the array indices of the maximum and minimum displacements. These
represent the end diastole and end systole frames respectively (i.e., at maximum relaxation and
maximum contraction respectively).
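The use of the second return value of max and min can be sketched as follows. The sample values and the
names max_disp, min_disp, i_max, i_min and disp_range are hypothetical, chosen only to show the pattern.

```matlab
% Hypothetical fitted-curve samples (made-up values, not the lab data)
t_curve      = [0 100 200 300 400];        % time in milliseconds
radial_curve = [2.0 3.5 1.2 0.8 1.9];      % displacement on the fitted curve

[max_disp, i_max] = max(radial_curve);     % value and array index of the maximum
[min_disp, i_min] = min(radial_curve);     % value and array index of the minimum

disp_range = max_disp - min_disp;          % difference between maximum and minimum
t_max = t_curve(i_max);                    % time at which the maximum occurs
t_min = t_curve(i_min);                    % time at which the minimum occurs
```

The indices are what link each extreme value back to its position in the time array.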
6.4 Comments
When writing MATLAB scripts, it is possible to add comments to our code. A comment is a piece of
text that will be ignored by MATLAB when running our script. Comments in MATLAB are specified by
the ‘%’ symbol: any text after a ‘%’ symbol on any line of a script will not be interpreted by MATLAB.
It is good practice to add comments to our code to explain what the code is doing. This is useful if we
or other people need to look at our code in the future with a view to modifying it or fixing errors.
6.4.1 Example
Let’s add some comments to the script we looked at in the last example to see how they improve code
readability.
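% load data
load('radial.mat');
% fit curve (the evaluation grid t_curve is an assumption)
p = polyfit(t, radial, 3);
t_curve = linspace(min(t), max(t));
radial_curve = polyval(p, t_curve);
plot(t, radial, 'o', t_curve, radial_curve, '-')
xlabel('Time (ms)')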
6.5 Tasks
6.5.1 In the documentation for MATLAB's Statistics and Machine Learning Toolbox you can find the
linear and nonlinear models. LinearModel.fit and NonlinearModel.fit are the fitting functions for
linear and nonlinear models respectively; fitlm and fitnlm, used below, are their current equivalents.
From the linear model documentation, work through the following examples. Paste their code and
output here, and explain how the program works.
Linear Regression
Code
load carsmall
X = [Weight,Horsepower,Acceleration];
mdl = fitlm(X,MPG)
mdl.Coefficients
anova(mdl,'summary')
Output
Figure 6.2
Figure 6.3
The model display includes the model formula, estimated coefficients, and model summary statistics.
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝛽3 𝑋3 + ε
The Coefficients property includes these columns:
Estimate — Coefficient estimates for each corresponding term in the model. For example, the
estimate for the constant term (intercept) is 47.977.
tStat — t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is
zero against the alternative that it is different from zero, given the other predictors in the model. Note
that tStat = Estimate/SE, where SE is the standard error of the coefficient estimate. For example, the
t-statistic for the intercept is 47.977/3.8785 = 12.37.
pValue — p-value for the t-statistic of the hypothesis test that the corresponding coefficient is equal
to zero or not. For example, the p-value of the t-statistic for x2 is greater than 0.05, so this term is not
significant at the 5% significance level given the other terms in the model.
Number of observations — Number of rows without any NaN values. For example, Number of
observations is 93 because the MPG data vector has six NaN values and the Horsepower data vector
has one NaN value for a different observation, where the number of rows in X and MPG is 100.
Error degrees of freedom — n – p, where n is the number of observations, and p is the number of
coefficients in the model, including the intercept. For example, the model has four coefficients (three
predictors plus the intercept), so the Error degrees of freedom is 93 – 4 = 89.
Root mean squared error — Square root of the mean squared error, which estimates the standard
deviation of the error distribution.
F-statistic vs. constant model — Test statistic for the F-test on the regression model, which tests
whether the model fits significantly better than a degenerate model consisting of only a constant term.
p-value — p-value for the F-test on the model. For example, the model is significant with a p-value of
7.3816e-27.
The model display also shows the estimated coefficient information, which is stored in the Coefficients
property. Display the Coefficients property.
You can find these statistics in the model properties (NumObservations, DFE, RMSE, and Rsquared)
and by using the anova function.
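To make the relationships between these statistics concrete, they can be reproduced by hand with base
MATLAB operations. The following is a minimal sketch on a small made-up data set (not carsmall); it
does not call fitlm, and all variable names are illustrative assumptions.

```matlab
% Made-up data: 6 observations, 2 predictors (not the carsmall data)
X = [1 2; 2 1; 3 4; 4 3; 5 6; 6 5];
y = [3.1; 3.9; 8.2; 8.8; 13.1; 13.9];

n = size(X, 1);
A = [ones(n, 1) X];              % design matrix with an intercept column
p = size(A, 2);                  % number of coefficients, including the intercept

beta = A \ y;                    % least-squares coefficient estimates
res  = y - A*beta;               % residuals
dfe  = n - p;                    % error degrees of freedom
rmse = sqrt(sum(res.^2) / dfe);  % root mean squared error

covB  = rmse^2 * inv(A'*A);      % estimated covariance matrix of the coefficients
se    = sqrt(diag(covB));        % standard error of each coefficient
tStat = beta ./ se;              % t-statistic = Estimate / SE
```

Here n = 6 and p = 3, so the error degrees of freedom is 6 – 3 = 3, following the same n – p rule as in
the model display.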
6.5.2 From the nonlinear model documentation, work through the following examples. Paste their
code and output here, and explain how the program works.
Non-Linear Regression
• Non-Linear model from Dataset Array
• Nonlinear Model from Matrix Data
Code
load carbig
X = [Horsepower,Weight];
y = MPG;
modelfun = @(b,x)b(1) + b(2)*x(:,1).^b(3) + ...
    b(4)*x(:,2).^b(5);
beta0 = [-50 500 -1 500 -1];
mdl = fitnlm(X,y,modelfun,beta0)
Xnew = mean(X,'omitnan')
MPGnew = predict(mdl,Xnew)
Output
Fit a nonlinear regression model for auto mileage based on the carbig data. Predict the mileage of an
average car.
Load the sample data. Create a matrix X containing the measurements for the horsepower
(Horsepower) and weight (Weight) of each car. Create a vector y containing the response values in
miles per gallon (MPG).
Figure 6.4
Find the predicted mileage of an average car. Because the sample data contains some missing (NaN)
observations, compute the mean using mean with the 'omitnan' option.
Figure 6.5
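The effect of the 'omitnan' option can be seen on a small example. The following is a minimal sketch
with a made-up matrix (not the carbig data); the names m_default and m_omit are illustrative.

```matlab
% Made-up matrix with missing values (not the carbig data)
X = [100 2000; NaN 3000; 140 NaN];

m_default = mean(X)              % NaN in every column that contains a NaN
m_omit    = mean(X, 'omitnan')   % column means over the non-missing entries only
```

Without the option both column means are NaN; with it, each column is averaged over its two available
values, giving 120 and 2500.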