Data Science for Civil Engineering Unit 2 Notes
Unit 2: Applied Mathematics - Calculus, Linear Algebra (Least Squares Regression, Eigenvalues and Eigenvectors,
Symmetric Matrices, Singular Value Decomposition, Principal Component Analysis); Applied Statistics (Frequency
Distribution, Permutations and Combinations, Probability, Random Variables (Discrete, Continuous, Poisson and
Binomial; Single and Multiple) and their applications, Expectation and Variance, Hypothesis Testing)
Applied Mathematics: Calculus and Linear Algebra
Calculus:
Calculus is a branch of mathematics that focuses on the study of continuous change. It provides tools and methods for
understanding how quantities change with respect to one another. Calculus is divided into two main branches:
differential calculus and integral calculus.
1. Differential Calculus: Differential calculus deals with the concept of a derivative, which represents the rate
of change of a function. Key concepts include:
Derivative: The derivative of a function at a point represents the instantaneous rate of change of
the function at that point.
Differentiation Rules: Rules for finding derivatives of different types of functions, such as power,
exponential, trigonometric, and composite functions.
Applications: Derivatives have various applications, including optimization problems, curve
sketching, and physics (kinematics).
2. Integral Calculus: Integral calculus focuses on the concept of an integral, which represents the
accumulation of quantities. Key concepts include:
Integral: The integral of a function represents the area under the curve of the function over a
specified interval.
Definite Integral: Represents the accumulated quantity over a specific interval.
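As a brief illustrative sketch of both ideas (assuming NumPy is available; the function f(x) = x^2 and the interval are chosen only for demonstration):

import numpy as np

def f(x):
    return x ** 2  # example function, chosen only for illustration

# Derivative at x = 3 via a central finite difference
h = 1e-5
derivative = (f(3 + h) - f(3 - h)) / (2 * h)   # expected: close to 6

# Definite integral of f over [0, 2] via the trapezoidal rule
x = np.linspace(0, 2, 1001)
integral = np.sum((f(x[:-1]) + f(x[1:])) / 2 * np.diff(x))   # expected: close to 8/3

print(derivative, integral)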
Linear Algebra:
Linear algebra is a branch of mathematics that deals with vector spaces and linear equations. It provides a
framework for representing and solving systems of linear equations, and it has applications in various fields,
including computer graphics, machine learning, physics, and engineering.
1. Vectors and Vector Spaces:
Vectors: Vectors represent quantities that have both magnitude and direction. They can be
represented as column matrices or ordered lists of numbers.
Vector Operations: Addition, scalar multiplication, and dot product are fundamental vector
operations.
Vector Spaces: Vector spaces are sets of vectors closed under addition and scalar multiplication.
They satisfy specific axioms.
2. Matrices and Linear Transformations:
Matrices: Matrices are rectangular arrays of numbers. They are used to represent linear
transformations and solve systems of linear equations.
Matrix Operations: Addition, scalar multiplication, matrix multiplication, and inverse operations
on matrices.
Linear Transformations: Matrices can represent transformations that preserve linear
relationships, such as rotations, reflections, and scaling.
3. Eigen values and Eigen vectors:
Eigen values: Eigen values of a matrix represent scalar factors by which certain vectors are
scaled during a linear transformation.
Eigenvectors: Eigenvectors are non-zero vectors that remain in the same direction after a linear
transformation.
Applications:
Computer Graphics: Linear algebra is used to manipulate and render 2D and 3D graphics.
Machine Learning: Matrices and vectors are essential for representing data and mathematical
operations in machine learning algorithms.
Physics and Engineering: Linear algebra is used in solving systems of equations, modeling
physical systems, and more.
Understanding calculus and linear algebra is crucial for many fields of science, engineering, and mathematics.
These concepts provide the foundation for more advanced mathematical theories and practical applications.
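As a brief, illustrative sketch of the vector and matrix operations described above (assuming NumPy is available; the vectors and the rotation matrix below are arbitrary examples):

import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

print(v + w)            # vector addition
print(2.5 * v)          # scalar multiplication
print(np.dot(v, w))     # dot product

# A 2x2 matrix acting as a linear transformation (here, a 90-degree rotation)
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
print(R @ v)            # apply the transformation to v
print(np.linalg.inv(R)) # inverse transformation (rotation by -90 degrees)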
===================================================================
Least Squares Regression, commonly known as Linear Regression, is a statistical method used to model the
relationship between a dependent variable (also called the target or response variable) and one or more independent
variables (also called predictors or features). The goal of linear regression is to find the best-fitting straight line (or
hyperplane in higher dimensions) that minimizes the sum of the squared differences between the observed and
predicted values.
Simple Linear Regression:
In simple linear regression, there is only one independent variable. The relationship between the independent
variable "X" and the dependent variable "y" is represented by a linear equation:
y = mx + b
Where:
"y" is the dependent variable, "x" is the independent variable, "m" is the slope of the line, and "b" is the intercept.
The objective is to find the values of "m" and "b" that best fit the data points. The least squares method aims to
minimize the sum of the squared vertical distances between the observed data points and the predicted values on the
regression line.
Multiple Linear Regression:
In multiple linear regression, there are multiple independent variables. The relationship between the dependent
variable "y" and multiple independent variables "x1", "x2", ..., "xn" is represented by the equation:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
Where:
"b0" is the intercept.
"b1", "b2", ..., "bn" are the coefficients for the respective independent variables.
The goal is to find the coefficients that minimize the sum of the squared differences between the observed "y"
values and the predicted values (given by the linear equation). This is achieved by calculating the squared residuals
and summing them up:
SSR = Σ (y_i - ŷ_i)^2, where ŷ_i is the value predicted by the linear equation for observation i.
The coefficients are found using calculus, and the resulting equations provide the optimal values for the slope(s) and
intercept that minimize the sum of squared residuals.
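A minimal sketch of fitting a simple linear regression by least squares (assuming NumPy; the data points are made up for illustration):

import numpy as np

# Hypothetical data points (x, y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Build the design matrix [x, 1] and solve for [m, b] by least squares
X = np.column_stack([x, np.ones_like(x)])
(m, b), residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

y_pred = m * x + b
ssr = np.sum((y - y_pred) ** 2)   # sum of squared residuals being minimized
print(m, b, ssr)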
Application:
Linear regression is widely used in various fields, including economics, social sciences, engineering, and data
science, for tasks such as:
Hypothesis testing: Evaluating the impact of an independent variable on the dependent variable.
Feature selection: Identifying the most important predictors in a dataset.
Key assumptions of linear regression include linearity, independence of errors, and homoscedasticity: the variance
of the residuals is constant across all levels of the independent variable(s).
Linear regression is a foundational technique in statistics and data analysis. While it provides a simple and
interpretable model, it's important to ensure that the assumptions are met and to consider more complex techniques
for cases where the assumptions are violated.
===================================================================
Eigen values and eigenvectors are fundamental concepts in linear algebra that have wide-ranging applications in
various fields, including mathematics, physics, engineering, computer graphics, and machine learning. They are
associated with square matrices and play a significant role in understanding transformations and behaviors
within these matrices.
Eigen values:
Eigenvalues are scalar values that represent how a matrix scales a vector during a linear transformation. They
provide information about the stretching or compressing effect of the transformation along certain directions.
Given a square matrix "A" and a nonzero vector "v," an eigenvalue "λ" and an eigenvector "v" satisfy the equation:
A * v = λ * v
In this equation:
"A" is the square matrix, "v" is the eigenvector, and "λ" is the corresponding eigenvalue (a scalar).
Eigen vectors:
Eigenvectors are non-zero vectors that remain in the same direction (up to scaling) when transformed by a matrix.
They represent the directions along which a matrix behaves in a particularly simple way.
Eigenvectors are determined by solving the system of linear equations formed by the eigenvalue equation:
(A - λI) * v = 0
In this equation:
"A" is the square matrix, "I" is the identity matrix of the same size, "λ" is an eigenvalue, and "v" is the
corresponding eigenvector.
Significance and Applications:
Eigenvalues indicate how much an eigenvector is scaled or compressed during a linear transformation.
A positive eigenvalue implies stretching, a negative eigenvalue implies flipping, and a zero eigenvalue
implies collapsing to a lower-dimensional subspace.
Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that uses eigenvectors to
find the directions of maximum variance in a dataset.
Quantum Mechanics: In quantum mechanics, eigenvalues and eigenvectors are used to represent states and
observables.
Calculation of Eigen values and Eigenvectors:
Finding eigenvalues involves solving the characteristic equation det(A - λI) = 0 and then substituting each
eigenvalue into the equation (A - λI) * v = 0 to find the corresponding eigenvectors.
Example: Consider a 2x2 matrix:
A = | 3 1 |
    | 1 3 |
The eigen values are 4 and 2. The corresponding eigenvectors are [1, 1] and [-1, 1].
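This example can be checked numerically; a small sketch assuming NumPy (returned eigenvectors are normalized, which does not change their direction):

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # approximately [4., 2.] (order may vary)
print(eigenvectors)   # columns are normalized eigenvectors along the [1, 1] and [-1, 1] directions

# Verify A * v = λ * v for the first eigenpair
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))   # True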
Eigen values and eigenvectors offer deep insights into the behavior of linear transformations represented by
matrices. They are essential tools for understanding the properties of matrices and their applications in various
fields.
===================================================================
Symmetric Matrices
A symmetric matrix is a square matrix that is equal to its transpose. In other words, a matrix "A" is symmetric if it
is equal to its own transpose "A^T":
A = A^T
Symmetric matrices have special properties and characteristics that make them important in various areas of
mathematics and applications. Let's explore the details of symmetric matrices:
Properties of Symmetric Matrices:
1. Diagonal Elements: The diagonal elements of a symmetric matrix remain the same as they are in the
original matrix. That is, "A[i][i] = A^T[i][i]" for all "i" (where "i" denotes the row and column index).
2. Real Eigenvalues: All eigenvalues of a real symmetric matrix are real numbers.
3. Orthogonal Eigenvectors: Eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are
orthogonal to each other.
4. Real Spectral Theorem: Every real symmetric matrix can be diagonalized using orthogonal matrices,
resulting in a diagonal matrix containing the eigenvalues.
Applications and Significance:
1. Physics and Engineering: Symmetric matrices appear in physics equations, such as moment of inertia
tensors, stress tensors, and covariance matrices.
2. Eigen value Problems: Symmetric matrices are well-suited for solving eigenvalue problems, which have
applications in quantum mechanics, vibrational analysis, and more.
3. Graph Theory: The adjacency matrix of an undirected graph is a symmetric matrix. It contains
information about connections between vertices.
4. Statistical Covariance: Covariance matrices in statistics are often symmetric. They represent the
relationships between different variables in a dataset.
5. Positive Definite Matrices: Symmetric positive definite matrices are used in optimization problems and as
covariance matrices in multivariate normal distributions.
Example:
A = | 3 1 4 |
    | 1 5 2 |
    | 4 2 6 |
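A short sketch (assuming NumPy) that verifies the symmetry of this example matrix and computes its eigenvalues with the routine intended for symmetric matrices:

import numpy as np

A = np.array([[3.0, 1.0, 4.0],
              [1.0, 5.0, 2.0],
              [4.0, 2.0, 6.0]])

print(np.allclose(A, A.T))    # True: the matrix equals its transpose

# eigh is specialized for symmetric (Hermitian) matrices and returns real eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)            # all real, consistent with the real spectral theorem

# The eigenvector matrix is orthogonal: Q^T Q = I
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))   # True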
Symmetric matrices have numerous applications in mathematics, science, and engineering due to their properties
and special characteristics. Their symmetry simplifies calculations, diagonalization, and various analyses, making
them an essential concept in linear algebra.
===================================================================
Singular Value Decomposition (SVD) is a powerful matrix factorization technique used in linear algebra and data
analysis. It decomposes a matrix into three separate matrices, enabling efficient manipulation and analysis of data.
SVD has applications in various fields, including signal processing, image compression, machine learning, and
recommendation systems.
SVD Decomposition:
A = UΣV^T
Where:
"A" is the original m x n matrix.
"U" is an m x m orthogonal matrix whose columns are the left singular vectors.
"Σ" is an m x n diagonal matrix containing the non-negative singular values.
"V^T" is the transpose of an n x n orthogonal matrix "V" whose columns are the right singular vectors.
Key Properties:
1. Singular Values: The singular values in the diagonal matrix "Σ" are non-negative and represent the
scaling factors associated with the transformation represented by "A."
2. Orthogonality: The matrices "U" and "V" are orthogonal matrices, meaning their columns are mutually
orthogonal unit vectors. They play a role in transforming and rotating the data.
3. Dimension Reduction: SVD can be used for dimensionality reduction by selecting a subset of the singular
values and corresponding singular vectors.
4. Data Compression: In applications like image compression, keeping only a subset of singular values and
their associated vectors can lead to significant data compression with minimal loss of information.
SVD Applications:
1. Principal Component Analysis (PCA): SVD is used in PCA to find the orthogonal axes (principal
components) along which the data varies the most.
2. Image Compression: SVD can be used for image compression by keeping a reduced number of singular
values and vectors, thus reducing the storage space required.
3. Collaborative Filtering: In recommendation systems, SVD can be applied to decompose a user-item
interaction matrix and find latent features that represent user preferences and item characteristics.
4. Data Imputation: SVD can be used to fill missing entries in a data matrix, providing a way to impute
missing values.
5. Solving Linear Equations: SVD can be used to solve systems of linear equations and compute the
pseudo-inverse of a matrix.
The calculation of SVD involves numerical methods, such as the Golub-Reinsch algorithm. Software
libraries like NumPy (Python) and MATLAB provide built-in functions for computing SVD.
Example:
A = | 4 11 |
    | 8  7 |
Using SVD, we can decompose "A" into "UΣV^T." The resulting matrices "U," "Σ," and "V^T" would provide
insight into the relationships and properties of the original matrix.
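A minimal sketch of computing this decomposition with NumPy for the example matrix above:

import numpy as np

A = np.array([[4.0, 11.0],
              [8.0,  7.0]])

# full_matrices=False gives the compact form; s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s)                                  # non-negative singular values, largest first

# Reconstruct A from the three factors: U * diag(s) * V^T
A_reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(A, A_reconstructed))    # True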
Singular Value Decomposition is a versatile tool that offers insights into the structure of matrices and has practical
applications in various fields. It allows for efficient manipulation of data and dimensionality reduction, making it a
key technique in modern data analysis.
===================================================================
Principal Component Analysis (PCA) is a dimensionality reduction technique used in data analysis and machine
learning to transform high-dimensional data into a lower-dimensional representation while retaining as much
variance as possible. PCA identifies the directions (principal components) along which the data varies the most,
allowing for the reduction of noise and redundancy in the dataset.
Key Objectives of PCA:
1. Dimensionality Reduction: Reduce the number of features (dimensions) in the data while preserving the
most important information.
2. Variance Maximization: Find the directions in which the data has the highest variance. These directions
capture the most significant patterns in the data.
3. Decorrelation: The principal components are orthogonal (uncorrelated) to each other, which removes any
redundant information.
PCA Process:
1. Data Standardization: Standardize the data to have zero mean and unit variance across each feature. This
step is crucial to ensure that features with different scales do not dominate the analysis.
2. Covariance Matrix: Calculate the covariance matrix of the standardized data. The covariance matrix
provides information about the relationships between features.
3. Eigenvalue Decomposition: Perform eigenvalue decomposition on the covariance matrix. This yields the
eigenvalues and eigenvectors: the eigenvectors define the directions of the principal components, and the
eigenvalues indicate how much variance each component captures.
4. Projection: Sort the components by eigenvalue, keep the top components, and project the standardized data
onto them to obtain the reduced-dimensional representation.
Applications of PCA:
1. Dimensionality Reduction: PCA is used to reduce the number of features in high-dimensional datasets.
This reduces computational complexity and can improve the performance of machine learning
algorithms.
2. Noise Reduction: PCA can remove noise and small variations in the data, emphasizing the most
significant patterns.
3. Data Visualization: PCA can be used to visualize high-dimensional data in lower dimensions. It helps
reveal underlying structures and relationships.
4. Feature Extraction: In image processing, PCA can be used to extract important features from images.
5. Data Compression: PCA can be applied to compress data, reducing storage requirements while retaining
essential information.
Example:
Consider a dataset with multiple features representing different measurements of objects. PCA would identify the
directions along which the data varies the most. These directions are the principal components, and projecting the
data onto these components would create a reduced-dimensional representation that captures the most important
information.
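A compact sketch of the PCA steps described above, implemented with NumPy on randomly generated data (standardize, form the covariance matrix, eigendecompose, project):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # hypothetical dataset: 100 samples, 3 features

# 1. Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# 3. Eigendecomposition (eigh is used because the covariance matrix is symmetric)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]     # sort components by decreasing variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Project onto the top 2 principal components
X_reduced = X_std @ eigenvectors[:, :2]
print(X_reduced.shape)                    # (100, 2)
print(eigenvalues / eigenvalues.sum())    # fraction of variance explained by each component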
PCA is a widely used technique in various fields, including image analysis, signal processing, finance, and more. It
allows for efficient data representation, visualization, and noise reduction, making it an essential tool in exploratory
data analysis and preprocessing.
===================================================================
Frequency distribution is a fundamental concept in applied statistics that provides a way to organize and summarize
data into distinct categories or intervals along with the corresponding frequencies or counts. It helps in
understanding the distribution of data and identifying patterns, trends, and important characteristics. Frequency
distributions are commonly used in various fields, such as economics, social sciences, natural sciences, and data
analysis.
Key Components of a Frequency Distribution:
1. Classes or Intervals: The range of data values is divided into non-overlapping intervals or classes. Each
interval represents a range of values that data points can fall into.
2. Frequency: The frequency of a class refers to the number of data points that fall within that interval. It
represents how many times a specific range of values occurs in the dataset.
3. Relative Frequency: Relative frequency is the ratio of the frequency of a class to the total number of data
points. It provides a proportion or percentage representation of each class's frequency relative to the entire
dataset.
4. Cumulative Frequency: Cumulative frequency is the running total of frequencies as you move through
the classes. It helps in understanding how many data points are less than or equal to a certain value.
Steps to Create a Frequency Distribution:
1. Determine the Number of Intervals: Decide on the number of intervals (classes) you want to divide the
data into. Common methods for selecting the number of intervals include the square root rule, Sturges'
formula, and the Freedman-Diaconis rule.
2. Determine Interval Width: Calculate the interval width by dividing the range of data (maximum value minus
minimum value) by the number of intervals.
3. Create Intervals: Construct intervals based on the interval width. Make sure the intervals are non-
overlapping and cover the entire range of data values.
4. Count Frequencies: Count the number of data points that fall into each interval. This gives you the
frequency for each class.
5. Calculate Relative and Cumulative Frequencies: Calculate the relative frequency by dividing the
frequency of each class by the total number of data points. Also, calculate the cumulative frequency by
adding up the frequencies as you move through the classes.
Example:
Consider the following 13 data values:
82, 68, 92, 75, 88, 58, 92, 78, 85, 72, 90, 82, 95
Grouping them into the intervals 50-59, 60-69, 70-79, 80-89, and 90-99 gives frequencies of 1, 1, 3, 4, and 4,
respectively.
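A small sketch in plain Python (with the interval bounds chosen for illustration) that builds the frequency, relative frequency, and cumulative frequency table for the data above:

data = [82, 68, 92, 75, 88, 58, 92, 78, 85, 72, 90, 82, 95]

# Intervals chosen to cover the data range (50-59, 60-69, ..., 90-99)
intervals = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

cumulative = 0
for low, high in intervals:
    frequency = sum(low <= x <= high for x in data)   # count of values in the interval
    relative = frequency / len(data)                  # proportion of the whole dataset
    cumulative += frequency                           # running total of frequencies
    print(f"{low}-{high}: freq={frequency}, rel={relative:.2f}, cum={cumulative}")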
Permutations and combinations are fundamental concepts in combinatorics, which is a branch of mathematics
dealing with counting and arranging objects. Both permutations and combinations are used to calculate the number
of ways you can select or arrange items from a set. However, they have distinct applications and rules.
Permutations:
Permutations refer to the arrangement of items in a specific order. It answers the question, "How many ways can we
arrange 'r' items from a set of 'n' items?"
Formula for Permutations: The number of permutations of 'r' items from a set of 'n' items is given by:
nPr = n! / (n - r)!
Where:
"n" is the total number of items and "r" is the number of items being arranged.
"n!" is the factorial of "n," which is the product of all positive integers from 1 to "n."
Example of Permutations: If you have 5 different books and you want to arrange 3 of them on a shelf, the number of
permutations is 5P3 = 5! / (5 - 3)! = 60.
Combinations:
Combinations refer to the selection of items without considering their order. It answers the question, "How many
ways can we choose 'r' items from a set of 'n' items?"
Formula for Combinations: The number of combinations of 'r' items from a set of 'n' items is given by:
nCr = n! / (r! * (n - r)!)
Where:
"r!" is the factorial of "r."
"(n - r)!" is the factorial of the difference between "n" and "r."
Example of Combinations: If you have 10 different candies and you want to choose 4 of them to eat, the number of
combinations is 10C4 = 10! / (4! * (10 - 4)!) = 210.
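Both counts can be checked directly; a short sketch using Python's standard library (math.perm and math.comb require Python 3.8 or later):

import math

print(math.perm(5, 3))    # 60: arrangements of 3 books out of 5 (order matters)
print(math.comb(10, 4))   # 210: choices of 4 candies out of 10 (order does not matter)

# Equivalent calculations from the factorial formulas
print(math.factorial(5) // math.factorial(5 - 3))                          # 60
print(math.factorial(10) // (math.factorial(4) * math.factorial(10 - 4)))  # 210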
Applications:
Permutations: Permutations are used when order matters, such as arranging seats, selecting winners in a
race, or forming passwords.
Combinations: Combinations are used when order does not matter, such as selecting a committee,
choosing items to buy from a menu, or picking a lottery ticket.
Both permutations and combinations are essential concepts in combinatorics and have wide-ranging applications in
various fields, including probability theory, statistics, computer science, and cryptography. They provide
fundamental tools for solving counting problems and analyzing possibilities.
===================================================================
Probability
Probability is a fundamental concept in mathematics and statistics that quantifies the likelihood or chance of an
event occurring. It provides a formal framework for reasoning about uncertainty and randomness. Probability
theory is widely used in various fields, including statistics, economics, physics, engineering, and social sciences.
Key Concepts in Probability:
1. Sample Space (S): The sample space is the set of all possible outcomes of an experiment. For example,
when rolling a six-sided die, the sample space is {1, 2, 3, 4, 5, 6}.
2. Event: An event is a subset of the sample space, representing a particular outcome or a collection of
outcomes. Events can be simple (single outcome) or compound (multiple outcomes).
3. Probability of an Event (P): The probability of an event is a numerical value between 0 and 1 that
indicates the likelihood of the event occurring. A probability of 0 means the event is impossible, and a
probability of 1 means the event is certain.
5. Random Variable: A random variable is a variable that can take on different values based on the
outcomes of a random process. It can be discrete (taking on distinct values) or continuous (taking on a
range of values).
6. Probability Density Function (PDF): For continuous random variables, the probability density function
gives the relative likelihood of a random variable falling within a specific range.
Calculating Probability:
The probability of an event "A" is calculated as the ratio of the number of favorable outcomes to the total number of
possible outcomes:
P(A) = (Number of Favorable Outcomes) / (Total Number of Possible Outcomes)
Basic Rules of Probability:
1. Addition Rule: The probability of the union of two events A and B is given by:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
2. Multiplication Rule: The probability of the intersection of two independent events A and B is given by:
P(A ∩ B) = P(A) * P(B)
3. Complement Rule: The probability that event A does not occur is given by:
P(A') = 1 - P(A)
Conditional Probability:
Conditional probability refers to the probability of an event occurring given that another event has occurred. It is
denoted as P(A|B), which represents the probability of event A occurring given that event B has occurred.
P(A|B) = P(A ∩ B) / P(B)
Bayes' Theorem:
Bayes' Theorem is a fundamental rule in probability that relates conditional probabilities. It is used to update
beliefs about an event based on new evidence.
P(A|B) = (P(B|A) * P(A)) / P(B)
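As an illustration only (the probabilities below are assumed for the example, not taken from the notes), Bayes' Theorem can be applied to a diagnostic-test scenario:

# Hypothetical values, chosen only to illustrate Bayes' Theorem
p_disease = 0.01                 # P(A): prior probability of having the disease
p_pos_given_disease = 0.95       # P(B|A): test sensitivity
p_pos_given_healthy = 0.05       # false-positive rate

# Total probability of a positive test, P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior probability of disease given a positive test, P(A|B)
p_disease_given_pos = (p_pos_given_disease * p_disease) / p_pos
print(round(p_disease_given_pos, 3))   # about 0.161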
Applications:
Statistics: Probability is the foundation of statistical analysis, hypothesis testing, and sampling.
Risk Analysis: Probability is used to assess risks and make informed decisions.
Random Processes: Probability is crucial in understanding random processes like gambling, genetics, and
quantum mechanics.
Machine Learning: Probability theory is used in various machine learning algorithms, including Bayesian
methods such as naive Bayes classifiers and Bayesian networks.
===================================================================
Random Variables
A random variable is a key concept in probability theory and statistics that represents a numerical outcome of a
random process. It quantifies the uncertainty associated with an experiment or event by assigning numerical values
to the different outcomes that could occur. Random variables play a central role in probability distributions,
statistical analysis, and data modeling.
Types of Random Variables:
1. Discrete Random Variable: A discrete random variable takes on a countable set of distinct values. These
values are often represented as integers, and they can be listed or enumerated. Examples include the
number of heads obtained when flipping a coin multiple times or the number rolled on a fair die.
2. Continuous Random Variable: A continuous random variable can take on any value within a continuous
range. These values are typically real numbers and are not countable individually. Examples include
measurements like height, weight, and time intervals.
Characteristics of Random Variables:
1. Probability Distribution: The probability distribution of a random variable specifies the likelihood of
each possible value occurring. For discrete random variables, this distribution is often given as a
probability mass function (PMF). For continuous random variables, it's given as a probability density
function (PDF).
2. Cumulative Distribution Function (CDF): The cumulative distribution function gives the probability that
a random variable takes on a value less than or equal to a given value.
3. Expected Value (Mean): The expected value of a random variable is the average value it would take over
many trials. It is often denoted as E(X) or μ.
4. Variance and Standard Deviation: The variance measures the spread or variability of the random
variable's values around its mean. The standard deviation is the square root of the variance.
5. Moments: Higher moments provide additional information about the shape and characteristics of the
probability distribution.
Examples:
1. Discrete Random Variable: Consider rolling a fair six-sided die. Let X represent the outcome of the roll.
The random variable X can take on values {1, 2, 3, 4, 5, 6}, and its probability distribution is given by
P(X = k) = 1/6 for each k in {1, 2, 3, 4, 5, 6}.
2. Continuous Random Variable: Consider the height of a randomly selected person. It can take any value
within a realistic range, and its distribution is described by a probability density function.
Applications:
Statistics: Random variables are used to model uncertainty and analyze data in statistics and probability
theory.
Probability Distributions: They are essential for understanding how outcomes are distributed and for
making predictions based on probability.
Hypothesis Testing: Random variables play a crucial role in hypothesis testing and determining the
significance of results.
Decision Making: Random variables help in making informed decisions under uncertainty, such as in risk
analysis.
Random variables provide a formal framework for describing and analyzing the variability and uncertainty
associated with random processes. They are a cornerstone of probability theory and have broad applications in
various fields, including finance, engineering, natural sciences, and social sciences.
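A small sketch (assuming NumPy) that simulates a discrete random variable, a fair die, and estimates its probability distribution, CDF, and expected value from the samples:

import numpy as np

rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=10_000)      # 10,000 simulated rolls of a fair die

# Empirical PMF: relative frequency of each outcome (should approach 1/6)
values, counts = np.unique(rolls, return_counts=True)
pmf = counts / rolls.size
print(dict(zip(values.tolist(), pmf.round(3))))

print((rolls <= 3).mean())    # empirical CDF at 3, close to 0.5
print(rolls.mean())           # empirical expected value, close to 3.5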
===================================================================
Discrete Variable:
A discrete variable is a type of random variable that can take on a countable number of distinct values. These values
are usually integers and are often the result of counting or enumerating something. Examples of discrete variables
include the number of heads obtained in multiple coin flips, the count of cars passing through a toll booth in an
hour, or the number of children in a family.
Continuous Variable:
A continuous variable is a type of random variable that can take on any value within a continuous range. These
values are typically real numbers and are not countable individually. Continuous variables are often associated with
measurements and can take on an infinite number of possible values. Examples include height, weight, time,
temperature, and distance.
Poisson Variable:
The Poisson distribution is a probability distribution that describes the number of events that occur in a fixed
interval of time or space. It's commonly used when events happen randomly and independently over time or space.
The Poisson distribution is characterized by a single parameter, λ (lambda), which represents the average rate of
events in the interval. The probability mass function (PMF) of a Poisson distribution is given by:
P(X = k) = (e^(-λ) * λ^k) / k!
Where:
X is the random variable representing the number of events.
k is the observed number of events.
λ is the average number of events in the interval.
e is Euler's number (approximately 2.71828).
Binomial Variable:
The binomial distribution is a probability distribution that models the number of successes in a fixed number of
independent Bernoulli trials. A Bernoulli trial is an experiment with two possible outcomes: success or failure. The
binomial distribution is characterized by two parameters: the number of trials (n) and the probability of success (p)
in each trial. The probability mass function (PMF) of a binomial distribution is given by:
P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
Where:
X is the random variable representing the number of successes.
(n choose k) is the binomial coefficient, representing the number of ways to choose k successes out of n
trials.
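A short sketch computing both PMFs directly from the formulas above (plain Python; the parameter values are chosen only for illustration):

import math

def poisson_pmf(k, lam):
    # P(X = k) = e^(-λ) * λ^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    # P(X = k) = (n choose k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: on average 3 trucks per hour arrive at a site; probability of exactly 5 arrivals
print(round(poisson_pmf(5, 3.0), 4))       # about 0.1008

# Example: 10 independent trials with success probability 0.4; probability of exactly 4 successes
print(round(binomial_pmf(4, 10, 0.4), 4))  # about 0.2508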
===================================================================
Single Variable:
A single variable refers to a single quantity or characteristic that is being measured or observed. In statistics, a
single variable is often called a univariate variable. It represents data points that are associated with one particular
aspect of interest. This aspect can be a measurement, a count, a category, or any other value that can be recorded.
Applications of Single Variable Analysis:
1. Descriptive Statistics: Analyzing a single variable helps in calculating descriptive statistics such as mean,
median, mode, variance, and standard deviation. These statistics provide insights into the central tendency,
spread, and distribution of the data.
2. Data Visualization: Single variable analysis often involves creating graphical representations such as
histograms, box plots, and bar charts. These visuals help in understanding the distribution and
characteristics of the data.
3. Quality Control: In manufacturing and quality control, analyzing a single variable can help monitor the
consistency and performance of a product or process.
4. Market Research: Analyzing single variables can provide insights into consumer preferences, buying
behaviors, and demographic distributions.
Multiple Variables:
Multiple variables refer to the study of relationships between two or more variables. In statistics, this is often
referred to as multivariate analysis. When analyzing multiple variables, researchers aim to understand how changes
in one variable are related to changes in another variable.
Applications of Multiple Variable Analysis:
1. Correlation Analysis: Analyzing multiple variables can help determine the strength and direction of
relationships between them. Correlation coefficients quantify how two variables move together.
2. Regression Analysis: This involves predicting the value of one variable (dependent variable) based
on the values of one or more other variables (independent variables). Linear and nonlinear regression
models are common examples.
3. Factor Analysis: Factor analysis is used to identify underlying factors or dimensions that explain the
correlations between observed variables. It's commonly used in psychology and social sciences.
4. Cluster Analysis: Cluster analysis groups similar observations into clusters based on their characteristics.
It's used for segmenting customers, grouping similar products, and more.
5. Principal Component Analysis (PCA): PCA reduces the dimensionality of data while retaining as much
variation as possible. It's used for feature extraction and dimensionality reduction.
Single variable analysis focuses on understanding the distribution and characteristics of a single quantity, while
multiple variable analysis explores relationships and patterns between two or more variables. Both types of analysis
play crucial roles in statistical research, decision-making, and gaining insights from data across various fields,
including science, business, social sciences, and more.
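A brief sketch of single- and multiple-variable analysis (assuming NumPy; the measurements below are made up for illustration):

import numpy as np

# Hypothetical measurements of two related quantities
concrete_age_days = np.array([3, 7, 14, 21, 28, 56], dtype=float)
strength_mpa = np.array([12.0, 19.5, 25.0, 28.5, 31.0, 36.0])

# Single-variable (univariate) summary of one quantity
print(strength_mpa.mean(), np.median(strength_mpa), strength_mpa.std())

# Multiple-variable analysis: correlation between the two quantities
print(np.corrcoef(concrete_age_days, strength_mpa)[0, 1])   # a value near +1 indicates a strong positive relationship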
===================================================================
Expectation and variance are fundamental concepts in probability theory and statistics that provide insights into the
behavior and characteristics of random variables. They are used to describe the central tendency and spread of a
probability distribution, respectively. Let's delve into each of these concepts in detail:
Expectation (Mean):
The expectation of a random variable is a measure of its average value. It represents the center of mass or the
balance point of the distribution. For a discrete random variable X with probability mass function P(X), the
expectation (E[X]) is calculated as the sum of the product of each possible value and its corresponding probability:
E[X] = ∑ (x * P(X = x))
For a continuous random variable Y with probability density function f(Y), the expectation (E[Y]) is calculated as
the integral of the product of each possible value and its corresponding density function:
E[Y] = ∫ y * f(y) dy
Variance:
The variance of a random variable measures the dispersion or spread of its values around the mean. It indicates how
much the values deviate from the mean value. The variance (Var[X]) of a random variable X is calculated as the
expected value of the squared difference between each value and the mean:
Var[X] = E[(X - E[X])^2]
In other words, variance is the average of the squared deviations from the mean. The square root of the variance is
known as the standard deviation (σ), which gives a measure of the spread in the original units of the data.
Properties:
1. Linearity of Expectation: E[aX + b] = aE[X] + b, where "a" and "b" are constants.
2. Variance under a Linear Transformation: Var[aX + b] = a^2 * Var[X], where "a" and "b" are constants.
Applications:
Expectation: Expectation is used to determine the average outcome of a random variable, making it useful
in decision-making, risk assessment, and modeling. In finance, it's used in expected value calculations for
investments and insurance.
Variance: Variance measures the uncertainty or variability of data. In risk assessment, a low variance
indicates more stable outcomes, while high variance indicates greater volatility. It's used in quality control to
assess the consistency of processes and products.
Example:
Suppose you have a fair six-sided die. The possible outcomes (1, 2, 3, 4, 5, 6) have equal probabilities.
The expectation of the die roll is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. The variance is the average of the squared
differences between each outcome and the mean: Var[X] = ((1-3.5)^2 + (2-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 +
(5-3.5)^2 + (6-3.5)^2) / 6 = 17.5 / 6 ≈ 2.92.
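The same die example, computed exactly from the definitions in plain Python:

outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                                       # equal probability for each outcome of a fair die

expectation = sum(x * p for x in outcomes)                      # E[X] = 3.5
variance = sum((x - expectation) ** 2 * p for x in outcomes)    # Var[X] = 17.5 / 6 ≈ 2.92
std_dev = variance ** 0.5                                       # standard deviation ≈ 1.71

print(expectation, variance, std_dev)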
Expectation and variance are fundamental concepts that provide insights into the central tendencies and dispersion
of random variables. They play crucial roles in statistical analysis, decision-making, and understanding the behavior
of data.
===================================================================
Hypothesis Testing
Hypothesis testing is a critical component of statistical inference that helps us make decisions and draw conclusions
about population parameters based on sample data. It involves the formulation of hypotheses, statistical analysis of
sample data, and determining whether the observed results are consistent with the hypothesized assumptions.
Hypothesis testing is widely used in research, scientific investigations, and decision-making processes.
Key Concepts in Hypothesis Testing:
1. Null Hypothesis (H0): The default assumption that there is no effect or no difference; it is the statement the
test seeks evidence against.
2. Alternative Hypothesis (Ha): The claim that contradicts the null hypothesis, representing the effect or
difference the researcher wants to detect.
3. Test Statistic: A value calculated from the sample data that is used to decide between the null and
alternative hypotheses.
4. Significance Level (α): The significance level is a threshold that determines how much evidence is
required to reject the null hypothesis. Common significance levels include 0.05 and 0.01.
5. P-Value: The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the
one calculated from the sample data, assuming that the null hypothesis is true.
6. Decision Rule: Based on the p-value and significance level, a decision rule is established to either reject the
null hypothesis or fail to reject it.
Steps in Hypothesis Testing:
1. Formulate Hypotheses: State the null hypothesis (H0) and the alternative hypothesis (Ha).
2. Select Significance Level: Choose a significance level (α) to determine the threshold for evidence.
3. Collect and Analyze Data: Collect sample data and perform appropriate statistical tests to calculate the
test statistic.
4. Calculate P-Value: Calculate the p-value based on the test statistic and the assumed distribution.
5. Compare P-Value and α: If the p-value is less than or equal to the significance level (α), reject the null
hypothesis. Otherwise, fail to reject the null hypothesis.
6. Draw Conclusion: Based on the comparison, make a conclusion about the null hypothesis. If rejected,
accept the alternative hypothesis.
Types of Hypothesis Tests:
Paired Tests: Compare paired or matched data points within the same sample.
Chi-Square Tests: Used for categorical data to assess the association between variables.
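A minimal sketch of a one-sample t-test following the steps above (assuming SciPy is available; the sample values and hypothesized mean are made up for illustration):

from scipy import stats

# Hypothetical sample: measured compressive strengths (MPa) of concrete cubes
sample = [31.2, 29.8, 30.5, 32.1, 28.9, 30.8, 31.5, 29.4]

# H0: the population mean strength is 30 MPa; Ha: it differs from 30 MPa
alpha = 0.05
t_statistic, p_value = stats.ttest_1samp(sample, popmean=30.0)

print(t_statistic, p_value)
if p_value <= alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")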
Hypothesis testing allows us to make informed decisions based on data and evidence. It provides a structured
framework for drawing conclusions about the population using sample information, while accounting for
uncertainty and randomness.
================================================================