Introduction to Linear Algebra
Week 4
Why do we need linear algebra?
All of data science, and machine learning in particular, depends on linear algebra and
statistics. Knowing the math is crucial to understanding how neural networks
and other ML algorithms work.
Scalar, vector, matrix, tensor
A scalar is 0-dimensional (0D), a vector is 1D, a matrix is 2D, and a tensor is N-dimensional (ND).
Name the value type
2D array
The most common form of data organization for machine learning is a 2D array.
Rows represent observations (records, items, data points).
Columns represent attributes (features, variables).
It is natural to think of each sample as a vector of attributes, and of the whole array as
a matrix.
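A minimal NumPy sketch of this layout (the dataset values are made up for illustration):

    import numpy as np

    # 3 observations (rows) x 2 attributes (columns)
    X = np.array([[5.1, 3.5],
                  [4.9, 3.0],
                  [6.2, 2.9]])

    print(X.shape)   # (3, 2): 3 samples, 2 attributes
    print(X[0])      # the first observation as a vector of attributes
    print(X[:, 1])   # the second attribute across all observations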
Vectors
A vector is an n-tuple of values (usually real numbers).
A vector can be seen as a point in space or as a directed line segment with a
magnitude (length) and a direction.
Scalar values are defined by magnitude only.
The superscript T ("transpose") indicates that a row of values such as (a1,…,an) is written as a column vector.
Vector sum
Let a = (a1,…,an)T and b = (b1,…,bn)T be two vectors.
Let vector z = a + b. Then
z = (a1 + b1,…,an + bn)T
Examples:
a = (3,7)T; b = (2,-3)T; => z = (5,4)T
a = (3,2,1)T; b = (1,2,0)T; => z = (4,4,1)T
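A quick check of component-wise vector addition with NumPy, reusing the first example's numbers (a small sketch):

    import numpy as np

    a = np.array([3, 7])
    b = np.array([2, -3])

    z = a + b      # component-wise sum
    print(z)       # [5 4]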
Vector multiplication
Let k be a scalar and a = (a1,…,an)T a vector.
Let vector z = k × a. Then z = (k × a1,…,k × an)T
Example:
a = (3,2)T, k = 2 => z = (6,4)T
a = (3,2)T, k = -0.5 => z = (-1.5,-1)T
a = (3,7)T, k = 3 => z = (9,21)T
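The same scalar multiplication in NumPy, which broadcasts the scalar over every component (a small sketch):

    import numpy as np

    a = np.array([3, 2])

    print(2 * a)       # [6 4]
    print(-0.5 * a)    # [-1.5 -1. ]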
Vector arithmetic
Let a = (a1,…,an)T and b = (b1,…,bn)T be two vectors.
Let z = a · b be their dot product:
z = a1 × b1 + … + an × bn (a scalar)
Examples:
a = (3,7)T; b = (2,-3)T => z = 3×2 + 7×(-3) = -15
a = (3,2,1)T; b = (1,2,0)T => z = 3×1 + 2×2 + 1×0 = 7
Projection: the projection of b onto a is found by dropping a perpendicular from b onto
the line through a; the projection vector is the vector from the origin to the point where they meet.
Projection_a(b) = ((a · b) / (a · a)) × a
Example:
Projection_(4,3,0)((25,0,5)) = (4×25 + 3×0 + 0×5) / (4² + 3² + 0²) × (4,3,0) =
100/25 × (4,3,0) = 4 × (4,3,0) = (16,12,0)
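A short NumPy sketch of the dot product and the projection formula, reusing the numbers from the examples above:

    import numpy as np

    a = np.array([3, 7])
    b = np.array([2, -3])
    print(np.dot(a, b))               # -15

    a = np.array([4.0, 3.0, 0.0])
    b = np.array([25.0, 0.0, 5.0])
    proj = (np.dot(a, b) / np.dot(a, a)) * a   # projection of b onto a
    print(proj)                       # [16. 12.  0.]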
Norm of a vector
The norm of a vector may be understood as a distance:
d(x,y) = ||y – x||
There is more than one type of distance:
- Euclidean
- Manhattan
- Minkowski
etc.
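A minimal sketch of these distances with NumPy (the two points are illustrative):

    import numpy as np

    x = np.array([1.0, 2.0])
    y = np.array([4.0, 6.0])
    d = y - x

    print(np.linalg.norm(d))          # Euclidean (L2) distance: 5.0
    print(np.linalg.norm(d, ord=1))   # Manhattan (L1) distance: 7.0
    print(np.linalg.norm(d, ord=3))   # Minkowski distance with p = 3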
Matrix arithmetic
Definition: a matrix is an m × n two-dimensional array of values (usually real numbers):
- m rows
- n columns
Matrix multiplication
Matrix-matrix multiplication is defined as rows-by-columns
multiplication:
ci,j = ai,1 × b1,j + … + ai,n × bn,j = ∑z ai,z × bz,j
A vector-matrix multiplication is just a special case of a matrix-matrix
multiplication.
Matrix multiplication
A = [ 2 1 2
      1 3 3 ]
B = [ 0 1 2
      2 1 2
      1 3 3 ]
C = A×B = [ 4  9 12
            9 13 17 ]
Matrix multiplication
Computing the entries of C = A×B one at a time (each entry is a row of A times a column of B):
c1,1 = 2×0 + 1×2 + 2×1 = 4
c1,2 = 2×1 + 1×1 + 2×3 = 9
c1,3 = 2×2 + 1×2 + 2×3 = 12
c2,1 = 1×0 + 3×2 + 3×1 = 9
c2,2 = 1×1 + 3×1 + 3×3 = 13
c2,3 = 1×2 + 3×2 + 3×3 = 17
Matrix multiplication
Matrix multiplication is associative:
A × (B × C) = (A × B) × C
Matrix multiplication is not commutative (in general):
A × B ≠ B × A
Matrix transposition rule:
(A × B)T = BT × AT
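The worked example above, reproduced with NumPy as a quick sketch; the @ operator performs rows-by-columns multiplication:

    import numpy as np

    A = np.array([[2, 1, 2],
                  [1, 3, 3]])
    B = np.array([[0, 1, 2],
                  [2, 1, 2],
                  [1, 3, 3]])

    C = A @ B
    print(C)                               # [[ 4  9 12]
                                           #  [ 9 13 17]]

    # Transposition rule: (A x B)^T equals B^T x A^T
    print(np.array_equal(C.T, B.T @ A.T))  # True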
Linear transformation
A = [ 2 1 2
      1 3 3 ]
x = [ a
      b
      c ]
C = Ax = [ 2×a + 1×b + 2×c
           1×a + 3×b + 3×c ]
This transformation maps ℝ3 → ℝ2. In general:
T: ℝn → ℝm
T(x) = Ax
The function T is a linear transformation; in fact, for all vectors x, y and any scalar c:
T(x + y) = T(x) + T(y)
T(c × x) = c × T(x)
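A quick numeric check of T(x) = Ax with the matrix above (the input vector is made up for illustration):

    import numpy as np

    A = np.array([[2, 1, 2],
                  [1, 3, 3]])
    x = np.array([1, 2, 3])   # a point in R^3

    print(A @ x)              # [10 16], a point in R^2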
1D linear transformation
A one-dimensional linear transformation is a function T(x) = a × x for some
scalar a.
To view the one-dimensional case in the same way as higher-dimensional
linear transformations, we can view a as a 1×1 matrix.
2D linear transformation
A two-dimensional linear transformation is a function T: ℝ2 → ℝ2 of the form:
T(x, y) = (ax + by, cx + dy) = [ a b ] [ x ]
                               [ c d ] [ y ]
We can write this more succinctly as T(x) = Ax,
where x = [x, y]T and A is the 2×2 matrix.
Example:
T(x, y) = [ -1  0 ] [ x ]
          [  0 -2 ] [ y ]
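The example matrix applied to an illustrative point; a minimal sketch of a 2D linear transformation:

    import numpy as np

    A = np.array([[-1,  0],
                  [ 0, -2]])

    v = np.array([3, 1])      # an arbitrary point (x, y)
    print(A @ v)              # [-3 -2]: x is mirrored, y is mirrored and doubled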
Determinant
During a linear transformation we stretch and squish some of the dimensions.
It is valuable to determine how the area of our item has changed.
If we stretch x 3 times and y 2 times, we increase the area 6 times.
T(x, y) = [ 3 0 ] [ x ]
          [ 0 2 ] [ y ]
Determinant
If we do not stretch or squish x and y but only tilt (shear) the item, its area
will not change.
T(x, y) = [ 1 1 ] [ x ]
          [ 0 1 ] [ y ]
Determinant
The scaling factor by which a linear transformation changes an item's area is
called the determinant.
For the two examples above:
det( [ 3 0 ] ) = 6        det( [ 1 1 ] ) = 1
     [ 0 2 ]                   [ 0 1 ]
For a 2×2 matrix:
det( [ a b ] ) = a×d - b×c
     [ c d ]
For a 3×3 matrix (expansion along the first row):
det( [ a b c ] ) = a×det( [ e f ] ) - b×det( [ d f ] ) + c×det( [ d e ] )
     [ d e f ]            [ h i ]            [ g i ]            [ g h ]
     [ g h i ]
Determinant
In general, in any dimension n, the determinant is a scalar value that can
be computed from the elements of a square matrix and encodes certain
properties of the linear transformation described by the matrix.
The determinant of a matrix A is denoted det(A).
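Checking the two determinant examples with NumPy (a small sketch; results are floating-point approximations):

    import numpy as np

    A = np.array([[3, 0],
                  [0, 2]])    # stretch x by 3, y by 2
    B = np.array([[1, 1],
                  [0, 1]])    # shear

    print(np.linalg.det(A))   # approximately 6: areas grow 6 times
    print(np.linalg.det(B))   # approximately 1: a shear preserves area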
Rank
The maximum number of linearly independent columns (or rows) of a
matrix is called the rank of the matrix. The rank of a matrix cannot exceed the
number of its rows or columns.
The rank is how many of the rows are "unique", i.e. not linear combinations of the other rows.
(The same holds for columns.)
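An illustrative rank computation with NumPy (the matrix is made up; its second row is a multiple of the first):

    import numpy as np

    A = np.array([[1, 2, 3],
                  [2, 4, 6],    # 2 x the first row, so not independent
                  [0, 1, 1]])

    print(np.linalg.matrix_rank(A))   # 2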
Matrix inversion
The inverse of a number a is a⁻¹ such that a × a⁻¹ = 1.
For example, the inverse of 10 is 0.1, as 10 × 0.1 = 1.
The inverse of 5 is 0.2, and the inverse of 0.01 is 100.
The inverse of a matrix A is the matrix A⁻¹ such that A × A⁻¹ = I, where I is the identity matrix:
I = [ 1 0 0
      0 1 0
      0 0 1 ]
Sometimes there is no inverse at all. In this case we say that the
matrix A is not invertible.
A square matrix that is not invertible is called singular.
A square matrix is singular if and only if its determinant is 0.
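A short sketch of matrix inversion and a singular case with NumPy (the matrices are illustrative):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    A_inv = np.linalg.inv(A)
    print(A @ A_inv)              # approximately the identity matrix I

    S = np.array([[1.0, 2.0],
                  [2.0, 4.0]])    # rows are dependent, det(S) = 0
    print(np.linalg.det(S))       # 0 (up to sign/rounding): S is singular
    # np.linalg.inv(S) would raise numpy.linalg.LinAlgError ("Singular matrix")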