CO1 Notes

Introduction to Computer Vision: 2D and 3D transformations, Co-vectors, Stretch/Squash, Planar surface flow, Bilinear Interpolant. 3D rotations, 3D to 2D projections: Orthography and para-perspective, Pinhole Camera Model, Camera Intrinsics, Image sensing pipeline, sampling, and aliasing.

Introduction to Computer vision:


At its core, computer vision is the ability of computers to understand and analyze visual content in the same way humans do. This includes tasks such as recognizing objects and faces, reading text, and understanding the context of an image or video.

Computer vision is closely related to artificial intelligence (AI) and often uses AI techniques such as machine
learning to analyze and understand visual data. Machine learning algorithms are used to “train” a computer to
recognize patterns and features in visual data, such as edges, shapes and colors.

Once trained, the computer can use this knowledge to identify and classify objects in new images and videos.
The accuracy of these classifications can be improved over time through further training and exposure to more
data.

In addition to machine learning, computer vision may also use techniques such as deep learning, which involves
training artificial neural networks on large amounts of data to recognize patterns and features in a way that is
similar to how the human brain works.

Real-World Applications:
1. Healthcare: Medical Imaging

Application: Medical imaging involves techniques and processes used to create images of the human body
for clinical purposes, medical procedures, or medical science. Computer vision plays a significant role in
interpreting these images.

Computer Vision Techniques Used:

1. Image Segmentation: Identifying and isolating different tissues, organs, or abnormalities in medical
images (e.g., segmenting tumors in MRI scans).
2. Pattern Recognition: Detecting patterns that are indicative of certain diseases (e.g., recognizing the
structure of cancer cells).
3. 3D Reconstruction: Creating 3D models from 2D image slices, such as those from CT or MRI
scans.
4. Anomaly Detection: Identifying abnormalities or unusual structures that may indicate disease or
injury.

Benefits:

 Accuracy: Enhances the precision of diagnoses.


 Speed: Accelerates the analysis of medical images.
 Consistency: Reduces human error and variability in interpretations.
2. Automotive: Autonomous Vehicles

Application: Autonomous vehicles (self-driving cars) use computer vision to navigate and understand their
environment.

Computer Vision Techniques Used:

1. Object Detection: Identifying objects such as pedestrians, vehicles, road signs, and obstacles.
2. Lane Detection: Recognizing lane markings to keep the vehicle within its lane.
3. SLAM (Simultaneous Localization and Mapping): Creating and updating maps of the
environment while tracking the vehicle's location.
4. Depth Estimation: Determining the distance to various objects using stereoscopic vision or other
depth-sensing technologies.

Benefits:

 Safety: Improves the ability to avoid accidents.


 Efficiency: Enhances route planning and traffic management.
 Convenience: Reduces the need for human drivers.

3. Security: Surveillance Systems

Application: Surveillance systems use computer vision to monitor and analyze video footage for security
purposes.

Computer Vision Techniques Used:

1. Motion Detection: Identifying and tracking moving objects in a video stream.


2. Facial Recognition: Recognizing and verifying individuals based on facial features.
3. Behavior Analysis: Analyzing patterns of behavior to detect suspicious activities.
4. Anomaly Detection: Identifying unusual activities or objects in a scene.

Benefits:

 Enhanced Security: Provides real-time monitoring and alerts for potential security breaches.
 Efficiency: Reduces the need for constant human monitoring.
 Scalability: Can monitor large areas and numerous video feeds simultaneously.

4. Entertainment: Augmented Reality (AR) and Video Games

Application: Augmented reality overlays digital information on the real world, while video games often use
computer vision for immersive experiences.

Computer Vision Techniques Used:

 Marker-based AR: Using specific patterns or images to trigger the display of digital content.
 Markerless AR: Recognizing and understanding the environment without predefined markers (e.g.,
detecting surfaces and objects).
 Gesture Recognition: Interpreting human gestures for interactive gaming experiences.
 Object Tracking: Keeping track of objects and their movements in real-time for AR applications.
Benefits:

 Interactivity: Enhances user engagement through interactive and immersive experiences.


 Innovation: Enables new forms of entertainment and creative expression.
 Realism: Increases the realism of games and AR applications.

5. Retail: Automated Checkout Systems and Inventory Management

Application: Retail applications use computer vision for tasks such as automated checkout and inventory
management.

Computer Vision Techniques Used:

1. Object Recognition: Identifying products and items at checkout or during inventory checks.
2. Barcode Scanning: Reading barcodes to automate checkout processes.
3. Shelf Monitoring: Monitoring shelves to ensure products are stocked and correctly placed.
4. Customer Behavior Analysis: Analyzing customer movements and interactions to optimize store
layouts and marketing strategies.

Benefits:

 Efficiency: Streamlines checkout processes and reduces wait times.


 Accuracy: Minimizes errors in inventory tracking and management.
 Customer Experience: Enhances the shopping experience through personalized services and
efficient operations.

These examples illustrate the broad applicability of computer vision in various industries, demonstrating its
potential to transform and improve different aspects of our daily lives.

The Challenges of Computer Vision


Computer vision is a complex field that involves many challenges and difficulties. Some of these challenges
include:

1. Data limitations
Computer vision requires large amounts of data to train and test algorithms. This can be problematic in
situations where data is limited or sensitive, and may not be suitable for processing in the cloud.
Additionally, scaling up data processing can be expensive and may be constrained by hardware and
other resources.
2. Learning rate
Another challenge in computer vision is the time and resources required to train algorithms. While error
rates have decreased over time, they still occur, and it takes time for the computer to be trained to
recognize and classify objects and patterns in images. This process typically involves providing sets of
labeled images and comparing them to the predicted output label or recognition measurements and then
modifying the algorithm to correct any errors.

3. Hardware requirements
Computer vision algorithms are computationally demanding, requiring fast processing and optimized
memory architecture for quicker memory access. Properly configured hardware systems and software
algorithms are also necessary to ensure that image-processing applications can run smoothly
and efficiently.
4. Inherent complexity in the visual world
In the real world, subjects may be seen from various orientations and in myriad lighting conditions, and
there are an infinite number of possible scenes in a true vision system. This inherent complexity makes it
difficult to build a general-purpose “seeing machine” that can handle all possible visual scenarios.

Overall, these challenges highlight the fact that computer vision is a difficult and complex field, and that there is
still much work to be done in order to build machines that can see and understand the world in the same way
humans do.

2D and 3D transformation:
Transformation means changing some graphics into something else by applying rules. We can have various types of transformations such as translation, scaling up or down, rotation, shearing, etc. When a transformation takes place on a 2D plane, it is called 2D transformation.

2D and 3D transformations are mathematical operations that modify the position, orientation, or size of objects
in a graphical scene. The main difference between the two is that 2D transformations use two coordinates (X
and Y) to manipulate elements in two dimensions, while 3D transformations use three coordinates (X, Y, and Z)
to add a third dimension, depth.

Different types of transformations

i. Translation

ii. Scaling

iii. Rotation

iv. Reflection

v. Shearing

2D Transformations
2D transformations involve manipulating objects within a two-dimensional plane. Common 2D transformations
include translation, scaling, rotation, and shearing.

i. Translation
A translation moves an object to a different position on the screen. You can translate a point in 2D by adding the translation distances (tx, ty) to the original coordinates (X, Y) to get the new coordinates (X1, Y1):

X1 = X + tx
Y1 = Y + ty

Matrix Representation:

In homogeneous coordinates, the 2D translation matrix is

T = | 1  0  tx |
    | 0  1  ty |
    | 0  0  1  |

It adds the translation distances (tx, ty) to the coordinates of the object.

Example 1:
Translate the point P(3,4) by tx=5 units along the x-axis and ty=−2 units along the y-axis.

Solution:

The translation matrix T for a 2D translation is given by:

Plugging in the given translation values:


Point in Homogeneous Coordinates:

Convert the point P(3,4) to homogeneous coordinates:

Apply the Translation:

Multiply the translation matrix T by the point P:

Matrix Multiplication:

Perform the matrix multiplication:

Result:

The translated point P′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the translated point is P′(8,2).

Verification:

Let's check the result by manually applying the translation:

 Original point: P(3,4)


 Translation along x-axis: 3+5=8
 Translation along y-axis: 4−2=2
Thus, the translated point is P′(8,2), confirming our result.

Example 2

Given a triangle with vertices ABC having the coordinates as A(2,2), B(10,2) and C(5,5). Translate the triangle
with shifting vectors dx=5 and dy=6
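As a quick check, here is a minimal Python/NumPy sketch of 2D translation in homogeneous coordinates (NumPy is an assumption, not part of these notes); it reproduces Example 1 and can be reused for Example 2:

import numpy as np

def translate_2d(point, tx, ty):
    # 3x3 homogeneous translation matrix
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])
    p = np.array([point[0], point[1], 1.0])   # point in homogeneous coordinates
    x_new, y_new, _ = T @ p
    return x_new, y_new

print(translate_2d((3, 4), 5, -2))            # Example 1 -> (8.0, 2.0)
for vertex in [(2, 2), (10, 2), (5, 5)]:      # Example 2: dx=5, dy=6
    print(translate_2d(vertex, 5, 6))         # -> (7, 8), (15, 8), (10, 11)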

ii. Scaling
To change the size of an object, scaling transformation is used. In the scaling process, you either expand or
compress the dimensions of the object. Scaling can be achieved by multiplying the original coordinates of the
object with the scaling factor to get the desired result.

Let us assume that the original coordinates are (X, Y), the scaling factors are (SX, SY), and the produced coordinates are (X1, Y1). This can be mathematically represented as shown below –

X1 = X . SX

and

Y1 = Y . SY

P1 = P . S

Where S is the scaling matrix. The scaling process is shown in the following figure

Matrix Representation:

In homogeneous coordinates, the 2D scaling matrix is

S = | SX  0   0 |
    | 0   SY  0 |
    | 0   0   1 |

If we provide values less than 1 to the scaling factors, then we can reduce the size of the object. If we provide values greater than 1, then we can increase the size of the object.

Example 1:
Scale the point P(2,3) by sx=4 along the x-axis and sy=5 along the y-axis.

Solution:

Scaling Matrix: The scaling matrix S for a 2D scaling is given by:

Plugging in the given scaling factors:

Point in Homogeneous Coordinates: Convert the point P(2,3) to homogeneous coordinates:

Apply the Scaling: Multiply the scaling matrix S by the point P:


Matrix Multiplication: Perform the matrix multiplication step by step:

Result: The scaled point P′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the scaled point is P′(8,15).

Verification:

Let's check the result by manually applying the scaling:

 Original point: P(2,3)


 Scaling along x-axis: 2×4=8
 Scaling along y-axis: 3×5=15

Thus, the scaled point is P′(8,15), confirming our result.

Example 2: Scale the triangle with vertices at A(1,2), B(3,4), and C(5,1) by sx=2 along the x-axis and sy=3
along the y-axis.
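A minimal Python/NumPy sketch of 2D scaling in homogeneous coordinates (NumPy is assumed); it reproduces Example 1 and can be applied vertex by vertex for Example 2:

import numpy as np

def scale_2d(point, sx, sy):
    # 3x3 homogeneous scaling matrix
    S = np.array([[sx, 0.0, 0.0],
                  [0.0, sy, 0.0],
                  [0.0, 0.0, 1.0]])
    p = np.array([point[0], point[1], 1.0])
    x_new, y_new, _ = S @ p
    return x_new, y_new

print(scale_2d((2, 3), 4, 5))                 # Example 1 -> (8.0, 15.0)
for vertex in [(1, 2), (3, 4), (5, 1)]:       # Example 2: sx=2, sy=3
    print(scale_2d(vertex, 2, 3))             # -> (2, 6), (6, 12), (10, 3)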

iii.Rotation
In rotation, we rotate the object at a particular angle θ (theta) about the origin. The point P(X, Y) is located at angle φ from the horizontal X-axis with distance r from the origin. Suppose we rotate it by the angle θ; after rotating it to a new location, we get a new point P1(X1, Y1).

Using standard trigonometry, the original coordinates of point P(X, Y) can be represented as X = r cos φ and Y = r sin φ, and the rotated coordinates as

X1 = X cos θ − Y sin θ
Y1 = X sin θ + Y cos θ

so that P1 = R · P, where R is the rotation matrix.

Matrix Representation:

R = | cos θ  −sin θ |
    | sin θ   cos θ |
Example 1:
Rotate the point P(1,2) by 90 degrees counterclockwise around the origin.

Solution:

Rotation Matrix: The rotation matrix R for a 2D rotation by an angle θ is given by:

For a 90-degree counterclockwise rotation, θ = 90° (or θ = π/2 radians):

Point in Homogeneous Coordinates: Convert the point P(1,2) to homogeneous coordinates:

Apply the Rotation: Multiply the rotation matrix R by the point P:

Matrix Multiplication: Perform the matrix multiplication step by step:


Result: The rotated point P′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the rotated point is P′(−2,1).

Verification:

Let's check the result by manually applying the rotation:

 Original point P(1,2)


 After rotating 90 degrees counterclockwise, the x-coordinate becomes the negative of the original y-
coordinate, and the y-coordinate becomes the original x-coordinate.

Thus, the rotated point is P′(−2,1), confirming our result.

Conclusion:

The point P(1,2) rotated by 90 degrees counterclockwise results in the point P′(−2,1).
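A minimal Python/NumPy sketch of 2D rotation about the origin in homogeneous coordinates (NumPy is assumed); it reproduces this example:

import numpy as np

def rotate_2d(point, theta_deg):
    t = np.deg2rad(theta_deg)                 # convert degrees to radians
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    p = np.array([point[0], point[1], 1.0])
    x_new, y_new, _ = R @ p
    return x_new, y_new

print(rotate_2d((1, 2), 90))                  # -> approximately (-2.0, 1.0)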

iv.Reflection
Reflection is the mirror image of the original object: the object is flipped across a mirror line (axis). In reflection transformation, the size of the object does not change. The reflected object is always formed on the other side of the mirror.

Consider a point object O has to be reflected in a 2D plane.

Let Initial coordinates of the object O = (Xold, Yold)

New coordinates of the reflected object O after reflection = (Xnew, Ynew)

Reflection on X-Axis:

This reflection is achieved by using the following reflection equations-

Xnew = Xold

Ynew = -Yold
In Matrix form, the above reflection equations may be represented as-

| Xnew |   | 1   0 |   | Xold |
| Ynew | = | 0  −1 | · | Yold |

For homogeneous coordinates, the above reflection matrix may be represented as a 3 x 3 matrix as-

| 1   0  0 |
| 0  −1  0 |
| 0   0  1 |

Reflection on Y-Axis:

This reflection is achieved by using the following reflection equations-

Xnew = -Xold

Ynew = Yold

In Matrix form, the above reflection equations may be represented as-

| Xnew |   | −1  0 |   | Xold |
| Ynew | = |  0  1 | · | Yold |

For homogeneous coordinates, the above reflection matrix may be represented as a 3 x 3 matrix as-

| −1  0  0 |
|  0  1  0 |
|  0  0  1 |
Example 1:
Reflect the point P(3,4) across the x-axis and y-axis.

Solution:

1. Reflection Matrices: The reflection matrix Rx for a reflection across the x-axis is given by:

The reflection matrix Ry for a reflection across the y-axis is given by:

Point in Homogeneous Coordinates: Convert the point P(3,4) to homogeneous coordinates:

Apply the Reflection Across the x-axis: Multiply the reflection matrix Rx by the point P:

Matrix Multiplication for x-axis Reflection: Perform the matrix multiplication step by step:
Result of x-axis Reflection: The reflected point Px′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the reflected point across the x-axis is Px′(3,−4).

Apply the Reflection Across the y-axis: Multiply the reflection matrix Ry by the point P:

Matrix Multiplication for y-axis Reflection: Perform the matrix multiplication step by step:

Result of y-axis Reflection: The reflected point Py′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the reflected point across the y-axis is Py′(−3,4).

Verification:

Let's check the results by manually applying the reflection:

 Original point P(3,4)


 Reflecting across the x-axis: (3,−4)
 Reflecting across the y-axis: (−3,4)

Thus, the reflected points are Px′(3,−4) and Py′(−3,4), confirming our results.
Example 2:
Given a triangle with coordinate points A(3, 4), B(6, 4), C(5, 6). Apply the reflection on the X axis and obtain
the new coordinates of the object.

Example 3:
Given a triangle with coordinate points A(3, 4), B(6, 4), C(5, 6). Apply the reflection on the Y axis and obtain
the new coordinates of the object.
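A minimal Python/NumPy sketch of 2D reflection in homogeneous coordinates (NumPy is assumed); it reproduces Example 1 and can be looped over the triangle vertices for Examples 2 and 3:

import numpy as np

REFLECT_X = np.diag([1.0, -1.0, 1.0])   # reflection across the x-axis (flips y)
REFLECT_Y = np.diag([-1.0, 1.0, 1.0])   # reflection across the y-axis (flips x)

def reflect(point, M):
    x_new, y_new, _ = M @ np.array([point[0], point[1], 1.0])
    return x_new, y_new

print(reflect((3, 4), REFLECT_X))                                   # Example 1 -> (3.0, -4.0)
print(reflect((3, 4), REFLECT_Y))                                   # Example 1 -> (-3.0, 4.0)
print([reflect(v, REFLECT_X) for v in [(3, 4), (6, 4), (5, 6)]])    # Example 2
print([reflect(v, REFLECT_Y) for v in [(3, 4), (6, 4), (5, 6)]])    # Example 3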
v.Shearing
A transformation that slants the shape of an object is called a shear transformation. There are two shear transformations, X-Shear and Y-Shear: one shifts the X coordinate values and the other shifts the Y coordinate values. In both cases, only one coordinate changes and the other preserves its value. Shearing is also termed skewing.

X-Shear: The X-Shear preserves the Y coordinate and changes are made to the X coordinates, which causes vertical lines to tilt right or left.

Consider a point object O has to be sheared in a 2D plane.

Let Initial coordinates of the object O = (Xold, Yold)

Shearing parameter towards X direction = Shx

Shearing parameter towards Y direction = Shy

New coordinates of the object O after shearing = (Xnew, Ynew)

Shearing in X Axis-

Shearing in X axis is achieved by using the following shearing equations-

Xnew = Xold + Shx *Yold

Ynew = Yold

In Matrix form, the above shearing equations may be represented as-

| Xnew |   | 1  Shx |   | Xold |
| Ynew | = | 0   1  | · | Yold |

For homogeneous coordinates, the above shearing matrix may be represented as a 3 x 3 matrix as-

| 1  Shx  0 |
| 0   1   0 |
| 0   0   1 |

Shearing in Y Axis-
Shearing in Y axis is achieved by using the following shearing equations-

Xnew = Xold

Ynew = Yold + Shy *Xold

In Matrix form, the above shearing equations may be represented as-

| Xnew |   | 1    0 |   | Xold |
| Ynew | = | Shy  1 | · | Yold |

For homogeneous coordinates, the above shearing matrix may be represented as a 3 x 3 matrix as

| 1    0  0 |
| Shy  1  0 |
| 0    0  1 |

Example 1:
Shear the point P(2,3) with a shear factor of kx=1.5 along the x-axis and ky=0.5 along the y-axis.

Solution:

1. Shearing Matrices: The shearing matrix Sx for a shear along the x-axis is given by:

The shearing matrix Sy for a shear along the y-axis is given by:

Point in Homogeneous Coordinates: Convert the point P(2,3) to homogeneous coordinates:


Apply the Shear Along the x-axis: Multiply the shearing matrix Sx by the point P:

Matrix Multiplication for x-axis Shear: Perform the matrix multiplication step by step:

Result of x-axis Shear: The sheared point P'x in homogeneous coordinates is

Converting back to Cartesian coordinates, the sheared point along the x-axis is P'x(6.5, 3).

Apply the Shear Along the y-axis: Multiply the shearing matrix Sy by the point P:

Matrix Multiplication for y-axis Shear: Perform the matrix multiplication step by step:

Result of y-axis Shear: The sheared point P'y in homogeneous coordinates is


Converting back to Cartesian coordinates, the sheared point along the y-axis is P'y(2, 4).

Verification:

Let's check the results by manually applying the shear:

 Original point P(2,3)


 Shearing along the x-axis: (2+1.5⋅3,3)=(2+4.5,3)=(6.5,3)
 Shearing along the y-axis: (2,3+0.5⋅2)=(2,3+1)=(2,4)

Thus, the sheared points are P'x(6.5, 3) and P'y(2, 4), confirming our results.

Conclusion:

The point P(2,3) sheared along the x-axis by a factor of kx=1.5 results in the point P'x(6.5, 3), and sheared along
the y-axis by a factor of ky=0.5 results in the point P'y(2, 4).
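A minimal Python/NumPy sketch of 2D shearing in homogeneous coordinates (NumPy is assumed); it reproduces this example:

import numpy as np

def shear_x(point, shx):
    # X-shear: Xnew = Xold + Shx * Yold, Ynew = Yold
    M = np.array([[1.0, shx, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    x_new, y_new, _ = M @ np.array([point[0], point[1], 1.0])
    return x_new, y_new

def shear_y(point, shy):
    # Y-shear: Xnew = Xold, Ynew = Yold + Shy * Xold
    M = np.array([[1.0, 0.0, 0.0],
                  [shy, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    x_new, y_new, _ = M @ np.array([point[0], point[1], 1.0])
    return x_new, y_new

print(shear_x((2, 3), 1.5))   # -> (6.5, 3.0)
print(shear_y((2, 3), 0.5))   # -> (2.0, 4.0)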

3D Transformations
3D transformations involve manipulating objects within a three-dimensional space.

i.Translation
Moving an object from one position to another in 3D space.

Matrix Representation:

T = | 1  0  0  Δx |
    | 0  1  0  Δy |
    | 0  0  1  Δz |
    | 0  0  0  1  |

Example:
Translate the point P(2,−1,3) by Δx=4, Δy=−3 and Δz=5
Translation Matrix: The translation matrix T for translating a point by Δx, Δy, and Δz is given by:

For this problem, Δx=4 ,Δy=−3, and Δz=5:

Point in Homogeneous Coordinates: Convert the point P(2,−1,3) to homogeneous coordinates:

Apply the Translation: Multiply the translation matrix T by the point P:

Matrix Multiplication: Perform the matrix multiplication step by step:

Result: The translated point P′ in homogeneous coordinates is


Converting back to Cartesian coordinates, the translated point is P′(6,−4,8).

Verification:

Let's check the result by manually applying the translation:

 Original point P(2, -1, 3)


 Translate by Δx=4,Δy=−3, and Δz=5:
o New x-coordinate: 2+4=6
o New y-coordinate: −1−3=−4
o New z-coordinate: 3+5=8

Thus, the translated point is P′(6,−4,8), confirming our result.

Conclusion:

The point P(2,−1,3) translated by Δx=4, Δy=−3, and Δz=5 results in the point P′(6,−4,8).
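A minimal Python/NumPy sketch of 3D translation with a 4x4 homogeneous matrix (NumPy is assumed); it reproduces this example:

import numpy as np

def translate_3d(point, dx, dy, dz):
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]                   # 4x4 homogeneous translation matrix
    p = np.append(np.asarray(point, dtype=float), 1.0)
    return (T @ p)[:3]

print(translate_3d((2, -1, 3), 4, -3, 5))     # -> [ 6. -4.  8.]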

ii.Scaling
Changing the size of an object in 3D space.

Matrix Representation

S = | Sx  0   0   0 |
    | 0   Sy  0   0 |
    | 0   0   Sz  0 |
    | 0   0   0   1 |

Example:
Scale the point P(2,3,4) by a factor of Sx=2, Sy=3, and Sz=4.
Solution

Scaling Matrix: The scaling matrix S for scaling a point by Sx, Sy, and Sz is given by:

For this problem, Sx=2, Sy=3, and Sz=4:

Point in Homogeneous Coordinates: Convert the point P(2,3,4) to homogeneous coordinates:

Apply the Scaling: Multiply the scaling matrix S by the point P:

Matrix Multiplication: Perform the matrix multiplication step by step:

Result: The scaled point P′ in homogeneous coordinates is


Converting back to Cartesian coordinates, the scaled point is P′(4,9,16).

Verification:

Let's check the result by manually applying the scaling:

 Original point P(2,3,4)


 Scaling by Sx=2, Sy=3, and Sz=4:
o New x-coordinate: 2⋅2=4
o New y-coordinate: 3⋅3=9
o New z-coordinate: 4⋅4=16

Thus, the scaled point is P′(4,9,16), confirming our result.
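A minimal Python/NumPy sketch of 3D scaling with a 4x4 homogeneous matrix (NumPy is assumed); it reproduces this example:

import numpy as np

def scale_3d(point, sx, sy, sz):
    S = np.diag([sx, sy, sz, 1.0])            # 4x4 homogeneous scaling matrix
    p = np.append(np.asarray(point, dtype=float), 1.0)
    return (S @ p)[:3]

print(scale_3d((2, 3, 4), 2, 3, 4))           # -> [ 4.  9. 16.]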

iii.Rotation
Rotating an object around one of the principal axes (x, y, or z).

Matrix Representation:

Rotation around the X-axis:

Rx(θ) = | 1    0       0     0 |
        | 0  cos θ  −sin θ   0 |
        | 0  sin θ   cos θ   0 |
        | 0    0       0     1 |

Rotation around the Y-axis:

Ry(θ) = |  cos θ   0  sin θ   0 |
        |    0     1    0     0 |
        | −sin θ   0  cos θ   0 |
        |    0     0    0     1 |

Rotation around the Z-axis:

Rz(θ) = | cos θ  −sin θ   0   0 |
        | sin θ   cos θ   0   0 |
        |   0       0     1   0 |
        |   0       0     0   1 |


Example:
Rotate the point P(2,3,4) by 45 degrees around the z-axis.

Solution:

1. Rotation Matrix: The rotation matrix Rz for rotating a point by an angle θ around the z-axis is given by:

For this problem, θ = 45°. Converting to radians, θ = π/4. Therefore:

Point in Homogeneous Coordinates: Convert the point P(2,3,4) to homogeneous coordinates:

Apply the Rotation: Multiply the rotation matrix Rz by the point P:


Matrix Multiplication: Perform the matrix multiplication step by step:

Result: The rotated point P′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the rotated point is approximately P′(−0.71, 3.54, 4).
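A minimal Python/NumPy sketch of 3D rotation about the z-axis (NumPy is assumed); it reproduces this example:

import numpy as np

def rotate_z_3d(point, theta_deg):
    t = np.deg2rad(theta_deg)
    Rz = np.array([[np.cos(t), -np.sin(t), 0.0, 0.0],
                   [np.sin(t),  np.cos(t), 0.0, 0.0],
                   [0.0,        0.0,       1.0, 0.0],
                   [0.0,        0.0,       0.0, 1.0]])
    p = np.append(np.asarray(point, dtype=float), 1.0)
    return (Rz @ p)[:3]

print(rotate_z_3d((2, 3, 4), 45))             # -> approximately [-0.707, 3.536, 4.0]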

iv.Reflection
Reflection in 3D involves flipping a point across a plane, creating a mirror image of the point with respect to
that plane. The most common planes of reflection in 3D are the xy-plane, yz-plane, and xz-plane. Each of these
planes has its own reflection matrix.

Reflection Matrices

1. Reflection across the xy-plane: The reflection matrix Rxy for reflecting across the xy-plane (which flips
the z-coordinate) is:

Reflection across the yz-plane: The reflection matrix Ryz for reflecting across the yz-plane (which flips the x-
coordinate) is:
Reflection across the xz-plane: The reflection matrix Rxz for reflecting across the xz-plane (which flips the y-
coordinate) is:

Example:
Reflect the point P(3,4,5) across each of these planes

Solution

Point in Homogeneous Coordinates: Convert the point P(3,4,5) to homogeneous coordinates:

Reflection across the xy-plane: Multiply the reflection matrix Rxy by the point P:

Perform the matrix multiplication:

The reflected point across the xy-plane is P'xy(3, 4, −5).


Reflection across the yz-plane: Multiply the reflection matrix Ryz by the point P:

Perform the matrix multiplication:

The reflected point across the yz-plane is P'yz(-3, 4, 5).

Reflection across the xz-plane: Multiply the reflection matrix Rxz by the point P:

Perform the matrix multiplication:

The reflected point across the xz-plane is P'xz(3, -4, 5)

Conclusion:

The point P(3,4,5) is reflected across the three planes as follows:

 Across the xy-plane: P'xy(3, 4, -5)


 Across the yz-plane: P'yz(-3, 4, 5)
 Across the xz-plane: P'xz(3, -4, 5)
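A minimal Python/NumPy sketch of 3D reflections across the coordinate planes (NumPy is assumed); it reproduces this example:

import numpy as np

R_XY = np.diag([1.0, 1.0, -1.0, 1.0])   # reflection across the xy-plane (flips z)
R_YZ = np.diag([-1.0, 1.0, 1.0, 1.0])   # reflection across the yz-plane (flips x)
R_XZ = np.diag([1.0, -1.0, 1.0, 1.0])   # reflection across the xz-plane (flips y)

p = np.array([3.0, 4.0, 5.0, 1.0])      # P(3, 4, 5) in homogeneous coordinates
print((R_XY @ p)[:3])                   # -> [ 3.  4. -5.]
print((R_YZ @ p)[:3])                   # -> [-3.  4.  5.]
print((R_XZ @ p)[:3])                   # -> [ 3. -4.  5.]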
v.Shearing
Skewing the shape of an object along one or more axes in 3D space.

Matrix Representation

Example:
Apply a shear transformation to the point P(1,2,3) with shear factors shxy=1.5, shxz=−0.5, shyx=0.5, shyz=1.0,
shzx=−1.0, and shzy=0.75.

Solution

Shearing Matrix: The general shearing matrix S in 3D with shear factors shxy, shxz, shyx, shyz, shzx, and shzy is given by:

S = | 1     shxy  shxz  0 |
    | shyx  1     shyz  0 |
    | shzx  shzy  1     0 |
    | 0     0     0     1 |

For this problem, the shear factors are shxy=1.5, shxz=−0.5, shyx=0.5, shyz=1.0, shzx=−1.0, and shzy=0.75.
Point in Homogeneous Coordinates: Convert the point P(1,2,3) to homogeneous coordinates:

Apply the Shearing: Multiply the shearing matrix S by the point P:

Matrix Multiplication: Perform the matrix multiplication step by step:

Result: The sheared point P′ in homogeneous coordinates is

Converting back to Cartesian coordinates, the sheared point is P′(2.5,5.5,3.5).
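A minimal Python/NumPy sketch of 3D shearing (NumPy is assumed); the placement of the off-diagonal entries is chosen so that it reproduces the worked result above:

import numpy as np

def shear_3d(point, shxy, shxz, shyx, shyz, shzx, shzy):
    # Off-diagonal entries add a multiple of one coordinate to another,
    # e.g. Xnew = X + shxy*Y + shxz*Z (placement chosen to match the worked example).
    S = np.array([[1.0,  shxy, shxz, 0.0],
                  [shyx, 1.0,  shyz, 0.0],
                  [shzx, shzy, 1.0,  0.0],
                  [0.0,  0.0,  0.0,  1.0]])
    p = np.append(np.asarray(point, dtype=float), 1.0)
    return (S @ p)[:3]

print(shear_3d((1, 2, 3), 1.5, -0.5, 0.5, 1.0, -1.0, 0.75))   # -> [2.5 5.5 3.5]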


Co-vectors
Definition:

A covector is a linear functional that maps a vector to a scalar. It belongs to the dual space of a vector space.

 A vector is a mathematical object that has both a magnitude (or length) and a direction.
 In 2D Space: A vector v in 2D space can be represented as:

where vx and vy are the components of the vector along the x-axis and y-axis, respectively.

 A scalar is a quantity that is fully described by a single numerical value and has no direction. Scalars are
used to represent magnitudes such as temperature, mass, speed, energy, and time. In contrast to vectors,
which have both magnitude and direction, scalars only have magnitude.

Examples of Scalars

 Temperature: 25°C
 Mass: 10 kg
 Speed: 60 km/h
 Energy: 100 Joules
 Time: 5 seconds

 The product of a vector v and a scalar k is a vector w whose components are scaled by k:

 A vector space V is a collection of vectors that can be added together and multiplied (scaled) by scalars
while satisfying certain axioms (closure, associativity, identity, and inverses for addition and scalar
multiplication). Examples of vector spaces include Rn, where n is a positive integer, and each vector in
Rn has n components.
 The dual space of a vector space V is a new vector space, usually denoted by V∗, consisting of all linear
functionals defined on V.

 A linear functional is a function that takes a vector from the vector space V and maps it to a scalar,
while satisfying linearity properties. This means that for any vectors u,v∈V, and any scalars a,b∈R, a
linear functional ϕ∈V∗satisfies:

ϕ(au+bv)=aϕ(u)+bϕ(v)
 A covector is another name for a linear functional. Thus, a covector maps a vector to a scalar in a linear
fashion.

Relationship between Vectors and Covectors

 Vectors are elements of the vector space V.


 Covectors (or linear functionals) are elements of the dual space V∗

Notation:

 Vectors are typically denoted by bold lowercase letters, e.g., v.


 Covectors are typically denoted by lowercase Greek letters, e.g., φ.

Example

Vector Space R2

Consider the vector space R2 (It is the set of all ordered pairs of real numbers. It is a two-dimensional vector
space over the field of real numbers R). A vector in R2 can be written as

Dual Space (R2)*

The dual space of R2, denoted (R2)*, consists of all linear functionals that map vectors in R2 to scalars. A covector (linear functional) in (R2)* can be written as a row vector

ϕ = [ϕ1  ϕ2]

Mapping

The covector ϕ maps the vector v to a scalar via the dot product: ϕ(v) = ϕ1·vx + ϕ2·vy.

Properties of Covectors:

1. Linearity: Covectors are linear functions, satisfying linearity properties.


2. Duality: The relationship between vectors and covectors is fundamental in many areas of mathematics
and physics, particularly in differential geometry and tensor calculus.
3. Coordinate-Free: Covectors provide a coordinate-free way to deal with linear mappings, making them
useful in theoretical contexts.

In computer vision, covectors (or dual vectors) are used in several ways such as

1. Image Gradients and Edge Detection


2. Homography and Projective Transformations
3. Camera Models and Calibration
4. Optimization and Cost Functions
5. Differential Geometry and Shape Analysis

Example:
Given a vector v = [2, 3] in R2 and a covector ϕ = [4  −1] in the dual space (R2)*, find the scalar value obtained by applying the covector ϕ to the vector v.

Solution

Step 1: Understand the Mapping

The covector ϕ is a linear functional that maps the vector v to a scalar via the dot product. The dot product of a covector ϕ = [ϕ1  ϕ2] and a vector v = [v1, v2] is calculated as: ϕ(v) = ϕ1·v1 + ϕ2·v2

Step 2: Identify the Components

For the given vector v and covector ϕ:

 ϕ1=4
 ϕ2=−1
 v1=2
 v2=3

Step 3: Perform the Dot Product Calculation

Substitute the components into the dot product formula:

ϕ(v) = (4)(2) + (−1)(3)

Step 4: Calculating, we have

ϕ(v) = 8 − 3 = 5

The scalar value obtained by applying the covector ϕ to the vector v is 5.
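A minimal Python/NumPy sketch of applying a covector (row vector) to a vector (NumPy is assumed); it reproduces this example:

import numpy as np

phi = np.array([4.0, -1.0])   # covector, written as a row vector
v = np.array([2.0, 3.0])      # vector
print(phi @ v)                # phi(v) = 4*2 + (-1)*3 -> 5.0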

Stretch/Squash
Stretch and squash transformations are types of linear transformations applied to images, shapes, or objects
in computer vision to change their dimensions. These transformations are often used in animation, image
processing, and geometric manipulations.

Stretch Transformation
A stretch transformation scales an object more in one direction than in another. For instance, stretching an
object horizontally while keeping the vertical dimension unchanged.

2D Stretch Transformation
In 2D, a stretch transformation can be represented by a scaling matrix that scales differently along the x-axis and y-axis:

S = | sx  0  |
    | 0   sy |

where sx and sy are the scaling factors along the x and y directions, respectively.

Example:
Stretch a point P(2,3) by 2 along the x-axis and 1 along the y-axis.

The point P(2,3) is stretched to P′(4,3).


Squash Transformation:
A squash transformation is the opposite of a stretch transformation, where an object is compressed more in
one direction than in another. For instance, squashing an object horizontally while keeping the vertical
dimension unchanged.

2D Squash Transformation:
A squash transformation can also be represented by a scaling matrix, but with the scaling factors sx and sy being
less than 1 in the respective directions.

Example:
Squash a point P(4,5) by 0.5 along the x-axis and 1 along the y-axis

The point P(4,5) is squashed to P′(2,5).

3D Stretch and Squash Transformations:


In 3D, stretch and squash transformations can be applied using a 3D scaling matrix.

3D Scaling Matrix

S = | sx  0   0  |
    | 0   sy  0  |
    | 0   0   sz |

where sx, sy, and sz are the scaling factors along the x, y, and z directions, respectively.
Example of 3D Stretch:
Stretch a point P(1,2,3) by 2 along the x-axis, 1 along the y-axis, and 0.5 along the z-axis.

The point P(1,2,3) is stretched to P′(2,2,1.5).

Example of 3D Squash:
Squash a point P(2,4,6) by 0.5 along the x-axis, 0.25 along the y-axis, and 1 along the z-axis.

The point P(2,4,6) is squashed to P′(1,1,6).
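A minimal Python/NumPy sketch of stretch/squash as non-uniform scaling (NumPy is assumed); it reproduces the four examples above:

import numpy as np

def stretch_squash(point, factors):
    # Non-uniform scaling: factors > 1 stretch an axis, factors < 1 squash it.
    return np.asarray(point, dtype=float) * np.asarray(factors, dtype=float)

print(stretch_squash((2, 3), (2, 1)))            # 2D stretch -> [4. 3.]
print(stretch_squash((4, 5), (0.5, 1)))          # 2D squash  -> [2. 5.]
print(stretch_squash((1, 2, 3), (2, 1, 0.5)))    # 3D stretch -> [2.  2.  1.5]
print(stretch_squash((2, 4, 6), (0.5, 0.25, 1))) # 3D squash  -> [1. 1. 6.]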


Applications in Computer Vision:
1. Image Resizing: Stretch and squash transformations are commonly used in resizing images. Stretching
an image can be used to fit it into a larger frame, while squashing can reduce its size.
2. Aspect Ratio Adjustment: Adjusting the aspect ratio of an image or video involves stretching or
squashing the content to fit the desired dimensions.
3. Animation: In character animation, stretch and squash are principles used to create more dynamic and
expressive motions. For example, a bouncing ball might stretch as it speeds up and squash when it hits
the ground.
4. Geometric Transformations: Applying non-uniform scaling to objects in a 3D scene to achieve desired
proportions.
5. Normalization: Stretch and squash transformations are used to normalize data, such as adjusting the
scale of feature vectors in machine learning applications.

Differences between Stretch and Squash transformations


Direction of Scaling:

• Stretch: Increases the size of the object along one axis.


• Squash: Decreases the size of the object along one axis.

Impact on Axes:

• Stretch: The scaling factor is greater than 1 along one axis (elongates the object).
• Squash: The scaling factor is less than 1 along one axis (compresses the object).

Resulting Shape:

• Stretch: The object becomes elongated along the specified axis.


• Squash: The object becomes compressed along the specified axis.

Planar surface flow


Planar surface flow refers to the apparent motion of points on a plane observed from a moving camera. It's a
specific case of optical flow, which is the pattern of apparent motion of objects in a visual scene caused by the
relative motion between the observer (camera) and the scene.

When a camera moves relative to a planar surface, the points on that surface appear to move in a certain way on
the image plane. This movement can be described mathematically, and understanding it is crucial for various
computer vision applications like motion estimation, 3D reconstruction, and image stabilization.
Definition: Planar surface flow describes the apparent motion of points on a flat, two-dimensional surface in
the 3D world as observed in the 2D image plane. This motion can be caused by either the movement of the
camera, the surface, or both.

Mathematical Formulation

Consider a 3D point P=(X,Y,Z) on a planar surface. When projected onto the 2D image plane, it has coordinates
p=(x,y).

Image Formation Model

The relationship between the 3D coordinates (X,Y,Z) and the 2D image coordinates (x,y) can be expressed as:

x = f·X/Z,  y = f·Y/Z

where f is the focal length of the camera.

Planar Surface Assumption

If the planar surface can be described by the equation Z=aX+bY+c, the coordinates (X,Y,Z) of any point on the
surface can be related to its image coordinates (x,y).

Camera Motion

Assume the camera undergoes a rigid motion characterized by a rotation matrix R and a translation vector t. The
new position of the point P after the motion is:

P′=RP+t

The corresponding image coordinates (x′,y′) can be derived using the same image formation model.

Flow Field Computation

The displacement of the point in the image plane from (x,y) to (x′,y′) gives the flow vector (u,v):

u=x′−x

v=y′−y

Example:
Consider a planar surface Z=2 and a camera moving with a translation vector t=(0,0,1) and no rotation.
Compute the image flow for a point (X,Y,Z)=(1,1,2) with a focal length f=1000.
Solution

Step 1: Initial Image Coordinates

Using the image formation model, we project the 3D point (X,Y,Z)=(1,1,2) onto the image plane:

x = f·X/Z = 1000·1/2 = 500,  y = f·Y/Z = 1000·1/2 = 500

So, the initial image coordinates are (x,y)=(500,500).

Step 2: Camera Translation

The camera undergoes a translation t=(0,0,1). The new Z coordinate after translation becomes Z′ = Z + tz = 2 + 1 = 3.

The X and Y coordinates remain unchanged because the translation is only along the Z-axis:

X′=X=1

Y′=Y=1

Step 3: New Image Coordinates

Project the new 3D coordinates (X′,Y′,Z′)=(1,1,3) onto the image plane:

x′ = 1000·1/3 ≈ 333.33,  y′ = 1000·1/3 ≈ 333.33

So, the new image coordinates are (x′,y′) ≈ (333.33, 333.33).


Step 4: Compute the Flow Vectors

The flow vectors (u,v) represent the displacement of the point in the image plane:

u=x′−x=333.33−500=−166.67

v=y′−y=333.33−500=−166.67

So, the flow vector for the point (X,Y,Z)=(1,1,2) with a focal length f=1000 is (−166.67,−166.67).

Note:

By considering the focal length f=1000 and the translation of the camera, we computed the planar surface flow
for a point initially at (X,Y,Z)=(1,1,2). The flow vector (−166.67,−166.67) represents the apparent motion of the
point on the image plane due to the camera's movement.

Example 2:
Problem Statement

Consider a planar surface Z=2 and a camera moving with a translation vector t=(0,0,1) and a rotation around the
y-axis by an angle θ=30∘. Compute the image flow for a point (X,Y,Z)=(1,1,2) with a focal length f=1000.

Solution

Step 1: Initial Image Coordinates

Using the image formation model, we project the 3D point (X,Y,Z)=(1,1,2) onto the image plane:

So, the initial image coordinates are (x, y) = (500, 500).

Step 2: Camera Transformation

Translation

The camera undergoes a translation t=(0,0,1). The new Z coordinate after translation becomes:
Z′=Z+tz=2+1=3

The X and Y coordinates remain unchanged because the translation is only along the Z-axis:

X′=X=1

Y′=Y=1

Rotation

The camera also undergoes a rotation around the y-axis by an angle θ=30∘. The rotation matrix R for a rotation
around the y-axis is:

Substituting θ=30∘

Applying the rotation matrix to the point (X′,Y′,Z′)=(1,1,3):

Step 3: New Image Coordinates

Project the new 3D coordinates (X′′,Y′′,Z′′)=(2.366,1,2.098) onto the image plane:

x′ = 1000·2.366/2.098 ≈ 1127.74,  y′ = 1000·1/2.098 ≈ 476.64

So, the new image coordinates are (x′,y′) ≈ (1127.74, 476.64).

Step 4: Compute the Flow Vectors

The flow vectors (u,v) represent the displacement of the point in the image plane:

u = x′ − x = 1127.74 − 500 = 627.74

v = y′ − y = 476.64 − 500 = −23.36

So, the flow vector for the point (X,Y,Z)=(1,1,2) with a focal length f=1000 and a rotation around the y-axis by 30° is approximately (627.74, −23.36).

Note:

By considering both the translation and rotation of the camera, we computed the planar surface flow for a point initially at (X,Y,Z)=(1,1,2). The flow vector (627.74, −23.36) represents the apparent motion of the point on the image plane due to the camera's movement and rotation. This type of calculation is essential in computer vision tasks like motion estimation, 3D reconstruction, and image stabilization.
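A minimal Python/NumPy sketch of this planar surface flow computation (NumPy is assumed); it follows the same order as the worked examples, translation first and then rotation, and reproduces both results:

import numpy as np

def project(P, f):
    # Pinhole projection of a 3D point onto the image plane: x = f*X/Z, y = f*Y/Z.
    X, Y, Z = P
    return f * X / Z, f * Y / Z

def planar_flow(P, f, R, t):
    # Move the point (translation, then rotation) and return the image displacement (u, v).
    x, y = project(P, f)
    P_new = R @ (np.asarray(P, dtype=float) + np.asarray(t, dtype=float))
    x_new, y_new = project(P_new, f)
    return x_new - x, y_new - y

f = 1000.0
P = (1.0, 1.0, 2.0)
t = (0.0, 0.0, 1.0)

print(planar_flow(P, f, np.eye(3), t))          # Example 1 -> about (-166.67, -166.67)

th = np.deg2rad(30)                             # Example 2: add a rotation about the y-axis
Ry = np.array([[np.cos(th),  0.0, np.sin(th)],
               [0.0,         1.0, 0.0],
               [-np.sin(th), 0.0, np.cos(th)]])
print(planar_flow(P, f, Ry, t))                 # -> about (627.7, -23.4)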

Real-World Applications of Planar Surface Flow


1. Motion Estimation and Tracking

Application: Self-Driving Cars

Description: Self-driving cars use planar surface flow to estimate the motion of the vehicle relative to the
road surface. By analyzing the flow of points on the road, the car can determine its speed, direction, and
detect obstacles.

Example: A self-driving car analyzes the flow of lane markings and the road surface to maintain its lane,
adjust speed, and make turns.
2. 3D Reconstruction

Application: Virtual Reality and Augmented Reality

Description: Planar surface flow is used to reconstruct 3D scenes from 2D images. This is crucial for
creating realistic VR and AR environments where the geometry of the physical world needs to be accurately
represented.

Example: AR applications like furniture placement apps use 3D reconstruction to accurately place virtual
objects on real-world surfaces, allowing users to see how furniture would look in their home.

3. Image Stabilization

Application: Video Recording and Photography

Description: Image stabilization algorithms use planar surface flow to detect and correct camera shake.
This results in smoother videos and sharper images, especially in handheld or moving shots.

Example: Smartphones and cameras use image stabilization to reduce the blurriness caused by hand
movements, providing clear and stable footage.

4. Robotics Navigation

Application: Autonomous Robots

Description: Autonomous robots use planar surface flow to navigate environments. By analyzing the
motion of the floor or other planar surfaces, robots can understand their movement and avoid obstacles.

Example: A robotic vacuum cleaner uses planar surface flow to map out a room, identify obstacles like
furniture, and efficiently clean the floor.

5. Aerial Mapping and Surveying

Application: Drones

Description: Drones use planar surface flow to create accurate maps and survey land. By analyzing the
flow of ground points, drones can generate detailed 3D models of terrain and structures.

Example: A drone surveying a construction site uses planar surface flow to create a 3D model of the area,
allowing for precise measurement and monitoring of progress.

6. Medical Imaging

Application: Endoscopy

Description: In medical imaging, planar surface flow can assist in navigating and mapping internal body
structures. This is especially useful in endoscopic procedures where accurate navigation is crucial.
Example: During an endoscopic examination, planar surface flow can help create a 3D map of the internal
surfaces being examined, aiding in accurate diagnosis and treatment.

7. Gesture Recognition

Application: Human-Computer Interaction

Description: Planar surface flow is used in gesture recognition systems to detect and interpret human
gestures. This allows for more intuitive and natural interaction with devices and applications.

Example: A smart TV uses gesture recognition to allow users to control the interface by waving their
hands, using planar surface flow to track the movement of the hands.

8. Sports Analytics

Application: Motion Analysis in Sports

Description: Planar surface flow is used to analyze the motion of athletes during sports activities. This
helps in improving performance, understanding techniques, and preventing injuries.

Example: A sports analytics system uses planar surface flow to track the motion of a basketball player,
analyzing their movements to provide feedback on shooting techniques and defensive maneuvers.

Bilinear Interpolant
Interpolation:
Interpolation is a mathematical technique used to estimate unknown values that fall within the range of known
data points.

Bilinear interpolation is a method of estimating the value of a function at a given point based on the values at
nearby points on a rectangular grid. It is widely used in image processing, computer graphics, and geographic
information systems (GIS) to perform tasks such as image resizing, warping, and texture mapping. The method
extends linear interpolation by performing linear interpolation in two directions.

Formula

Given a rectangular grid with points (x0,y0), (x1,y0), (x0,y1), and (x1,y1) with corresponding function values
f(x0,y0), f(x1,y0), f(x0,y1), and f(x1,y1), the bilinear interpolant f(x,y) at a point (x,y) within the grid can be
calculated as follows:
Interpolate in the x-direction:

f(x, y0) = ((x1 − x)/(x1 − x0))·f(x0, y0) + ((x − x0)/(x1 − x0))·f(x1, y0)
f(x, y1) = ((x1 − x)/(x1 − x0))·f(x0, y1) + ((x − x0)/(x1 − x0))·f(x1, y1)

Interpolate in the y-direction:

f(x, y) = ((y1 − y)/(y1 − y0))·f(x, y0) + ((y − y0)/(y1 − y0))·f(x, y1)

Note:

 f(x,y): Represents the value of the function f at the coordinates (x,y).

 In the context of images, f(x,y) might represent the intensity or color value of the pixel located at
position (x,y).

Example:
Given the values of a function f(x,y) at four grid points:

 f(x0,y0)=f(0,0)=1
 f(x1,y0)=f(1,0)=2
 f(x0,y1)=f(0,1)=3
 f(x1,y1)=f(1,1)=4

Calculate the value of the function at the point (0.5,0.5) using bilinear interpolation.

Solution

1. Interpolate in the x-direction at y=0 and y=1:

f(0.5, 0) = 0.5·f(0,0) + 0.5·f(1,0) = 0.5·1 + 0.5·2 = 1.5
f(0.5, 1) = 0.5·f(0,1) + 0.5·f(1,1) = 0.5·3 + 0.5·4 = 3.5

2. Interpolate in the y-direction at x=0.5:

f(0.5, 0.5) = 0.5·f(0.5, 0) + 0.5·f(0.5, 1) = 0.5·1.5 + 0.5·3.5 = 2.5

The estimated value at point (0.5,0.5) using bilinear interpolation is 2.5.

Note:

 The y values of 0 and 1 are fixed because they correspond to the known y-coordinates of the grid points.
 We interpolate along the x-direction at these fixed y-values to get intermediate values.

 Then, we interpolate along the y-direction using these intermediate values to get the final interpolated
value at the desired point.

Example
Given a 2x2 grid of known pixel values, estimate the value at a point (x,y) inside this grid.

Here's the grid:

 f(1,1)=10
 f(2,1)=20
 f(1,2)=30
 f(2,2)=40

We want to find the value at point (1.5,1.5).

Solution:

Consider the grid

Here:

 f(1,1)=10: The value of the function at coordinates (1,1) is 10.


 f(2,1)=20: The value of the function at coordinates (2,1) is 20.
 f(1,2)=30: The value of the function at coordinates (1,2) is 30.
 f(2,2)=40: The value of the function at coordinates (2,2) is 40.
Bilinear Interpolation Process
Bilinear interpolation is used to estimate the value of the function at a point (x,y) that lies within the grid but not
necessarily at a grid point. The interpolation process involves the following steps:

1. Identify the surrounding grid points: For a given point (x,y), identify the four nearest grid points. In
our example, to estimate the value at (1.5,1.5), the surrounding grid points are:
o (1,1)
o (2,1)
o (1,2)
o (2,2)
2. Interpolate along the x-direction: Compute intermediate values at the desired x-coordinate for the
known y-coordinates.

 For y=1:

f(1.5, 1) = 0.5·f(1,1) + 0.5·f(2,1) = 0.5·10 + 0.5·20 = 15

 For y=2:

f(1.5, 2) = 0.5·f(1,2) + 0.5·f(2,2) = 0.5·30 + 0.5·40 = 35

Interpolate along the y-direction: Use the intermediate values obtained to compute the final value at the desired (x,y) point.

 Interpolate at x=1.5, y=1.5:

f(1.5, 1.5) = 0.5·f(1.5, 1) + 0.5·f(1.5, 2) = 0.5·15 + 0.5·35 = 25

The estimated value at point (1.5,1.5) using bilinear interpolation is 25.
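A minimal Python/NumPy sketch of bilinear interpolation (NumPy is assumed); it reproduces the two examples above, and the same function works per channel for the RGB example that follows:

import numpy as np

def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    # f00 = f(x0,y0), f10 = f(x1,y0), f01 = f(x0,y1), f11 = f(x1,y1)
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    top = (1 - tx) * f00 + tx * f10       # interpolate along x at y = y0
    bottom = (1 - tx) * f01 + tx * f11    # interpolate along x at y = y1
    return (1 - ty) * top + ty * bottom   # interpolate along y

print(bilinear(0.5, 0.5, 0, 1, 0, 1, 1, 2, 3, 4))       # -> 2.5
print(bilinear(1.5, 1.5, 1, 2, 1, 2, 10, 20, 30, 40))   # -> 25.0
print(bilinear(1.5, 1.5, 1, 2, 1, 2,
               np.array([10, 20, 30]), np.array([40, 50, 60]),
               np.array([70, 80, 90]), np.array([100, 110, 120])))  # -> [55. 65. 75.]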

Example
Given a 2x2 grid of pixel values, estimate the RGB color value at point (1.5,1.5).

Here's the grid:

 f(1,1)=(10,20,30) (RGB values)


 f(2,1)=(40,50,60)
 f(1,2)=(70,80,90)
 f(2,2)=(100,110,120)

We want to find the color at point (1.5,1.5).

Solution:

Bilinear Interpolation Process

Bilinear interpolation involves two linear interpolations in one direction and one linear interpolation in the
perpendicular direction. We will perform this separately for the red, green, and blue components.

Step 1: Interpolate along the x-direction

For each y value, we find the intermediate RGB values at x=1.5.

For y=1:

Red component interpolation:

Substituting values:
Green component interpolation:

Substituting values:

Blue component interpolation:

Substituting values:

So, the interpolated RGB value at (1.5,1) is:f(1.5,1)=(25,35,45)

For y=2:

Red component interpolation:

Substituting values:

Green component interpolation:


Substituting values:

Blue component interpolation:

Substituting values:

So, the interpolated RGB value at (1.5,2) is:

f(1.5,2)=(85,95,105)

Step 2: Interpolate along the y-direction

Now, we use the intermediate values obtained from the x-direction interpolation to find the final value at (1.5,1.5).

Red component interpolation:

Substituting values:

Green component interpolation:


Substituting values:

Blue component interpolation:

Substituting values:

So, the interpolated RGB value at (1.5,1.5) is:

f(1.5,1.5)=(55,65,75)

The estimated RGB color value at point (1.5,1.5) using bilinear interpolation is (55,65,75).

Applications of Bilinear Interpolation


In computer vision, bilinear interpolation is a crucial technique used for a variety of tasks that require estimating
pixel values at non-integer coordinates. Here are the main purposes and applications of bilinear interpolation in
computer vision:

1. Image Resizing

 Upscaling: When enlarging an image, bilinear interpolation is used to estimate the values of new pixels
that fall between the original pixels. This helps to create a smoother and more visually appealing
enlarged image.
 Downscaling: When reducing the size of an image, bilinear interpolation helps in determining the pixel
values of the smaller image, ensuring that the reduced image maintains the overall appearance and
details of the original image.
2. Geometric Transformations

 Rotation: During the rotation of an image, pixel values must be calculated for the new positions of the
pixels. Bilinear interpolation provides a smooth transition by averaging the values of the nearest pixels.
 Translation: When shifting an image, bilinear interpolation helps in determining the new pixel values to
avoid a jagged or blocky appearance.
 Shearing: For shearing transformations, bilinear interpolation calculates intermediate pixel values to
produce a more natural-looking result.

3. Image Warping and Morphing

 Image Warping: Bilinear interpolation is used to calculate the pixel values when an image is distorted
or transformed to align with another image or a geometric model. This is essential in applications like
image registration and stitching.
 Morphing: In image morphing, where one image gradually transforms into another, bilinear
interpolation ensures smooth transitions between corresponding points in the images.

4. Texture Mapping in 3D Graphics

 When applying a 2D texture to a 3D surface, bilinear interpolation is used to compute the texture color
for each pixel on the surface. This avoids the appearance of pixelation and creates a more realistic
texture effect.

5. Image Blending and Compositing

 Alpha Blending: When combining two images or adding overlays, bilinear interpolation helps in
blending pixel values smoothly, avoiding harsh edges and creating a seamless composite image.
 Image Compositing: For tasks such as layering images or combining different elements into a single
image, bilinear interpolation ensures that the transitions between different layers are smooth.

6. Object Detection and Recognition

 Feature Extraction: When extracting features from images (e.g., edges, corners), the algorithms might
work on resized or transformed versions of the image. Bilinear interpolation ensures that the resized or
transformed images retain the necessary details for accurate feature detection.
 Template Matching: In template matching, bilinear interpolation helps in resizing or rotating the
template to match different scales and orientations of the target object in the image.

7. Optical Flow Estimation

 Flow Field Calculation: In optical flow algorithms, which estimate the motion between consecutive
frames in a video, bilinear interpolation is used to compute intermediate pixel values to enhance the
accuracy of motion vectors.

8. Super-Resolution

 Image Super-Resolution: When generating high-resolution images from low-resolution inputs, bilinear
interpolation is used as part of the process to estimate the high-resolution pixel values, often as a
preliminary step before applying more advanced algorithms.
3D to 2D projections: Orthographic Projection and Parallel Projection (para-perspective):
3D to 2D projections are essential in computer graphics, computer vision, and various fields requiring
visualization of three-dimensional objects on two-dimensional surfaces. Two common types of projections are
orthographic and parallel (paraperspective) projections.

Orthographic Projection

Orthographic projection is a type of parallel projection where the projection lines are perpendicular to the
projection plane. This means that objects are projected onto the plane without any perspective distortion,
preserving their relative dimensions and shapes.

Characteristics:

 Projection lines are parallel and orthogonal (perpendicular) to the projection plane.
 No perspective distortion: objects maintain their size and shape regardless of their distance from the
projection plane.
 Typically used for technical drawings, engineering designs, and CAD applications.

Mathematical Representation:

For an object point P(x,y,z) in 3D space, its orthographic projection P′(x′,y′) on the 2D plane is given by:

x′=x

y′=y

This can be represented using a projection matrix P as follows:

P = | 1  0  0 |
    | 0  1  0 |

So the projection of point P is:

P′ = P · [x  y  z]ᵀ = [x  y]ᵀ


Example:
Given a 3D point P(x,y,z) with coordinates P(3,4,5), find its orthographic projection onto the XY, XZ, and YZ
planes.

Solution

Orthographic projection onto the XY, XZ, and YZ planes involves projecting the 3D coordinates onto each of
these planes by ignoring one of the coordinates in each case.

1. Projection onto the XY Plane

When projecting onto the XY plane, the z-coordinate is ignored.

Given point P(3,4,5):

(x′,y′)=(x,y)=(3,4)

So, the orthographic projection onto the XY plane is PXY(3,4).

2. Projection onto the XZ Plane

When projecting onto the XZ plane, the y-coordinate is ignored.

Given point P(3,4,5):

(x′,z′)=(x,z)=(3,5)

So, the orthographic projection onto the XZ plane is PXZ(3,5).

3. Projection onto the YZ Plane

When projecting onto the YZ plane, the x-coordinate is ignored.

Given point P(3,4,5):

(y′,z′)=(y,z)=(4,5)

So, the orthographic projection onto the YZ plane is PYZ(4,5).


Example 2
Problem Statement

Given a 3D object with the following vertices:

 A(1,2,3)
 B(4,5,6)
 C(7,8,9)
 D(2,3,1)

Find the orthographic projections of these vertices onto the XY, XZ, and YZ planes.

Solution

To find the orthographic projections onto the XY, XZ, and YZ planes, we will project each vertex onto the
respective planes by ignoring one of the coordinates.

Projection onto the XY Plane

For each vertex, we ignore the z-coordinate:

 A(1,2,3) → AXY(1,2)
 B(4,5,6) → BXY(4,5)
 C(7,8,9) → CXY(7,8)
 D(2,3,1) → DXY(2,3)

Projection onto the XZ Plane

For each vertex, we ignore the y-coordinate:

 A(1,2,3) → AXZ(1,3)
 B(4,5,6) → BXZ(4,6)
 C(7,8,9) → CXZ(7,9)
 D(2,3,1) → DXZ(2,1)

Projection onto the YZ Plane

For each vertex, we ignore the x-coordinate:

 A(1,2,3) → AYZ(2,3)
 B(4,5,6) → BYZ(5,6)
 C(7,8,9) → CYZ(8,9)
 D(2,3,1) → DYZ(3,1)
Example 3
Problem Statement

Given a cube with vertices at:

 A(1,1,1)
 B(1,1,−1)
 C(1,−1,1)
 D(1,−1,−1)
 E(−1,1,1)
 F(−1,1,−1)
 G(−1,−1,1)
 H(−1,−1,−1)

Find the orthographic projections of these vertices onto the XY, XZ, and YZ planes.

Solution

Projection onto the XY Plane

For each vertex, we ignore the z-coordinate:

 A(1,1,1) → AXY(1,1)
 B(1,1,−1) → BXY(1,1)
 C(1,−1,1) → CXY(1,−1)
 D(1,−1,−1) → DXY(1,−1)
 E(−1,1,1) → EXY(−1,1)
 F(−1,1,−1)→ FXY(−1,1)
 G(−1,−1,1) → GXY(−1,−1)
 H(−1,−1,−1) → HXY(−1,−1)

Projection onto the XZ Plane

For each vertex, we ignore the y-coordinate:

 A(1,1,1) → AXZ(1,1)
 B(1,1,−1) → BXZ(1,−1)
 C(1,−1,1) → CXZ(1,1)
 D(1,−1,−1) → DXZ(1,−1)
 E(−1,1,1) → EXZ(−1,1)
 F(−1,1,−1) → FXZ(−1,−1)
 G(−1,−1,1) → GXZ(−1,1)
 H(−1,−1,−1) → HXZ(−1,−1)

Projection onto the YZ Plane

For each vertex, we ignore the x-coordinate:

 A(1,1,1) → AYZ(1,1)
 B(1,1,−1) → BYZ(1,−1)
 C(1,−1,1) → CYZ(−1,1)
 D(1,−1,−1) → DYZ(−1,−1)
 E(−1,1,1) → EYZ(1,1)
 F(−1,1,−1) → FYZ(1,−1)
 G(−1,−1,1) → GYZ(−1,1)
 H(−1,−1,−1) → HYZ(−1,−1)
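A minimal Python sketch of orthographic projection onto the coordinate planes (plain Python, no libraries needed); it reproduces Example 1 and can be looped over the vertices in Examples 2 and 3:

def ortho_project(point, plane):
    # Orthographic projection onto a coordinate plane: drop the coordinate
    # perpendicular to that plane.
    x, y, z = point
    return {"XY": (x, y), "XZ": (x, z), "YZ": (y, z)}[plane]

P = (3, 4, 5)
print(ortho_project(P, "XY"))   # -> (3, 4)
print(ortho_project(P, "XZ"))   # -> (3, 5)
print(ortho_project(P, "YZ"))   # -> (4, 5)

for v in [(1, 2, 3), (4, 5, 6), (7, 8, 9), (2, 3, 1)]:   # Example 2 vertices
    print(ortho_project(v, "XY"), ortho_project(v, "XZ"), ortho_project(v, "YZ"))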

Parallel Projection (Para perspective):


Parallel projection, or paraperspective projection, is a type of projection where the projection lines are parallel,
but not necessarily orthogonal to the projection plane. Unlike orthographic projection, paraperspective
projection can introduce some perspective effects, though it's less pronounced compared to perspective
projection.

Characteristics:

 Projection lines are parallel but not perpendicular to the projection plane.
 Some perspective effects are present, depending on the distance of objects from the projection plane.
 Often used in technical illustrations where some depth cues are needed without full perspective
distortion.

Mathematical Representation:

For a point P(x,y,z) in 3D space, its paraperspective projection P′(x′,y′) on the 2D plane can be represented by
adjusting the orthographic projection equations to include a scaling factor based on depth z:

where d is a distance parameter that adjusts the degree of perspective distortion.

Projection Matrix:

In homogeneous coordinates, this can be represented using a projection matrix P:


Example
Given a 3D point P(x,y,z) with coordinates P(4,3,2), find its paraperspective projection onto the XY plane using
a distance parameter d=5.

Solution

In a paraperspective projection, the projection lines are parallel but not necessarily orthogonal to the projection
plane. The projection equations introduce some perspective effects while maintaining the parallel nature of the
projection lines.

The paraperspective projection of a point P(x,y,z) onto the XY plane can be calculated using the following
equations:

Given the point P(4,3,2) and d=5, we will use these equations to find the projected coordinates (x′,y′).

1. Calculate the scaling factor:

2. Calculate the projected x′-coordinate:

3. Calculate the projected y′-coordinate:


Example 2
Given a 3D point P(x,y,z) with coordinates P(6,8,10), find its paraperspective projection onto the XY plane
using a distance parameter d=10.

Solution

The paraperspective projection of a point P(x,y,z) onto the XY plane can be calculated using the following
equations:

Given the point P(6,8,10) and d=10, we will use these equations to find the projected coordinates (x′,y′).

Step-by-Step Calculation

1. Calculate the scaling factor:

2. Calculate the projected x′-coordinate:

3. Calculate the projected y′-coordinate:


Pinhole Camera Model
The Pinhole Camera Model is a simple and fundamental concept in computer vision and photography. It
describes how a 3D scene is projected onto a 2D image plane, capturing a view of the world through a small
aperture, or "pinhole," in a dark box.

Key Components of the Pinhole Camera Model

1. Pinhole:
o The pinhole is a small aperture through which light enters. It is effectively a point, meaning it
has no width, height, or depth.
o Light rays from objects in the scene pass through this pinhole and form an inverted image on the
opposite side of the camera (the image plane).

2. Image Plane:
o The image plane is the surface where the image is formed after the light passes through the
pinhole.
o In a real camera, this would be the film or sensor. In the model, it's a flat 2D plane.

3. Focal Length (f):


o The distance between the pinhole and the image plane is known as the focal length.
o It determines the scale of the image: a longer focal length results in a larger image, while a
shorter focal length produces a smaller image.

4. Coordinate Systems:
o World Coordinates (X, Y, Z): These represent the 3D positions of objects in the scene.
o Camera Coordinates: The origin of this coordinate system is at the pinhole, with the Z-axis
pointing outward through the pinhole, the X-axis horizontal, and the Y-axis vertical.
o Image Coordinates (u, v): These are the 2D coordinates on the image plane where the 3D point
is projected.

Mathematical Formulation

1. Coordinate Systems

 World Coordinate System (X, Y, Z): The coordinate system used to describe the position of objects in
the real world.
 Camera Coordinate System (X', Y', Z'): The coordinate system with its origin at the pinhole, and the
Z'-axis pointing in the direction of the camera's optical axis.
 Image Plane Coordinate System (u, v): The 2D coordinate system on the image plane where the 3D
points are projected.
2. Projection of 3D Points onto the Image Plane

Given a point (X,Y,Z) in the world coordinate system, and assuming the camera is positioned at the origin of
the world coordinate system, the point is first represented in the camera coordinate system (X′,Y′,Z′).

Camera Coordinate System

For simplicity, if the camera is aligned with the world coordinates, we have:

X′=X,Y′=Y,Z′=Z

Image Plane Projection

The pinhole camera model projects the 3D point onto the 2D image plane using the following projection equations:

u = f · X′ / Z′,   v = f · Y′ / Z′

where:

 f is the focal length of the camera (the distance from the pinhole to the image plane).
 (u,v) are the coordinates of the projected point on the image plane.
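As a quick sanity check of these equations, a few lines of Python reproduce the numbers used in the worked examples later in this section (units are whatever units f, X, Y, Z are expressed in; millimetres in the examples).

def project_pinhole(X, Y, Z, f):
    """Ideal pinhole projection: u = f*X/Z, v = f*Y/Z."""
    return f * X / Z, f * Y / Z

# Point (200, 100, 1000) mm seen with a 50 mm focal length -> (10.0, 5.0) mm
print(project_pinhole(200, 100, 1000, f=50))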

3. Homogeneous Coordinates

To facilitate transformations, especially when dealing with translations and rotations, the model often uses
homogeneous coordinates.

3D Point in Homogeneous Coordinates

A 3D point (X,Y,Z) can be represented in homogeneous coordinates as:

Xw = [X, Y, Z, 1]^T

2D Point in Homogeneous Coordinates

The corresponding 2D point on the image plane in homogeneous coordinates is:

x = [u, v, 1]^T

4. Camera Projection Matrix

The relationship between the 3D point in world coordinates and its 2D projection on the image plane can be expressed using a camera projection matrix P:

P = K [R | t]

where:

 K is the intrinsic matrix of the camera, containing the camera's internal parameters.
 R is the rotation matrix that aligns the world coordinates with the camera coordinates.
 t is the translation vector that describes the camera's position in the world coordinates.

Intrinsic Matrix K

The intrinsic matrix K accounts for the camera's internal characteristics:

K = [ fx   0   cx ]
    [  0  fy   cy ]
    [  0   0    1 ]

where:

 fx and fy are the focal lengths in the x and y directions.


 (cx,cy) is the principal point, which is typically the center of the image plane.

Extrinsic Matrix [R∣t]

The extrinsic matrix combines rotation and translation to map the world coordinates to the camera coordinates:

[R | t] is the 3×4 matrix formed by placing the 3×3 rotation matrix R beside the 3×1 translation vector t.
5. Full Projection Equation

Combining everything, the full projection equation for the pinhole camera model is:

s · [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T

where s is the projective scale factor (the depth Zc of the point in camera coordinates).

6. Simplified Model (Orthographic Projection)

For cases where the variation in depth within the scene is small compared to the average distance from the camera, a simplified scaled-orthographic (weak perspective) projection can be used:

u = f · X / Zavg,   v = f · Y / Zavg

where Zavg is an average distance of the objects from the camera.
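To make the chain K, [R|t], homogeneous coordinates concrete, here is a minimal sketch that builds P = K[R|t] and projects a homogeneous 3D point; the intrinsic values and the identity pose are made-up placeholders, not parameters taken from these notes.

import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | t]  (3x4): intrinsics times the extrinsic matrix."""
    return K @ np.hstack([R, np.reshape(t, (3, 1))])

def project(P, X_world):
    """Project a 3D world point and divide out the projective scale."""
    Xh = np.append(np.asarray(X_world, dtype=float), 1.0)  # homogeneous [X, Y, Z, 1]
    u, v, w = P @ Xh
    return u / w, v / w

# Hypothetical intrinsics and an identity pose, just to show the plumbing
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
P = projection_matrix(K, R, t)
print(project(P, [200.0, 100.0, 1000.0]))   # (480.0, 320.0)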

Example of the Pinhole Camera Model


Imagine you're photographing a tree using a pinhole camera.

 The tree is a 3D object located in front of the camera.


 Light from the top of the tree passes through the pinhole and strikes the bottom of the image plane,
while light from the bottom of the tree hits the top of the image plane.
 As a result, the image of the tree on the image plane is inverted.
Applications
The pinhole camera model is the basis for understanding more complex camera systems in computer vision. It's
used in:

 3D Reconstruction: Estimating the 3D structure of a scene from 2D images.


 Camera Calibration: Determining the internal parameters of a camera.
 Image Warping: Transforming images based on their projection properties.

This model is idealized, assuming a perfect pinhole and no lens distortions, but it's a powerful starting point for
understanding how cameras capture images.

Example 1:
A pinhole camera has a focal length of 50 mm. An object is located at (X,Y,Z)=(200 mm,100 mm,1000 mm) in
the world coordinate system. Find the coordinates of the projection of this object on the image plane (u,v).

Solution:

Given:

 Focal length f=50 mm


 Object coordinates (X,Y,Z)=(200 mm,100 mm,1000 mm)

The projection equations for the pinhole camera model are:

u = f · X / Z,   v = f · Y / Z

Substituting the given values: u = 50 × 200 / 1000 = 10 mm and v = 50 × 100 / 1000 = 5 mm.

So, the coordinates of the projection on the image plane are (u,v)=(10 mm,5 mm).
Example 2:
An object appears on the image plane at coordinates (u,v)=(15 mm,10 mm). If the focal length of the camera is
75 mm, and the object is known to be 300 mm in the X direction from the camera center, find the distance Z of
the object from the camera along the Z-axis.

Solution

Given:

 Focal length f=75 mm


 Image coordinates (u,v)=(15 mm,10 mm)
 Object's X coordinate X=300 mm

We know from the projection equation:

u = f · X / Z

Rearranging to solve for Z:

Z = f · X / u

Substitute the given values:

Z = 75 × 300 / 15 = 1500 mm

So, the distance Z of the object from the camera is 1500 mm.

Example 3:
A square object of side 100 mm is located 800 mm away from a pinhole camera with a focal length of 40 mm.
What will be the size of the object's image on the image plane?
Solution:

Given:

 Focal length f=40 mm


 Object distance Z=800 mm
 Object size 100 mm×100 mm

To find the size of the image, we use the ratio of similar triangles formed by the object and its image:

image size / object size = f / Z

So, the image size is:

100 mm × (40 / 800) = 5 mm per side

So, the size of the object's image on the image plane will be 5 mm×5 mm.

Example 4
An object is located at (X,Y,Z)=(400 mm,200 mm,1200 mm). Find the image coordinates for this object when
the camera's focal length is 50 mm and 100mm.

Solution:

For f=50 mm:

u = 50 × 400 / 1200 ≈ 16.67 mm,   v = 50 × 200 / 1200 ≈ 8.33 mm

For f=100 mm:

u = 100 × 400 / 1200 ≈ 33.33 mm,   v = 100 × 200 / 1200 ≈ 16.67 mm

So, the image coordinates are:

 For f=50 mm: (u,v)=(16.67 mm,8.33 mm)

 For f=100 mm: (u,v)=(33.33 mm,16.67 mm)

Example 5:
A camera has a focal length f=35mm. An object is located at (X,Y,Z)=(200 mm,150 mm,1000 mm) in the world
coordinate system. Determine the coordinates (u,v) of the projection of this object on the image plane.

Solution:

Given:

 Focal length f=35 mm


 Object coordinates in the world system (X,Y,Z)=(200 mm,150 mm,1000 mm)

Using the projection equations:

u = f · X / Z,   v = f · Y / Z

Substitute the values:

u = 35 × 200 / 1000 = 7 mm,   v = 35 × 150 / 1000 = 5.25 mm

The coordinates of the projection on the image plane are (u,v)=(7 mm,5.25 mm).

Example 6:
An object is projected onto the image plane at coordinates (u,v)=(20 mm,10 mm). The camera's focal length is
f=50mm, and the object’s X-coordinate is known to be 400 mm. Find the distance Z of the object from the
camera.

Solution:

Given:

 Focal length f=50mm


 Image coordinates (u,v)=(20 mm,10 mm)
 Object's X-coordinate: X=400 mm

Using the equation for u:

u = f · X / Z

Rearrange to find Z:

Z = f · X / u

Substitute the values:

Z = 50 × 400 / 20 = 1000 mm

The distance Z of the object from the camera is 1000 mm.


Example 7:
A rectangular object measuring 100 mm×50 mm is placed 800 mm away from a camera with a focal length of
25 mm. Calculate the size of the object's image on the image plane.

Solution:

Given:

 Object size: 100 mm×50 mm


 Object distance Z=800 mm
 Focal length f=25 mm

Using the projection equation, the size of the image on the image plane is scaled by the factor f/Z = 25/800 = 0.03125:

For the width: 100 mm × 0.03125 = 3.125 mm

For the height: 50 mm × 0.03125 = 1.5625 mm

The size of the image on the image plane is 3.125 mm×1.5625 mm.

Example 8:

Problem:
A camera has the following intrinsic parameters:

A point in the world is located at (X,Y,Z)=(400 mm,200 mm,1000 mm). Find the coordinates (u,v) on the image
plane.

Solution:
Given:

 Intrinsic matrix K
 World coordinates: (X,Y,Z)=(400 mm,200 mm,1000 mm)

Using the equation for projection:

First, compute the normalized coordinates: X/Z = 400/1000 = 0.4 and Y/Z = 200/1000 = 0.2.

Then multiply by the intrinsic matrix K:

This results in:

The coordinates on the image plane are (u,v)=(800,480).

Example 9
A camera has the following intrinsic matrix:
A 3D point in the world is located at (X,Y,Z)=(300 mm,200 mm,1500 mm). Determine the image coordinates
(u,v) of the projection of this point on the image plane.

Solution:

Given:

 Intrinsic matrix K
 World coordinates: (X,Y,Z)=(300 mm,200 mm,1500 mm)

First, normalize the 3D point: X/Z = 300/1500 = 0.2 and Y/Z = 200/1500 ≈ 0.133.

Next, use the intrinsic matrix to project the point onto the image plane:

Calculate:

The image coordinates of the point are (u,v)=(520,373.3).

Example 10
Given the intrinsic matrix K:
and a point projected onto the image plane at (u,v)=(560,460). Assume the object lies on the plane Z=1000 mm.
Find the corresponding world coordinates (X,Y,Z) of the point.

Solution:

Given:

 Intrinsic matrix K
 Image coordinates (u,v)=(560,460)
 Distance Z=1000 mm

We can reconstruct the world coordinates using the inverse of the intrinsic matrix:

1. Convert the image coordinates back to normalized coordinates:

First, compute the inverse of K:

Now, apply this to the image coordinates:

Calculate:

Reconstruct the world coordinates:


The world coordinates of the point are (X,Y,Z)=(200 mm,200 mm,1000 mm).

Example 11
A camera has an intrinsic matrix given by:

If the focal length is doubled, and a 3D point in the world at (X,Y,Z)=(500 mm,300 mm,2000 mm) is projected
onto the image plane, what are the new image coordinates (u,v)?

Solution:

Given:

 Initial intrinsic matrix K


 World coordinates: (X,Y,Z)=(500 mm,300 mm,2000 mm)
 Focal length is doubled

If the focal length is doubled, the new intrinsic matrix K′ is:

First, normalize the 3D point: X/Z = 500/2000 = 0.25 and Y/Z = 300/2000 = 0.15.


Next, apply the new intrinsic matrix to find the image coordinates:

Calculate:

The new image coordinates of the point after doubling the focal length are (u,v)=(1390,930).

Example 12
Suppose a 3D point (X,Y,Z)=(100 mm,150 mm,1000 mm) is projected onto the image plane at (u,v)=(220,270).
The camera has a principal point at (cx,cy)=(200,250). Find the focal length f of the camera.

Solution:

Given:

 3D point: (X,Y,Z)=(100 mm,150 mm,1000 mm)


 Image coordinates: (u,v)=(220,270)
 Principal point: (cx,cy)=(200,250)

Using the projection equations:

u = f · (X/Z) + cx,   v = f · (Y/Z) + cy

Substitute the values into the equations:

220 = f · (100/1000) + 200
270 = f · (150/1000) + 250

Solving for f:

f = (220 − 200) / 0.1   and   f = (270 − 250) / 0.15
The first equation gives a focal length of 200 mm, and the second gives approximately 133.33 mm. The
inconsistency can arise due to noise or inaccuracies in measurement, so we could average them or examine
other factors to decide which is correct.

The focal length f is approximately between 133.33 mm and 200 mm.

Camera Intrinsic
The camera intrinsic matrix is a fundamental concept in computer vision and photogrammetry, essential for
understanding how a 3D point in the world is projected onto a 2D image plane. This matrix encapsulates the
internal parameters of a camera that relate the 3D coordinates of a point in space to the corresponding 2D pixel
coordinates in the image.

Key Components of the Intrinsic Matrix


The intrinsic matrix K is typically a 3x3 matrix that can be written as:

K = [ fx   0   cx ]
    [  0  fy   cy ]
    [  0   0    1 ]

Where:

 fx and fy: The focal lengths of the camera in pixels along the x and y axes. These values represent the
scaling factors between the physical size of the sensor and the pixel dimensions.
 cx and cy: The coordinates of the principal point (the intersection of the optical axis with the image
plane) in pixels. These values indicate the offset of the image center from the origin of the pixel grid.
 The last row [0,0,1] is a homogeneous coordinate term used in projective geometry.

Mathematical Formulation

If we have a 3D point in the world coordinates Xw=[X,Y,Z,1]T, the corresponding point in the camera
coordinate system Xc=[Xc,Yc,Zc,1]T is found by applying an extrinsic transformation (rotation and translation).
The 2D pixel coordinates x=[u,v,1]T on the image plane are then obtained by projecting the 3D point using the
intrinsic matrix:

Zc · x = K · [Xc, Yc, Zc]^T

Where Zc is the depth or distance from the camera. This equation can be expanded to:

u = fx · (Xc / Zc) + cx
v = fy · (Yc / Zc) + cy

giving the pixel coordinates (u, v) of the projected point.
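The two directions of this mapping, camera point to pixel and pixel back to a viewing ray, can be sketched as follows; the intrinsic values used here are placeholders, not ones taken from the examples below.

import numpy as np

def pixel_from_camera_point(K, Xc):
    """u = fx*Xc/Zc + cx, v = fy*Yc/Zc + cy, via the intrinsic matrix."""
    x = K @ np.asarray(Xc, dtype=float)
    return x[0] / x[2], x[1] / x[2]

def ray_from_pixel(K, u, v):
    """Back-project a pixel to a unit viewing ray in camera coordinates."""
    d = np.linalg.solve(K, np.array([u, v, 1.0]))
    return d / np.linalg.norm(d)

K = np.array([[1000.0,    0.0, 640.0],    # hypothetical fx, fy, cx, cy
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
print(pixel_from_camera_point(K, (200.0, 150.0, 1000.0)))  # (840.0, 510.0)
print(ray_from_pixel(K, 840.0, 510.0))                     # unit ray back through the same point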

Real-World Application

In practical terms, the intrinsic matrix is used in various applications:


 Camera Calibration: Determining the intrinsic parameters of a camera, which is a crucial step in many
computer vision tasks like 3D reconstruction, object tracking, and augmented reality.
 Image Rectification: Correcting image distortions caused by lens imperfections using the intrinsic
parameters.
 3D Reconstruction: Converting 2D images into 3D models by combining intrinsic and extrinsic
parameters.

Example 1
Given the intrinsic matrix:

and a 3D point in the camera coordinate system (Xc,Yc,Zc)=(200,150,1000) find the corresponding image
coordinates (u,v).

Solution:

1. Normalize the 3D point:

Xc/Zc = 200/1000 = 0.2,   Yc/Zc = 150/1000 = 0.15

2. Apply the intrinsic matrix:

u = fx × 0.2 + cx,   v = fy × 0.15 + cy

3. Compute the results:

The image coordinates are (u,v)=(880,660).

Example 2
Given a calibrated camera with an intrinsic matrix:

A point is observed in two images with pixel coordinates p1=(350,280) and p2=(450,280). Assume the camera's
motion between the two images is purely horizontal (along the x-axis) with a baseline distance of 100 units. The
Z-coordinate of the point in the first image is known to be 1000 units. Calculate the 3D coordinates of the point
in the camera's coordinate system.

Solution:

1. Unproject the points: First, we compute the normalized coordinates by subtracting the principal point
and dividing by the focal length:

For p1=(350,280):

For p2=(450,280):
2. Depth calculation: Since the camera motion is purely horizontal, we can compute the depth Z using the normalized disparity d = x2 − x1 and the baseline B = 100 units.

The depth Z is given by Z = B / d (equivalently, Z = fx · B / (u2 − u1) when the disparity is measured in pixels).

3. Reconstruct the 3D point:

Using the normalized coordinates from the first image, the 3D coordinates are:

Thus, the 3D coordinates of the point in the camera's coordinate system are:

Xc=(30000,40000,1000000) units
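The depth-from-disparity step in this example follows the standard rectified-stereo relation Z = fx·B/d, where d is the pixel disparity; since the intrinsic matrix is not reproduced above, the focal length used in the sketch below is only an assumed placeholder.

def depth_from_disparity(fx, baseline, u_left, u_right):
    """Pure horizontal motion / rectified stereo: Z = fx * B / d,
    with d the pixel disparity between the two views."""
    d = abs(u_left - u_right)
    return fx * baseline / d

# 100-pixel disparity and a 100-unit baseline: Z comes out equal to fx,
# so it matches the stated Z = 1000 only if fx = 1000 (assumed here).
print(depth_from_disparity(fx=1000.0, baseline=100.0, u_left=450.0, u_right=350.0))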

Example 3
Given the intrinsic matrix K.
Where fx and fy are the focal lengths along the x and y axes, cx and cy are the coordinates of the principal point,
and s is the skew factor (typically 0 for most cameras).

Assume you have the following 3D world coordinates and corresponding image coordinates (in pixels) obtained
from a pinhole camera:

Calculate the intrinsic parameters fx, fy, cx, and cy using these points.

Solution:

1. Write the projection equations for each point:

For the first point:

For the second point:

2. Solve the system of equations:

From the first equation:

From the second equation:


Substituting cx from the first equation into the second:

Simplifying this equation gives:

Substituting back to find cx:

For fy and cy, similarly solve:

After substitution and simplification:

Simplifying to find fy and cy:


Final Intrinsic Parameters:

Example 4
Given a camera with the intrinsic matrix:

A 3D point in the camera's coordinate system is (Xc,Yc,Zc)=(400,300,1000). Calculate the image coordinates
(u,v).

Solution:

1. Normalize the 3D point: Xc/Zc = 400/1000 = 0.4 and Yc/Zc = 300/1000 = 0.3.


2. Apply the intrinsic matrix:

The image coordinates are (u,v)=(640,480).

Example 5
A calibrated camera has an intrinsic matrix:

A point in the image has pixel coordinates (u,v)=(750,550). Determine the direction of the corresponding ray in
the camera coordinate system (i.e., find the unit vector in the direction of the 3D point corresponding to this
pixel).

Solution:

1. Compute the normalized coordinates:


2. Form the direction vector:

The direction vector in the camera coordinate system is (xn,yn,1)=(0.125,0.125,1).

3. Normalize the direction vector:

Magnitude = √(0.125² + 0.125² + 1²) = √1.03125 ≈ 1.0155

Dividing each component by this magnitude, the direction of the corresponding ray is approximately (0.123, 0.123, 0.985) in the camera coordinate system.

Example 6
You are given the following 3D world points and their corresponding 2D image points in pixels obtained from a
camera:

The intrinsic matrix K:


Determine the focal length f and the principal point (cx,cy).

Solution:

1. Write the projection equations for each point:

For the first point:

For the second point:

2. Solve for f and cx:

From the first equation:

From the second equation:


Simplify and solve for f:

Now substitute back to find cx:

3. Solve for cy:

Similarly, from the first v equation:

From the second v equation:

This result contradicts the given v2=500, indicating possible measurement errors.

Final Intrinsic Parameters:

Assuming the error was a typo, we conclude:


Example 7
Given a camera with the intrinsic matrix:

A 3D object is placed at Z = 1000 units from the camera. The image size of the object along the
x-axis is 100 pixels. Estimate the actual size of the object along the x-axis.

Solution:

1. Compute the normalized size in the image: normalized size = (image size in pixels) / fx = 100 / fx = 0.125, using the fx from the given intrinsic matrix.

2. Calculate the actual size in the world: The actual size X is given by:

X=Normalized size×Z=0.125×1000=125 units

The actual size of the object along the x-axis is 125 units.

Image sensing pipeline


The image sensing pipeline, also known as the imaging pipeline or camera pipeline, is a sequence of processing
steps that convert raw data captured by a camera sensor into a final image. This process is crucial in digital
imaging systems, such as digital cameras, smartphones, and other imaging devices. The key stages in the image
sensing pipeline are described below:

1. Photon Capture

 Light Capture: Light from a scene enters the camera lens and reaches the image sensor, which consists
of millions of photodiodes (pixels).

 Photon to Electron Conversion: The photodiodes convert incoming photons into electrical charges
(electrons). The amount of charge generated is proportional to the intensity of light hitting each pixel.

2. Analog Signal Processing

 Charge Accumulation: The electrons generated in each pixel are accumulated over the exposure time.

 Readout: The accumulated charge is read out from the pixels, usually row by row, to produce an analog
signal representing the light intensity for each pixel.

3. Analog-to-Digital Conversion (ADC)

 Quantization: The analog signal is converted into a digital signal by the Analog-to-Digital Converter
(ADC). This step quantizes the continuous analog signal into discrete digital values, typically 8-bit, 10-
bit, or 12-bit per channel.

4. Raw Image Formation

 Bayer Filter Pattern: Most image sensors use a Bayer filter array to capture color information. Each
pixel is filtered to record either red, green, or blue light, leading to a raw image where each pixel
contains data for only one color channel.

 Raw Image Output: The output is a raw image file (often in formats like RAW, DNG) that needs
further processing to produce a full-color image.

5. Demosaicing

 Interpolation: Since each pixel in the raw image contains only one color component, demosaicing is
performed to interpolate the missing color components at each pixel. This creates a full-color image by
estimating the missing red, green, or blue values based on the surrounding pixels.
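As an illustration of this interpolation idea, the sketch below performs simple bilinear demosaicing for an assumed RGGB Bayer layout; the layout choice and the wrap-around handling of borders are simplifications for illustration, not a production algorithm.

import numpy as np

def demosaic_bilinear(raw):
    """Bilinear demosaicing of a single-channel Bayer mosaic, assuming
    R at (even row, even col), B at (odd row, odd col), G elsewhere."""
    h, w = raw.shape
    r = np.zeros((h, w), bool); g = np.zeros((h, w), bool); b = np.zeros((h, w), bool)
    r[0::2, 0::2] = True
    b[1::2, 1::2] = True
    g[~(r | b)] = True

    out = np.zeros((h, w, 3), dtype=float)
    for c, mask in enumerate((r, g, b)):
        chan = np.where(mask, raw.astype(float), 0.0)
        weight = mask.astype(float)
        # Average the known samples of this colour over each 3x3 neighbourhood
        # (np.roll wraps at the borders, which is acceptable for a sketch).
        vsum = np.zeros_like(chan); wsum = np.zeros_like(weight)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                vsum += np.roll(np.roll(chan, dy, axis=0), dx, axis=1)
                wsum += np.roll(np.roll(weight, dy, axis=0), dx, axis=1)
        out[..., c] = vsum / np.maximum(wsum, 1.0)
    return out

print(demosaic_bilinear(np.arange(16).reshape(4, 4)).shape)   # (4, 4, 3)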

6. White Balance

 Color Correction: White balance adjusts the colors in the image to ensure that whites appear white
under different lighting conditions. This step compensates for color temperature variations in the light
source.

 Gain Adjustment: The gains for the red, green, and blue channels are adjusted to achieve a neutral
color balance.
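A gray-world style white balance (the same idea used in Examples 2 and 8 later in this section) can be sketched in a few lines; the clipping range assumes 8-bit values.

import numpy as np

def gray_world_balance(img):
    """Scale each channel so its mean matches the mean of the channel means."""
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / means
    return np.clip(img * gains, 0, 255)

# A "white" object recorded as (180, 150, 140) ends up close to neutral gray.
print(gray_world_balance(np.array([[[180.0, 150.0, 140.0]]])))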

7. Noise Reduction

 Denoising Algorithms: Noise reduction algorithms are applied to reduce sensor noise, which can be
caused by low light, high ISO settings, or long exposure times. Techniques like temporal averaging,
spatial filtering, or wavelet-based denoising are used.
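A minimal example of the spatial-filtering idea: convolving a 3x3 neighbourhood with a normalized Gaussian-like kernel to obtain the new value of the centre pixel (the pixel values below are made up for illustration, since the kernel and patch of the later worked example are not reproduced here).

import numpy as np

# A common 3x3 Gaussian approximation, normalized to sum to 1
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=float) / 16.0

def smooth_center(neigh3x3):
    """Weighted average of a 3x3 neighbourhood -> denoised centre value."""
    return float(np.sum(neigh3x3 * kernel))

patch = np.array([[100, 120, 110],      # hypothetical noisy intensities
                  [130, 110, 115],
                  [105, 125, 118]], dtype=float)
print(round(smooth_center(patch), 1))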
8. Color Correction

 Color Space Conversion: The image is converted from the camera's color space to a standard color
space (e.g., sRGB, Adobe RGB) to ensure accurate color reproduction.

 Gamma Correction: Gamma correction adjusts the image brightness by mapping the linear intensity
values to a nonlinear curve, making the image look more natural on display devices.
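Gamma correction itself is a one-line mapping; the sketch below uses the common display gamma of 2.2 on 8-bit intensities.

import numpy as np

def gamma_correct(img, gamma=2.2):
    """Map linear 8-bit intensities through the 1/gamma power curve."""
    x = np.clip(np.asarray(img, dtype=float) / 255.0, 0.0, 1.0)
    return 255.0 * x ** (1.0 / gamma)

print(round(float(gamma_correct(64)), 1))   # a dark linear value is brightened for display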

9. Image Enhancement

 Sharpening: Edge enhancement or sharpening filters are applied to increase the perceived sharpness of
the image.

 Contrast Adjustment: Contrast is adjusted to enhance the difference between light and dark areas of
the image.

 Saturation Adjustment: The saturation of colors can be increased or decreased to make the image
more vibrant or subdued.

10. Compression and Encoding

 Image Compression: The processed image is typically compressed to reduce file size. This can be done
using lossy compression (e.g., JPEG) or lossless compression (e.g., PNG).

 File Storage/Output: The final image is stored in a specific file format (e.g., JPEG, TIFF) or sent for
display or further processing.

11. Post-Processing (Optional)

 Additional Adjustments: In some cases, additional post-processing steps may be applied, such as
cropping, resizing, or applying special effects.

 Image Export: The final image is exported or shared in the desired format and resolution

Example 1:
A 4x4 pixel sensor with a Bayer filter array captures the following raw values:

The sensor follows the Bayer pattern:


Perform the demosaicing step to estimate the missing color values for each pixel.

Solution:

1. Assign the raw values to their respective color channels:

o Red channel (R):

Green channel (G):

Blue channel (B):

2. Demosaicing:

 For missing red values:


For missing green values:

For missing blue values:

Final Estimated Color Channels:

Red:
Green:

Blue:

Example 2:
An image sensor captures an image under a light source with a color temperature of 3500K. The RGB values
for a white object are recorded as (180,150,140). Adjust the white balance so that the object appears white (i.e.,
equal R, G, and B values).

Solution:

1. Calculate Gain Factors:

The white balance adjustment can be done by scaling the RGB values to equalize them. Using the mean of the three channels as the target, mean = (180 + 150 + 140) / 3 ≈ 156.7, the scale factors are:

rg = 156.7 / 180 ≈ 0.87,   gg = 156.7 / 150 ≈ 1.04,   bg = 156.7 / 140 ≈ 1.12

2. Apply Gain Factors:

R′ = 180 × 0.87 ≈ 157,   G′ = 150 × 1.04 ≈ 156,   B′ = 140 × 1.12 ≈ 157

The white-balanced RGB values are approximately (157,156,157), which are close to neutral gray, indicating a
successful white balance adjustment.

Example 3
You capture an image in low light conditions, leading to noticeable noise. Assume you apply a 3x3 Gaussian
filter with the following kernel to reduce the noise:

Given the following pixel intensities in a noisy region:

Calculate the new intensity of the central pixel after applying the filter.
Solution:
1. Apply the filter to the central pixel:
The new intensity value at the center pixel (110) is calculated by taking the weighted sum of the surrounding
pixels, where the weights are given by the Gaussian kernel.

2. Compute the weighted sum:

3. Calculate the final value:

Example 4
A camera sensor captures an image under mixed lighting conditions, leading to a color cast. The RGB values
for a pixel are recorded as [200,150,100]. To correct the color, a color correction matrix is applied:

Compute the corrected RGB values.


Solution:
1. Matrix multiplication:
Multiply the color correction matrix by the original RGB values:

2. Calculate each component:


The corrected RGB values are [230,180,135], which should be more accurate to the true scene colors.

Example 5
A grayscale image is corrupted by salt-and-pepper noise. Consider the following 3x3 pixel neighborhood
around a pixel in the noisy image:

Apply a median filter to calculate the new intensity of the central pixel.
Solution:
1. List all the pixel values in the 3x3 neighborhood:
The pixel values are:
{255,0,255,0,125,255,255,0,0}
2. Sort the pixel values:
After sorting, the values are:
{0,0,0,0,125,255,255,255,255}
3. Find the median value:
The median value is the middle value in the sorted list. Since we have 9 values, the median is the 5th value:
Median=125

After applying the median filter, the new intensity of the central pixel is 125. This process helps in reducing
salt-and-pepper noise by replacing the central pixel with the median value of its neighborhood.
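The same computation in code, using the neighbourhood from this example:

import numpy as np

# 3x3 neighbourhood from the example (row order as listed above)
patch = np.array([[255,   0, 255],
                  [  0, 125, 255],
                  [255,   0,   0]])

# The median of the nine values replaces the centre pixel.
print(np.median(patch))   # 125.0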

Example 6
Given an RGB image, the red channel has the following histogram distribution (simplified):
[0, 1, 1, 2, 1, 2, 1, 1, 0, 1]
Calculate the equalized histogram for the red channel.

Solution:
1. Compute the cumulative distribution function (CDF):
The cumulative sum of the histogram values is calculated as follows:
CDF=[0,1,2,4,5,7,8,9,9,10]
2. Normalize the CDF:
Normalize the CDF so that it ranges from 0 to 255 (assuming 8-bit grayscale):
CDFnorm = (CDF / CDFmax) × 255 = (CDF / 10) × 255
Substituting the values, this results in:

CDFnorm = [0, 25.5, 51, 102, 127.5, 178.5, 204, 229.5, 229.5, 255]

3. Map the original histogram values to the equalized CDF values:

The equalized histogram values are:


Equalized Histogram=[0,25,51,102,127,178,204,229,229,255]
Result:
The equalized histogram will have a more uniform distribution, enhancing the contrast of the image, especially
in the red channel.
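The same computation can be reproduced with a few lines of NumPy (truncating the scaled CDF to integers, as the worked table above does):

import numpy as np

hist = np.array([0, 1, 1, 2, 1, 2, 1, 1, 0, 1])   # red-channel histogram from the example
cdf = np.cumsum(hist)                              # [0 1 2 4 5 7 8 9 9 10]
cdf_norm = cdf / cdf[-1] * 255                     # scale to the 8-bit range
print(cdf_norm.astype(int))                        # [0 25 51 102 127 178 204 229 229 255]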

Example 7
An image is captured with a camera lens that introduces barrel distortion. The distortion is modeled by the
following equation for the distorted radius rd in terms of the undistorted radius ru:
rd = ru × (1 + k1×ru² + k2×ru⁴)
Given k1=0.1 and k2=0.01, and an undistorted pixel position at (xu,yu)=(50,40), calculate the distorted position
(xd,yd).

Solution:
1. Calculate the undistorted radius ru:

ru = √(xu² + yu²) = √(50² + 40²) = √4100 ≈ 64.03

2. Apply the distortion equation to find rd:

First, calculate the powers: ru² = 4100 and ru⁴ = 4100² = 16,810,000.

Now, substitute these into the distortion equation:

rd = ru × (1 + 0.1 × 4100 + 0.01 × 16,810,000) = ru × 168,511

Simplify the expression: rd ≈ 64.03 × 168,511 ≈ 1.08 × 10⁷

3. Calculate the distorted position:

The distorted pixel position (xd,yd) is obtained by scaling the undistorted coordinates by the same factor rd / ru = 168,511:

xd ≈ 50 × 168,511 ≈ 8,425,550,   yd ≈ 40 × 168,511 ≈ 6,740,440

The distorted pixel position is approximately (8,425,550, 6,740,440). The values indicate a significant
displacement, showing the impact of severe barrel distortion.
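The radial model is easy to apply in code. Note that the very large k1 and k2 in this exercise produce an enormous displacement because the coefficients are applied directly to pixel coordinates; in practice radial distortion coefficients are small and are applied to normalized image coordinates.

def radial_distort(x_u, y_u, k1, k2):
    """r_d = r_u * (1 + k1*r_u**2 + k2*r_u**4); scale (x, y) by r_d / r_u."""
    r2 = x_u**2 + y_u**2
    scale = 1.0 + k1 * r2 + k2 * r2**2
    return x_u * scale, y_u * scale

print(radial_distort(50.0, 40.0, k1=0.1, k2=0.01))   # roughly (8.4e6, 6.7e6), as above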

Example 8
A digital camera captures an image with a color cast due to incorrect white balance. The average RGB values
across the entire image are [180,140,120]. Adjust the white balance using the Grey World Assumption, which
assumes the average value of the colors should be the same.
Solution:
1. Calculate the scaling factors for each channel:

Assume the average RGB values should all equal the mean of the average values:

mean = (180 + 140 + 120) / 3 ≈ 146.7

Calculate the scaling factors:

sR = 146.7 / 180 ≈ 0.81,   sG = 146.7 / 140 ≈ 1.05,   sB = 146.7 / 120 ≈ 1.22

2. Apply the scaling factors to the image:

For each pixel in the image, multiply the R, G, and B channels by their respective scale factors.

Sampling and Aliasing


Sampling and aliasing are fundamental concepts in signal processing and are crucial to understanding image
processing and computer vision.

Sampling:
Definition:Sampling refers to the process of converting a continuous signal (such as a real-world image) into a
discrete signal (a digital image) by measuring the signal at regular intervals (sampling points). In the context of
images, sampling involves capturing the intensity of light at specific pixel locations to create a digital
representation of the scene.

Key Concepts:

 Sampling Rate: The number of samples (or pixels) taken per unit area (in images) or per unit time (in
signals).

 Nyquist Rate: The minimum sampling rate required to accurately capture a signal without introducing
errors. It should be at least twice the highest frequency present in the signal.

Example:Imagine you have an analog image of a scene. To digitize this image, you place a grid over it, and
each intersection of the grid lines becomes a pixel. The value assigned to each pixel is the average
color/intensity of the area it covers in the scene.
If the grid is too coarse (low sampling rate), fine details in the image might be lost. This leads us to the issue of
aliasing.

Aliasing:
Definition: Aliasing is an effect that occurs when a signal is sampled at a rate lower than the Nyquist rate,
resulting in a distortion where different signals become indistinguishable (or "aliased"). In images, this can
cause visual artifacts such as moiré patterns, jagged edges, and false patterns that do not exist in the original
scene.

Key Concepts:

 Under-Sampling: When the sampling rate is too low, high-frequency details are not captured correctly,
causing them to appear as lower frequencies (aliases).

 Moiré Patterns: A common aliasing artifact in images, where fine repetitive patterns (like a striped
shirt) appear to have strange, wavy patterns due to inadequate sampling.

 Anti-Aliasing: Techniques used to reduce or prevent aliasing. These might involve pre-filtering
(blurring) the image before sampling or using higher sampling rates.

Example: Consider photographing a striped pattern, like a tight grid on fabric. If the camera's sensor resolution
isn't high enough (i.e., the sampling rate is too low), the stripes might not appear correctly. Instead, you'll see a
moiré pattern, which is an interference pattern created by the interaction between the grid on the fabric and the
grid of pixels on the camera sensor.

Relationship Between Sampling and Aliasing


 Sampling is the process of converting a continuous signal into a discrete one. If done properly (at a
high enough rate), the digital representation will closely match the original. If not, the resulting digital
signal may include aliasing artifacts.

 Aliasing occurs when the sampling rate is insufficient to capture the signal's details accurately. To
avoid aliasing, it's important to sample at a rate higher than the Nyquist rate or to use anti-aliasing
techniques.

Real-World Application in Computer Vision


In computer vision, both sampling and aliasing have significant implications:

 Digital Image Capture: Cameras need to sample the scene at a high enough resolution to avoid
aliasing. If a camera with a low-resolution sensor captures a scene with fine details, aliasing can occur,
resulting in misleading visual artifacts.

 Image Resizing: When reducing the size of an image, if the sampling is done without considering
aliasing, the resized image might lose details or show artifacts. Anti-aliasing filters are often applied to
prevent these issues.
 Rendering in Graphics: When rendering 3D models or textures, aliasing can create jagged edges and
patterns. Techniques like supersampling and multisampling are used in graphics engines to mitigate
these effects

Note:

Sampling is about converting continuous data into discrete data by taking samples at regular intervals.

Aliasing is the distortion that happens when sampling is inadequate, leading to artifacts and false patterns.

Understanding both is crucial in fields like computer vision, where accurate representation and processing of
visual data are required.

Example 1:
Consider a 1D sinusoidal signal f(t)=sin(2π×10t), where t is time in seconds. The signal is sampled at a rate of
15 samples per second. Determine if aliasing will occur, and if so, what the aliased frequency will be.

Solution:

1. Determine the Original Frequency:

 The frequency of the original signal f(t)=sin(2π×10t) is 10 Hz.


2. Calculate the Sampling Rate:

 The sampling rate fs=15 samples per second (Hz).

3. Check the Nyquist Criterion:

 According to the Nyquist theorem, the sampling rate should be at least twice the highest frequency
present in the signal to avoid aliasing. Therefore, the Nyquist rate is 2×10=20Hz.

4. Compare the Sampling Rate with the Nyquist Rate:

 Since the sampling rate fs=15 Hz is less than the Nyquist rate of 20 Hz, aliasing will occur.

5. Determine the Aliased Frequency:

 The aliased frequency fa can be calculated using the formula:

fa=∣f−n×fs∣

where n is an integer that makes fa positive and less than fs/2.

Here, n=1, so:

fa=∣10−1×15∣=∣10−15∣=5 Hz

Aliasing will occur, and the signal will appear to have a frequency of 5 Hz instead of the original 10 Hz.
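The folding of a frequency into the observable band can be written as a tiny helper, which reproduces the aliased frequencies worked out in the examples of this section:

def apparent_frequency(f, fs):
    """Fold a signal frequency f into the baseband [0, fs/2] seen after
    sampling at fs; returns f itself when no aliasing occurs."""
    r = f % fs
    return min(r, fs - r)

print(apparent_frequency(10, 15))      # 5  (Example 1)
print(apparent_frequency(50, 80))      # 30 (Example 2, in cycles per millimetre)
print(apparent_frequency(1200, 1000))  # 200 (Example 6)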
Example 2:
An image with a sinusoidal pattern of frequency 50 cycles per millimeter is captured by a digital camera with a
sensor that samples at a rate of 80 pixels per millimeter. Will aliasing occur? If so, determine the frequency of the
aliased pattern.

Solution:

1. Determine the Original Frequency:

o The frequency of the sinusoidal pattern is 50 cycles per millimeter.

2. Calculate the Sampling Rate:

o The camera samples at 80 pixels per millimeter.

3. Check the Nyquist Criterion:

o The Nyquist rate for sampling the sinusoidal pattern without aliasing is 2×50=100 pixels per
millimeter.

4. Compare the Sampling Rate with the Nyquist Rate:

o The sampling rate 80 pixels per millimeter is less than the Nyquist rate of 100 pixels per
millimeter, indicating that aliasing will occur.

5. Determine the Aliased Frequency:

o The aliased frequency fa in a 2D image is calculated using the formula:

fa=∣f−n×fs∣

where n is an integer that makes fa positive and less than fs/2.

1. For n=1:

fa=∣50−1×80∣=∣50−80∣=30 cycles per millimeter.

Example 3:
An image with a resolution of 1024x1024 pixels represents a checkerboard pattern with 64 black and 64 white
squares along each axis. The image is downsampled by a factor of 8 to a resolution of 128x128 pixels.
Determine if aliasing will occur and describe the appearance of the downsampled image.

Solution:

1. Determine the Original Pattern Frequency:

o The checkerboard pattern has 64 cycles across 1024 pixels, so the frequency of the pattern is
64/1024=0.0625 cycles per pixel.
2. Pattern Frequency After Downsampling:

o After downsampling by a factor of 8, the new resolution is 128x128 pixels, so the same 64 cycles now span only 128 pixels.

o Relative to the new pixel grid, the pattern frequency is therefore 8 × 0.0625 = 0.5 cycles per pixel.

3. Check for Aliasing:

o The Nyquist limit of the downsampled grid is 0.5 cycles per pixel (half of one sample per pixel). The pattern frequency of 0.5 cycles per pixel sits exactly at this limit, so this is a borderline case.

4. Appearance of the Downsampled Image:

o Each original 8x8-pixel square now maps to a single pixel. If the new sampling grid lines up with the square centres, the checkerboard survives as a one-pixel-per-square pattern; with any phase offset, adjacent black and white squares blend together or produce false, blotchy patterns.

Because the pattern sits exactly at the Nyquist limit, aliasing may occur: the downsampled image may still show the checkerboard with greatly reduced detail, or it may show blended, spurious patterns, depending on how the sampling grid aligns with the squares.

Example 4:
Consider a square wave signal with a fundamental frequency of 5 Hz. The signal is sampled at 8 Hz. Determine
if aliasing will occur, and if so, what will be the observed aliased frequency.

Solution:

1. Square Wave Frequency Components:

o A square wave consists of a fundamental frequency (5 Hz in this case) and its odd harmonics (15
Hz, 25 Hz, etc.).

2. Sampling Rate:

o The sampling rate is 8 Hz. The Nyquist rate for the 5 Hz fundamental is 2 × 5 = 10 Hz, so even the fundamental is under-sampled, and the higher harmonics are under-sampled more severely.

3. Aliasing of the Components:

o The 5 Hz fundamental aliases because it exceeds half the sampling rate (4 Hz): fa = ∣5 − 1×8∣ = 3 Hz.

o The third harmonic (15 Hz) also aliases: fa = ∣15 − 2×8∣ = 1 Hz.

4. Observed Signal:

o The aliased signal will have components at 3 Hz (aliased fundamental) and 1 Hz (aliased third harmonic) instead of the original 5 Hz and 15 Hz.

Result:

Aliasing occurs, and the observed signal will contain frequencies at approximately 3 Hz and 1 Hz.

Example 5:
An image with a 256x256 checkerboard pattern is sampled at a resolution of 128x128 pixels. The checkerboard
pattern consists of alternating black and white squares, with each square spanning 16 pixels in the original
image. Determine if aliasing will occur in the down sampled image and describe the resulting pattern.

Solution:

1. Original Frequency:

o Each square spans 16 pixels, so one full black-white cycle spans 32 pixels. The original frequency of the checkerboard pattern is therefore 1/32 cycles per pixel.

2. New Sampling Rate:

o The new resolution is 128x128 pixels, i.e., the image is downsampled by a factor of 2. Each square now covers 8 pixels, and the pattern frequency relative to the new grid is 1/16 cycles per pixel.

3. Nyquist Criterion:

o The Nyquist limit of the downsampled grid is 0.5 cycles per pixel. The pattern frequency of 1/16 cycles per pixel is well below this limit.

4. Aliasing Check:

o Because each square still spans 8 pixels after downsampling, the pattern remains comfortably within the Nyquist limit. At most, if the downsampling is done by simple decimation without pre-filtering, square edges may shift by a pixel or look slightly ragged, but no false patterns are introduced.

Result:

No significant aliasing occurs; the downsampled image still shows the checkerboard clearly, with each square reduced from 16x16 to 8x8 pixels.

Example 6:
An audio signal consists of two tones, one at 300 Hz and the other at 1200 Hz. The signal is sampled at 1000
Hz. Determine the frequencies of the aliased tones.
Solution:

1. Nyquist Frequency:

o The Nyquist frequency for the given sampling rate is 1000/2 = 500 Hz.

2. Aliasing of the 300 Hz Tone:

o Since 300 Hz is less than the Nyquist frequency, it does not alias. The observed frequency is 300
Hz.

3. Aliasing of the 1200 Hz Tone:

o 1200 Hz is greater than the Nyquist frequency and will alias. The aliased frequency:
fa=∣1200−n×1000∣=∣1200−1×1000∣=200 Hz.

Result:

The signal will contain frequencies at 300 Hz (unaltered) and 200 Hz (aliased from 1200 Hz).

Example 7:
A 512x512 pixel image with a diagonal line running from the top-left to the bottom-right is rotated by 45
degrees and then resampled to a 256x256 resolution. Determine the potential aliasing effects and how the line
will appear in the resampled image.

Solution:

1. Image Rotation:

o Rotating the image by 45 degrees changes the orientation of the diagonal line. This alters the
spatial frequency of the line relative to the image axes.

2. Resampling:

o Down sampling the image to 256x256 reduces the effective sampling rate by half.

3. Aliasing Analysis:

o The original diagonal line had a spatial frequency of 1/512 cycles per pixel along each axis.
After rotation, the line's frequency components may no longer align with the pixel grid,
increasing the effective spatial frequency.

o Since the image is resampled to a lower resolution, the higher effective frequency of the rotated
line might exceed the Nyquist limit, leading to aliasing. This could cause the line to appear
jagged, or even introduce false patterns along the line.
4. Resulting Appearance:

o The diagonal line in the resampled image may appear jagged or broken, with potential aliasing
artifacts depending on the rotation and the downsampling process.

Example 8:
A video camera captures frames at 24 frames per second (fps) of a rotating fan blade with a frequency of 10
revolutions per second (rps). Will aliasing occur? If so, what will be the apparent speed and direction of the
blade in the video?

Solution:

1. Frame Capture Rate:

o The camera captures 24 frames per second.

2. Blade Rotation Speed:

o The fan blade rotates at 10 revolutions per second.

3. Nyquist Criterion:

o To capture the rotation without aliasing, the frame rate must be at least twice the rotation frequency, i.e., 2 × 10 = 20 fps. The camera's 24 fps exceeds this, so a 10 rps rotation is sampled adequately.

4. Apparent Speed:

o Folding the rotation frequency into the observable band, fa = ∣10 − n×24∣ with n = 0 gives 10 rps, which already lies below the 12 rps Nyquist limit. A single, distinguishable blade therefore appears to rotate at its true speed of 10 rps in the correct direction.

5. When Would Aliasing Appear?

o If the fan has several identical blades, the repeating visual pattern has a higher effective frequency (for example, 30 Hz for a 3-blade fan), which exceeds 12 Hz and aliases: fa = ∣30 − 1×24∣ = 6 Hz, so the blades can appear to rotate slowly, stand still, or spin backwards (the classic wagon-wheel effect). The same happens if the frame rate drops below 20 fps.

Result:

At 24 fps, a 10 rps rotation itself is not aliased and appears at its true speed; apparent slow or reverse rotation (the wagon-wheel effect) arises only when the effective pattern frequency exceeds half the frame rate.
