CO1 Notes
Topics: 2D and 3D transformations, planar surface flow, bilinear interpolant, 3D rotations, 3D to 2D projections (orthography and paraperspective), pinhole camera model, camera intrinsics, image sensing pipeline, sampling, and aliasing.
Computer vision is closely related to artificial intelligence (AI) and often uses AI techniques such as machine
learning to analyze and understand visual data. Machine learning algorithms are used to “train” a computer to
recognize patterns and features in visual data, such as edges, shapes and colors.
Once trained, the computer can use this knowledge to identify and classify objects in new images and videos.
The accuracy of these classifications can be improved over time through further training and exposure to more
data.
In addition to machine learning, computer vision may also use techniques such as deep learning, which involves
training artificial neural networks on large amounts of data to recognize patterns and features in a way that is
similar to how the human brain works.
Real-World Applications:
1. Healthcare: Medical Imaging
Application: Medical imaging involves techniques and processes used to create images of the human body
for clinical purposes, medical procedures, or medical science. Computer vision plays a significant role in
interpreting these images.
1. Image Segmentation: Identifying and isolating different tissues, organs, or abnormalities in medical
images (e.g., segmenting tumors in MRI scans).
2. Pattern Recognition: Detecting patterns that are indicative of certain diseases (e.g., recognizing the
structure of cancer cells).
3. 3D Reconstruction: Creating 3D models from 2D image slices, such as those from CT or MRI
scans.
4. Anomaly Detection: Identifying abnormalities or unusual structures that may indicate disease or
injury.
2. Automotive: Autonomous Vehicles
Application: Autonomous vehicles (self-driving cars) use computer vision to navigate and understand their
environment.
1. Object Detection: Identifying objects such as pedestrians, vehicles, road signs, and obstacles.
2. Lane Detection: Recognizing lane markings to keep the vehicle within its lane.
3. SLAM (Simultaneous Localization and Mapping): Creating and updating maps of the
environment while tracking the vehicle's location.
4. Depth Estimation: Determining the distance to various objects using stereoscopic vision or other
depth-sensing technologies.
3. Security: Video Surveillance
Application: Surveillance systems use computer vision to monitor and analyze video footage for security
purposes.
Benefits:
Enhanced Security: Provides real-time monitoring and alerts for potential security breaches.
Efficiency: Reduces the need for constant human monitoring.
Scalability: Can monitor large areas and numerous video feeds simultaneously.
4. Entertainment: Augmented Reality (AR) and Gaming
Application: Augmented reality overlays digital information on the real world, while video games often use
computer vision for immersive experiences.
Marker-based AR: Using specific patterns or images to trigger the display of digital content.
Markerless AR: Recognizing and understanding the environment without predefined markers (e.g.,
detecting surfaces and objects).
Gesture Recognition: Interpreting human gestures for interactive gaming experiences.
Object Tracking: Keeping track of objects and their movements in real-time for AR applications.
5. Retail: Automated Checkout and Inventory Management
Application: Retail applications use computer vision for tasks such as automated checkout and inventory
management.
1. Object Recognition: Identifying products and items at checkout or during inventory checks.
2. Barcode Scanning: Reading barcodes to automate checkout processes.
3. Shelf Monitoring: Monitoring shelves to ensure products are stocked and correctly placed.
4. Customer Behavior Analysis: Analyzing customer movements and interactions to optimize store
layouts and marketing strategies.
These examples illustrate the broad applicability of computer vision in various industries, demonstrating its
potential to transform and improve different aspects of our daily lives.
Challenges in Computer Vision:
1. Data limitations
Computer vision requires large amounts of data to train and test algorithms. This can be problematic in
situations where data is limited or sensitive, and may not be suitable for processing in the cloud.
Additionally, scaling up data processing can be expensive and may be constrained by hardware and
other resources.
2. Learning rate
Another challenge in computer vision is the time and resources required to train algorithms. While error
rates have decreased over time, they still occur, and it takes time for the computer to be trained to
recognize and classify objects and patterns in images. This process typically involves providing sets of
labeled images and comparing them to the predicted output label or recognition measurements and then
modifying the algorithm to correct any errors.
3. Hardware requirements
Computer vision algorithms are computationally demanding, requiring fast processing and optimized
memory architecture for quicker memory access. Properly configured hardware systems and software
algorithms are also necessary to ensure that image-processing applications can run smoothly
and efficiently.
4. Inherent complexity in the visual world
In the real world, subjects may be seen from various orientations and in myriad lighting conditions, and
there are an infinite number of possible scenes in a true vision system. This inherent complexity makes it
difficult to build a general-purpose “seeing machine” that can handle all possible visual scenarios.
Overall, these challenges highlight the fact that computer vision is a difficult and complex field, and that there is
still much work to be done in order to build machines that can see and understand the world in the same way
humans do.
2D and 3D transformation :
Transformation means changing some graphics into something else by applying rules. We can have various
types of transformations such as translation, scaling up or down, rotation, shearing, etc. When a transformation
takes place on a 2D plane, it is called 2D transformation.
2D and 3D transformations are mathematical operations that modify the position, orientation, or size of objects
in a graphical scene. The main difference between the two is that 2D transformations use two coordinates (X
and Y) to manipulate elements in two dimensions, while 3D transformations use three coordinates (X, Y, and Z)
to add a third dimension, depth.
i. Translation
ii. Scaling
iii. Rotation
iv. Reflection
v. Shearing
2D Transformations
2D transformations involve manipulating objects within a two-dimensional plane. Common 2D transformations
include translation, scaling, rotation, and shearing.
i. Translation
A translation moves an object to a different position on the screen. You can translate a point in 2D by adding
translation coordinate (tx, ty) to the original coordinate (X, Y) to get the new coordinate (X1, Y1).
From the above figure, you can write that –
X1= X + tx
Y1 = Y + ty
Matrix Representation:
Adds the translation distances (tx, ty) to the coordinates of the object.
Example 1:
Translate the point P(3,4) by tx=5 units along the x-axis and ty=−2 units along the y-axis.
Solution:
Matrix Multiplication:
Result:
Verification:
Example 2
Given a triangle with vertices ABC having the coordinates as A(2,2), B(10,2) and C(5,5). Translate the triangle
with shifting vectors dx=5 and dy=6
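The solution to Example 2 is not worked out above; as a check, the following minimal NumPy sketch builds the 3x3 homogeneous translation matrix and applies it to both examples (the Example 2 triangle coordinates it prints are computed by the code, not quoted from the notes).

import numpy as np

def translate_2d(points, tx, ty):
    # 3x3 homogeneous translation matrix
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([points, np.ones(len(points))])  # (N, 3) homogeneous points
    return (pts_h @ T.T)[:, :2]

# Example 1: P(3,4) translated by (5, -2) -> (8, 2)
print(translate_2d(np.array([[3.0, 4.0]]), 5, -2))
# Example 2: triangle A(2,2), B(10,2), C(5,5) shifted by dx=5, dy=6
print(translate_2d(np.array([[2.0, 2.0], [10.0, 2.0], [5.0, 5.0]]), 5, 6))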
ii. Scaling
To change the size of an object, scaling transformation is used. In the scaling process, you either expand or
compress the dimensions of the object. Scaling can be achieved by multiplying the original coordinates of the
object with the scaling factor to get the desired result.
Let us assume that the original coordinates are (X, Y), the scaling factors are (Sx, Sy), and the produced
coordinates are (X1, Y1). This can be mathematically represented as shown below –
X1 = X · Sx
Y1 = Y · Sy
P1 = P · S
Where S is the scaling matrix. The scaling process is shown in the following figure
Matrix Representation:
If we provide values less than 1 to the scaling factor S, then we can reduce the size of the object. If we provide
values greater than 1, then we can increase the size of the object.
Example 1:
Scale the point P(2,3) by sx=4 along the x-axis and sy=5 along the y-axis.
Solution:
Verification:
Example 2: Scale the triangle with vertices at A(1,2), B(3,4), and C(5,1) by sx=2 along the x-axis and sy=3
along the y-axis.
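A short NumPy sketch of the scaling matrix applied to both examples; the Example 2 result is computed by the code since the worked solution is not shown above.

import numpy as np

def scale_2d(points, sx, sy):
    S = np.array([[sx, 0.0],
                  [0.0, sy]])       # 2x2 scaling matrix
    return points @ S.T

# Example 1: P(2,3) with sx=4, sy=5 -> (8, 15)
print(scale_2d(np.array([[2.0, 3.0]]), 4, 5))
# Example 2: triangle A(1,2), B(3,4), C(5,1) with sx=2, sy=3
print(scale_2d(np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]]), 2, 3))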
iii. Rotation
In rotation, we rotate the object at particular angle θ (theta) from its origin. From the following figure, we can
see that the point P(X, Y) is located at angle φ from the horizontal X coordinate with distance r from the origin.
Let us suppose you want to rotate it at the angle θ. After rotating it to a new location, you will get a new point
P1(X1, Y1).
Using standard trigonometry, the original coordinates of point P(X, Y) can be represented as –
X = r cos φ, Y = r sin φ
After rotating P by the angle θ, the new coordinates become –
X1 = r cos(φ + θ) = X cos θ − Y sin θ
Y1 = r sin(φ + θ) = X sin θ + Y cos θ
Matrix Representation:
Example 1:
Rotate the point P(1,2) by 90 degrees counterclockwise around the origin.
Solution:
Rotation Matrix: The rotation matrix R for a 2D rotation by an angle θ is given by:
Verification:
Conclusion:
The point P(1,2) rotated by 90 degrees counterclockwise results in the point P′(−2,1).
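A small NumPy check of this result using the standard counterclockwise 2D rotation matrix.

import numpy as np

def rotate_2d(point, theta_deg):
    t = np.radians(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])   # counterclockwise rotation by theta
    return R @ point

# Example 1: P(1,2) rotated by 90 degrees -> (-2, 1)
print(np.round(rotate_2d(np.array([1.0, 2.0]), 90), 6))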
iv. Reflection
Reflection is the mirror image of the original object. Equivalently, it can be viewed as rotating the object by
180° about the mirror axis (out of the plane). In reflection transformation, the size of the object does not change.
The reflected object is always formed on the other side of the mirror.
Reflection on X-Axis:
Xnew = Xold
Ynew = -Yold
In Matrix form, the above reflection equations may be represented as-
For homogeneous coordinates, the above reflection matrix may be represented as a 3 x 3 matrix as-
Reflection on Y-Axis:
Xnew = -Xold
Ynew = Yold
For homogeneous coordinates, the above reflection matrix may be represented as a 3 x 3 matrix as-
Example 1:
Reflect the point P(3,4) across the x-axis and y-axis.
Solution:
1. Reflection Matrices: The reflection matrix Rx for a reflection across the x-axis is given by:
The reflection matrix Ry for a reflection across the y-axis is given by:
Apply the Reflection Across the x-axis: Multiply the reflection matrix Rx by the point P:
Matrix Multiplication for x-axis Reflection: Perform the matrix multiplication step by step:
Result of x-axis Reflection: The reflected point Px′ in homogeneous coordinates is
Converting back to Cartesian coordinates, the reflected point across the x-axis is Px′(3,−4).
Apply the Reflection Across the y-axis: Multiply the reflection matrix Ry by the point P:
Matrix Multiplication for y-axis Reflection: Perform the matrix multiplication step by step:
Converting back to Cartesian coordinates, the reflected point across the y-axis is Py′(−3,4).
Verification:
Thus, the reflected points are Px′(3,−4) and Py′(−3,4), confirming our results.
Example 2:
Given a triangle with coordinate points A(3, 4), B(6, 4), C(5, 6). Apply the reflection on the X axis and obtain
the new coordinates of the object.
Example 3:
Given a triangle with coordinate points A(3, 4), B(6, 4), C(5, 6). Apply the reflection on the Y axis and obtain
the new coordinates of the object.
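A NumPy sketch of reflection across the x- and y-axes; it reproduces Example 1 and computes the triangle reflections of Examples 2 and 3, whose worked solutions are not shown above.

import numpy as np

REFLECT_X = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflect across the x-axis (flip y)
REFLECT_Y = np.array([[-1.0, 0.0], [0.0, 1.0]])   # reflect across the y-axis (flip x)

def reflect(points, M):
    return points @ M.T

P = np.array([[3.0, 4.0]])
print(reflect(P, REFLECT_X), reflect(P, REFLECT_Y))   # Example 1 -> (3,-4) and (-3,4)

triangle = np.array([[3.0, 4.0], [6.0, 4.0], [5.0, 6.0]])
print(reflect(triangle, REFLECT_X))   # Example 2: reflection on the X axis
print(reflect(triangle, REFLECT_Y))   # Example 3: reflection on the Y axis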
v. Shearing
A transformation that slants the shape of an object is called a shear transformation. There are two shear
transformations: X-Shear and Y-Shear. One shifts the X coordinate values and the other shifts the Y coordinate
values; in both cases only one coordinate changes while the other keeps its value. Shearing is also termed
skewing.
X-Shear: The X-Shear preserves the Y coordinates and changes the X coordinates, which causes the vertical
lines to tilt right or left as shown in the figure below.
Shearing in X Axis-
Shearing in X axis is achieved by using the following shearing equations-
Xnew = Xold + Shx × Yold
Ynew = Yold
For homogeneous coordinates, the above shearing matrix may be represented as a 3 x 3 matrix as-
Shearing in Y Axis-
Shearing in Y axis is achieved by using the following shearing equations-
Xnew = Xold
Ynew = Yold + Shy × Xold
For homogeneous coordinates, the above shearing matrix may be represented as a 3 x 3 matrix as
Example 1:
Shear the point P(2,3) with a shear factor of kx=1.5 along the x-axis and ky=0.5 along the y-axis.
Solution:
1. Shearing Matrices: The shearing matrix Sx for a shear along the x-axis is given by:
The shearing matrix Sy for a shear along the y-axis is given by:
Matrix Multiplication for x-axis Shear: Perform the matrix multiplication step by step:
Converting back to Cartesian coordinates, the sheared point along the x-axis is P'x(6.5, 3).
Apply the Shear Along the y-axis: Multiply the shearing matrix Sy by the point P:
Matrix Multiplication for y-axis Shear: Perform the matrix multiplication step by step:
Verification:
Thus, the sheared points are P'x(6.5, 3) and P'y(2, 4), confirming our results.
Conclusion:
The point P(2,3) sheared along the x-axis by a factor of kx=1.5 results in the point P'x(6.5, 3), and sheared along
the y-axis by a factor of ky=0.5 results in the point P'y(2, 4).
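A NumPy sketch of the two shear matrices applied to Example 1, using Xnew = Xold + Shx·Yold and Ynew = Yold + Shy·Xold.

import numpy as np

def shear_x(point, kx):
    Sx = np.array([[1.0, kx], [0.0, 1.0]])   # x' = x + kx*y, y unchanged
    return Sx @ point

def shear_y(point, ky):
    Sy = np.array([[1.0, 0.0], [ky, 1.0]])   # y' = y + ky*x, x unchanged
    return Sy @ point

P = np.array([2.0, 3.0])
print(shear_x(P, 1.5))   # -> (6.5, 3)
print(shear_y(P, 0.5))   # -> (2, 4)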
3D Transformations
3D transformations involve manipulating objects within a three-dimensional space.
i. Translation
Moving an object from one position to another in 3D space.
Matrix Representation:
Example:
Translate the point P(2,−1,3) by Δx=4, Δy=−3 and Δz=5
Translation Matrix: The translation matrix T for translating a point by Δx, Δy, and Δz is given by:
Verification:
Conclusion:
The point P(2,−1,3) translated by Δx=4, Δy=−3, and Δz=5 results in the point P′(6,−4,8).
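A minimal NumPy sketch of 3D translation with a 4x4 homogeneous matrix, checking the result above.

import numpy as np

def translate_3d(point, dx, dy, dz):
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]          # 4x4 homogeneous translation matrix
    p_h = np.append(point, 1.0)      # point in homogeneous coordinates
    return (T @ p_h)[:3]

print(translate_3d(np.array([2.0, -1.0, 3.0]), 4, -3, 5))   # -> (6, -4, 8)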
ii. Scaling
Changing the size of an object in 3D space.
Matrix Representation
Example:
Scale the point P(2,3,4) by a factor of Sx=2, Sy=3, and Sz=4.
Solution
Scaling Matrix: The scaling matrix S for scaling a point by Sx, Sy, and Sz is given by:
Verification:
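As the verification step is not written out above, a short NumPy sketch applying the diagonal scaling matrix confirms the result P′(4, 9, 16).

import numpy as np

S = np.diag([2.0, 3.0, 4.0])          # scaling matrix with Sx=2, Sy=3, Sz=4
P = np.array([2.0, 3.0, 4.0])
print(S @ P)                          # -> (4, 9, 16)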
iii. Rotation
Rotating an object around one of the principal axes (x, y, or z).
Matrix Representation:
Solution:
1. Rotation Matrix: The rotation matrix Rz for rotating a point by an angle θ around the z-axis is given by:
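The worked example for 3D rotation is not included above, so the following NumPy sketch (an illustration with an assumed point and angle, not taken from the notes) shows Rz(θ) and applies it to the sample point (1, 0, 0) with θ = 90°.

import numpy as np

def rotate_z(point, theta_deg):
    t = np.radians(theta_deg)
    Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])   # rotation about the z-axis
    return Rz @ point

# Assumed illustration: (1,0,0) rotated 90 degrees about the z-axis -> (0, 1, 0)
print(np.round(rotate_z(np.array([1.0, 0.0, 0.0]), 90), 6))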
iv. Reflection
Reflection in 3D involves flipping a point across a plane, creating a mirror image of the point with respect to
that plane. The most common planes of reflection in 3D are the xy-plane, yz-plane, and xz-plane. Each of these
planes has its own reflection matrix.
Reflection Matrices
1. Reflection across the xy-plane: The reflection matrix Rxy for reflecting across the xy-plane (which flips
the z-coordinate) is:
Reflection across the yz-plane: The reflection matrix Ryz for reflecting across the yz-plane (which flips the x-
coordinate) is:
Reflection across the xz-plane: The reflection matrix Rxz for reflecting across the xz-plane (which flips the y-
coordinate) is:
Example:
Reflect the point P(3,4,5) across each of these planes
Solution
Reflection across the xy-plane: Multiply the reflection matrix Rxy by the point P:
Reflection across the xz-plane: Multiply the reflection matrix Rxz by the point P:
Conclusion:
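A NumPy sketch of the three reflection matrices applied to the point P(3, 4, 5) from the example; the numerical results are computed by the code since the worked solution is not shown.

import numpy as np

R_XY = np.diag([1.0, 1.0, -1.0])   # reflect across the xy-plane (flip z)
R_YZ = np.diag([-1.0, 1.0, 1.0])   # reflect across the yz-plane (flip x)
R_XZ = np.diag([1.0, -1.0, 1.0])   # reflect across the xz-plane (flip y)

P = np.array([3.0, 4.0, 5.0])
print(R_XY @ P)   # -> (3, 4, -5)
print(R_YZ @ P)   # -> (-3, 4, 5)
print(R_XZ @ P)   # -> (3, -4, 5)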
v. Shearing
Matrix Representation
Example:
Apply a shear transformation to the point P(1,2,3) with shear factors shxy=1.5, shxz=−0.5, shyx=0.5, shyz=1.0,
shzx=−1.0, and shzy=0.75.
Solution
Shearing Matrix: The general shearing matrix S in 3D with shear factors shxy, shxz, shyx, shyz, shzx, and shzy is given by:
For this problem, the shear factors are shxy=1.5, shxz=−0.5, shyx=0.5, shyz=1.0, shzx=−1.0, and shzy=0.75
Point in Homogeneous Coordinates: Convert the point P(1,2,3) to homogeneous coordinates:
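The shearing matrix itself is not reproduced above, so the sketch below assumes the common convention that sh_xy is the contribution of y to the new x, sh_xz the contribution of z to x, and so on; under that assumed layout (which may differ from the one intended in the notes) the sheared point comes out as (2.5, 5.5, 3.5).

import numpy as np

sh_xy, sh_xz = 1.5, -0.5
sh_yx, sh_yz = 0.5, 1.0
sh_zx, sh_zy = -1.0, 0.75

# Assumed 3D shear matrix layout: row i mixes the other coordinates into axis i.
S = np.array([[1.0,   sh_xy, sh_xz, 0.0],
              [sh_yx, 1.0,   sh_yz, 0.0],
              [sh_zx, sh_zy, 1.0,   0.0],
              [0.0,   0.0,   0.0,   1.0]])

P_h = np.array([1.0, 2.0, 3.0, 1.0])   # P(1,2,3) in homogeneous coordinates
print((S @ P_h)[:3])                    # -> (2.5, 5.5, 3.5) under this convention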
A covector is a linear functional that maps a vector to a scalar. It belongs to the dual space of a vector space.
A vector is a mathematical object that has both a magnitude ( or length) and a direction.
In 2D Space: A vector v in 2D space can be represented as:
where vx and vy are the components of the vector along the x-axis and y-axis, respectively.
A scalar is a quantity that is fully described by a single numerical value and has no direction. Scalars are
used to represent magnitudes such as temperature, mass, speed, energy, and time. In contrast to vectors,
which have both magnitude and direction, scalars only have magnitude.
Examples of Scalars
Temperature: 25°C
Mass: 10 kg
Speed: 60 km/h
Energy: 100 Joules
Time: 5 seconds
The product of a vector v and a scalar k is a vector w whose components are scaled by k:
A vector spaceV is a collection of vectors that can be added together and multiplied (scaled) by scalars
while satisfying certain axioms (closure, associativity, identity, and inverses for addition and scalar
multiplication). Examples of vector spaces include Rn, where n is a positive integer, and each vector in
Rn has n components.
The dual space of a vector space V is a new vector space, usually denoted by V∗, consisting of all linear
functionals defined on V.
A linear functional is a function that takes a vector from the vector space V and maps it to a scalar,
while satisfying linearity properties. This means that for any vectors u,v∈V, and any scalars a,b∈R, a
linear functional ϕ∈V∗satisfies:
ϕ(au+bv)=aϕ(u)+bϕ(v)
A covector is another name for a linear functional. Thus, a covector maps a vector to a scalar in a linear
fashion.
Notation:
Example
Vector Space R2
Consider the vector space R2 (It is the set of all ordered pairs of real numbers. It is a two-dimensional vector
space over the field of real numbers R). A vector in R2 can be written as
The dual space of R2, denoted (R2)*, consists of all linear functionals that map vectors in R2 to scalars. A
covector (linear functional) in (R2)* can be written as a row vector
ϕ = [ϕ1 ϕ2]
Mapping
The covector ϕ maps the vector v to a scalar via the dot product:
Properties of Covectors:
In computer vision, covectors (or dual vectors) are used in several ways such as
Example:
Given a vector v = [2, 3]T in R2 and a covector ϕ = [4 −1] in the dual space (R2)*, find the scalar value obtained
by applying the covector ϕ to the vector v.
Solution
The covector ϕ is a linear functional that maps the vector v to a scalar via the dot product. The dot product of a
covector ϕ = [ϕ1 ϕ2] and a vector v = [v1, v2]T is ϕ(v) = ϕ1·v1 + ϕ2·v2. Here:
ϕ1 = 4, ϕ2 = −1, v1 = 2, v2 = 3
ϕ(v) = 4·2 + (−1)·3 = 8 − 3 = 5
The covector ϕ maps the vector v to the scalar 5.
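A one-line NumPy check of this computation: the covector acts on the vector through the dot product.

import numpy as np

phi = np.array([4.0, -1.0])   # covector (row vector)
v = np.array([2.0, 3.0])      # vector
print(phi @ v)                # phi(v) = 4*2 + (-1)*3 = 5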
Stretch/Squash
Stretch and squash transformations are types of linear transformations applied to images, shapes, or objects
in computer vision to change their dimensions. These transformations are often used in animation, image
processing, and geometric manipulations.
Stretch Transformation
A stretch transformation scales an object more in one direction than in another. For instance, stretching an
object horizontally while keeping the vertical dimension unchanged.
2D Stretch Transformation
In 2D, a stretch transformation can be represented by a scaling matrix that scales differently along the x-axis
and y-axis.
where sx and sy are the scaling factors along the x and y directions, respectively.
Example:
Stretch a point P(2,3) by 2 along the x-axis and 1 along the y-axis.
2D Squash Transformation:
A squash transformation can also be represented by a scaling matrix, but with the scaling factors sx and sy being
less than 1 in the respective directions.
Example:
Squash a point P(4,5) by 0.5 along the x-axis and 1 along the y-axis
3D Scaling Matrix
where sx, sy, and sz are the scaling factors along the x, y, and z directions, respectively.
Example of 3D Stretch:
Stretch a point P(1,2,3) by 2 along the x-axis, 1 along the y-axis, and 0.5 along the z-axis.
Example of 3D Squash:
Squash a point P(2,4,6) by 0.5 along the x-axis, 0.25 along the y-axis, and 1 along the z-axis.
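A short NumPy sketch evaluating all four stretch/squash examples; the printed results are computed by the code, since the worked solutions are not shown above.

import numpy as np

def scale(point, factors):
    # Stretch/squash is per-axis scaling: multiply by a diagonal matrix of factors.
    return np.diag(factors) @ np.asarray(point, dtype=float)

print(scale([2, 3], [2.0, 1.0]))            # 2D stretch  -> (4, 3)
print(scale([4, 5], [0.5, 1.0]))            # 2D squash   -> (2, 5)
print(scale([1, 2, 3], [2.0, 1.0, 0.5]))    # 3D stretch  -> (2, 2, 1.5)
print(scale([2, 4, 6], [0.5, 0.25, 1.0]))   # 3D squash   -> (1, 1, 6)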
Impact on Axes:
• Stretch: The scaling factor is greater than 1 along one axis (elongates the object).
• Squash: The scaling factor is less than 1 along one axis (compresses the object).
Resulting Shape: Stretching elongates the object along the axis whose scaling factor exceeds 1, while squashing
compresses it along the axis whose factor is below 1; because the factors differ per axis, the object's proportions
change rather than its overall size alone.
Planar Surface Flow
When a camera moves relative to a planar surface, the points on that surface appear to move in a certain way on
the image plane. This movement can be described mathematically, and understanding it is crucial for various
computer vision applications like motion estimation, 3D reconstruction, and image stabilization.
Definition: Planar surface flow describes the apparent motion of points on a flat, two-dimensional surface in
the 3D world as observed in the 2D image plane. This motion can be caused by either the movement of the
camera, the surface, or both.
Mathematical Formulation
Consider a 3D point P=(X,Y,Z) on a planar surface. When projected onto the 2D image plane, it has coordinates
p=(x,y).
The relationship between the 3D coordinates (X,Y,Z) and the 2D image coordinates (x,y) can be expressed
using the perspective projection model as x = f·X/Z and y = f·Y/Z, where f is the focal length.
If the planar surface can be described by the equation Z=aX+bY+c, the coordinates (X,Y,Z) of any point on the
surface can be related to its image coordinates (x,y).
Camera Motion
Assume the camera undergoes a rigid motion characterized by a rotation matrix R and a translation vector t. The
new position of the point P after the motion is:
P′=RP+t
The corresponding image coordinates (x′,y′) can be derived using the same image formation model.
The displacement of the point in the image plane from (x,y) to (x′,y′) gives the flow vector (u,v):
u=x′−x
v=y′−y
Example:
Consider a planar surface Z=2 and a camera moving with a translation vector t=(0,0,1) and no rotation.
Compute the image flow for a point (X,Y,Z)=(1,1,2) with a focal length f=1000.
Solution
Using the image formation model, we project the 3D point (X,Y,Z)=(1,1,2) onto the image plane:
x = f·X/Z = 1000·1/2 = 500, y = f·Y/Z = 1000·1/2 = 500
The camera undergoes a translation t=(0,0,1). The new Z coordinate after translation becomes:
Z′ = Z + tz = 2 + 1 = 3
The X and Y coordinates remain unchanged because the translation is only along the Z-axis:
X′ = X = 1
Y′ = Y = 1
The new image coordinates are x′ = f·X′/Z′ = 1000/3 ≈ 333.33 and y′ = f·Y′/Z′ = 1000/3 ≈ 333.33.
The flow vectors (u,v) represent the displacement of the point in the image plane:
u=x′−x=333.33−500=−166.67
v=y′−y=333.33−500=−166.67
So, the flow vector for the point (X,Y,Z)=(1,1,2) with a focal length f=1000 is (−166.67,−166.67).
Note:
By considering the focal length f=1000 and the translation of the camera, we computed the planar surface flow
for a point initially at (X,Y,Z)=(1,1,2). The flow vector (−166.67,−166.67) represents the apparent motion of the
point on the image plane due to the camera's movement.
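A minimal NumPy sketch of the translation-only flow computation in Example 1, using the perspective projection x = f·X/Z, y = f·Y/Z and the notes' convention that the new depth is Z + tz.

import numpy as np

def project(P, f):
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

f = 1000.0
P = np.array([1.0, 1.0, 2.0])
t = np.array([0.0, 0.0, 1.0])          # translation along Z, as in the example

p_before = project(P, f)               # -> (500, 500)
p_after = project(P + t, f)            # -> (333.33, 333.33)
flow = p_after - p_before
print(np.round(flow, 2))               # -> (-166.67, -166.67)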
Example 2:
Problem Statement
Consider a planar surface Z=2 and a camera moving with a translation vector t=(0,0,1) and a rotation around the
y-axis by an angle θ=30∘. Compute the image flow for a point (X,Y,Z)=(1,1,2) with a focal length f=1000.
Solution
Using the image formation model, we project the 3D point (X,Y,Z)=(1,1,2) onto the image plane:
Translation
The camera undergoes a translation t=(0,0,1). The new Z coordinate after translation becomes:
Z′=Z+tz=2+1=3
The X and Y coordinates remain unchanged because the translation is only along the Z-axis:
X′=X=1
Y′=Y=1
Rotation
The camera also undergoes a rotation around the y-axis by an angle θ=30∘. The rotation matrix R for a rotation
around the y-axis is:
Substituting θ=30∘
The flow vectors (u,v) represent the displacement of the point in the image plane:
u=x′−x=1128.64−500=628.64
v=y′−y=476.17−500=−23.83
So, the flow vector for the point (X,Y,Z)=(1,1,2) with a focal length f=1000 and a rotation around the y-axis by
30∘ is (628.64,−23.83).
Note:
By considering both the translation and rotation of the camera, we computed the planar surface flow for a point
initially at (X,Y,Z)=(1,1,2). The flow vector (628.64,−23.83) represents the apparent motion of the point on the
image plane due to the camera's movement and rotation. This type of calculation is essential in computer vision
tasks like motion estimation, 3D reconstruction, and image stabilization.
Description: Self-driving cars use planar surface flow to estimate the motion of the vehicle relative to the
road surface. By analyzing the flow of points on the road, the car can determine its speed, direction, and
detect obstacles.
Example: A self-driving car analyzes the flow of lane markings and the road surface to maintain its lane,
adjust speed, and make turns.
2. 3D Reconstruction
Description: Planar surface flow is used to reconstruct 3D scenes from 2D images. This is crucial for
creating realistic VR and AR environments where the geometry of the physical world needs to be accurately
represented.
Example: AR applications like furniture placement apps use 3D reconstruction to accurately place virtual
objects on real-world surfaces, allowing users to see how furniture would look in their home.
3. Image Stabilization
Description: Image stabilization algorithms use planar surface flow to detect and correct camera shake.
This results in smoother videos and sharper images, especially in handheld or moving shots.
Example: Smartphones and cameras use image stabilization to reduce the blurriness caused by hand
movements, providing clear and stable footage.
4. Robotics Navigation
Description: Autonomous robots use planar surface flow to navigate environments. By analyzing the
motion of the floor or other planar surfaces, robots can understand their movement and avoid obstacles.
Example: A robotic vacuum cleaner uses planar surface flow to map out a room, identify obstacles like
furniture, and efficiently clean the floor.
Application: Drones
Description: Drones use planar surface flow to create accurate maps and survey land. By analyzing the
flow of ground points, drones can generate detailed 3D models of terrain and structures.
Example: A drone surveying a construction site uses planar surface flow to create a 3D model of the area,
allowing for precise measurement and monitoring of progress.
6. Medical Imaging
Application: Endoscopy
Description: In medical imaging, planar surface flow can assist in navigating and mapping internal body
structures. This is especially useful in endoscopic procedures where accurate navigation is crucial.
Example: During an endoscopic examination, planar surface flow can help create a 3D map of the internal
surfaces being examined, aiding in accurate diagnosis and treatment.
7. Gesture Recognition
Description: Planar surface flow is used in gesture recognition systems to detect and interpret human
gestures. This allows for more intuitive and natural interaction with devices and applications.
Example: A smart TV uses gesture recognition to allow users to control the interface by waving their
hands, using planar surface flow to track the movement of the hands.
8. Sports Analytics
Description: Planar surface flow is used to analyze the motion of athletes during sports activities. This
helps in improving performance, understanding techniques, and preventing injuries.
Example: A sports analytics system uses planar surface flow to track the motion of a basketball player,
analyzing their movements to provide feedback on shooting techniques and defensive maneuvers.
Bilinear Interpolant
Interpolation:
Interpolation is a mathematical technique used to estimate unknown values that fall within the range of known
data points.
Bilinear interpolation is a method of estimating the value of a function at a given point based on the values at
nearby points on a rectangular grid. It is widely used in image processing, computer graphics, and geographic
information systems (GIS) to perform tasks such as image resizing, warping, and texture mapping. The method
extends linear interpolation by performing linear interpolation in two directions.
Formula
Given a rectangular grid with points (x0,y0), (x1,y0), (x0,y1), and (x1,y1) with corresponding function values
f(x0,y0), f(x1,y0), f(x0,y1), and f(x1,y1), the bilinear interpolant f(x,y) at a point (x,y) within the grid can be
calculated as follows:
Interpolate in the x-direction:
Note:
In the context of images, f(x,y) might represent the intensity or color value of the pixel located at
position (x,y).
Example:
Given the values of a function f(x,y) at four grid points:
f(x0,y0)=f(0,0)=1
f(x1,y0)=f(1,0)=2
f(x0,y1)=f(0,1)=3
f(x1,y1)=f(1,1)=4
Calculate the value of the function at the point (0.5,0.5) using bilinear interpolation.
Solution
Note:
The y values of 0 and 1 are fixed because they correspond to the known y-coordinates of the grid points.
We interpolate along the x-direction at these fixed y-values to get intermediate values.
Then, we interpolate along the y-direction using these intermediate values to get the final interpolated
value at the desired point.
Example
Given a 2x2 grid of known pixel values, estimate the value at a point (x,y) inside this grid.
f(1,1)=10
f(2,1)=20
f(1,2)=30
f(2,2)=40
Solution:
Here:
1. Identify the surrounding grid points: For a given point (x,y), identify the four nearest grid points. In
our example, to estimate the value at (1.5,1.5), the surrounding grid points are:
o (1,1)
o (2,1)
o (1,2)
o (2,2)
2. Interpolate along the x-direction: Compute intermediate values at the desired x-coordinate for the
known y-coordinates.
For y=1: f(1.5,1) = 0.5·f(1,1) + 0.5·f(2,1) = 0.5·10 + 0.5·20 = 15
For y=2: f(1.5,2) = 0.5·f(1,2) + 0.5·f(2,2) = 0.5·30 + 0.5·40 = 35
3. Interpolate along the y-direction: Use the intermediate values obtained to compute the final value at the
desired (x,y) point.
Interpolate at y=1.5 between the intermediate values:
f(1.5,1.5) = 0.5·f(1.5,1) + 0.5·f(1.5,2) = 0.5·15 + 0.5·35 = 25
Example
Given a 2x2 grid of pixel values, estimate the RGB color value at point (1.5,1.5).
Solution:
Bilinear interpolation involves two linear interpolations in one direction and one linear interpolation in the
perpendicular direction. We will perform this separately for the red, green, and blue components.
For y=1:
Substituting values:
Green component interpolation:
Substituting values:
Substituting values:
For y=2:
Substituting values:
Substituting values:
f(1.5,2)=(85,95,105)
Now, we use the intermediate values obtained from the x-direction interpolation to find the final value at
(1.5,1.5).
Substituting values:
Substituting values:
f(1.5,1.5)=(55,65,75)
The estimated RGB color value at point (1.5,1.5) using bilinear interpolation is (55,65,75).
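A general bilinear interpolation sketch in NumPy; it evaluates the two scalar examples above (giving 2.5 at (0.5, 0.5) and 25 at (1.5, 1.5)) and works unchanged on RGB triples by passing arrays as the corner values.

import numpy as np

def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    top = (1 - tx) * np.asarray(f00, dtype=float) + tx * np.asarray(f10, dtype=float)  # along x at y0
    bot = (1 - tx) * np.asarray(f01, dtype=float) + tx * np.asarray(f11, dtype=float)  # along x at y1
    return (1 - ty) * top + ty * bot                                                   # along y

print(bilinear(0.5, 0.5, 0, 1, 0, 1, 1, 2, 3, 4))        # unit-grid example -> 2.5
print(bilinear(1.5, 1.5, 1, 2, 1, 2, 10, 20, 30, 40))    # 2x2 pixel grid    -> 25.0
# RGB corners are handled per channel, e.g. f00 = [r, g, b] arrays.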
1. Image Resizing
Upscaling: When enlarging an image, bilinear interpolation is used to estimate the values of new pixels
that fall between the original pixels. This helps to create a smoother and more visually appealing
enlarged image.
Downscaling: When reducing the size of an image, bilinear interpolation helps in determining the pixel
values of the smaller image, ensuring that the reduced image maintains the overall appearance and
details of the original image.
2. Geometric Transformations
Rotation: During the rotation of an image, pixel values must be calculated for the new positions of the
pixels. Bilinear interpolation provides a smooth transition by averaging the values of the nearest pixels.
Translation: When shifting an image, bilinear interpolation helps in determining the new pixel values to
avoid a jagged or blocky appearance.
Shearing: For shearing transformations, bilinear interpolation calculates intermediate pixel values to
produce a more natural-looking result.
Image Warping: Bilinear interpolation is used to calculate the pixel values when an image is distorted
or transformed to align with another image or a geometric model. This is essential in applications like
image registration and stitching.
Morphing: In image morphing, where one image gradually transforms into another, bilinear
interpolation ensures smooth transitions between corresponding points in the images.
When applying a 2D texture to a 3D surface, bilinear interpolation is used to compute the texture color
for each pixel on the surface. This avoids the appearance of pixelation and creates a more realistic
texture effect.
Alpha Blending: When combining two images or adding overlays, bilinear interpolation helps in
blending pixel values smoothly, avoiding harsh edges and creating a seamless composite image.
Image Compositing: For tasks such as layering images or combining different elements into a single
image, bilinear interpolation ensures that the transitions between different layers are smooth.
Feature Extraction: When extracting features from images (e.g., edges, corners), the algorithms might
work on resized or transformed versions of the image. Bilinear interpolation ensures that the resized or
transformed images retain the necessary details for accurate feature detection.
Template Matching: In template matching, bilinear interpolation helps in resizing or rotating the
template to match different scales and orientations of the target object in the image.
Flow Field Calculation: In optical flow algorithms, which estimate the motion between consecutive
frames in a video, bilinear interpolation is used to compute intermediate pixel values to enhance the
accuracy of motion vectors.
8. Super-Resolution
Image Super-Resolution: When generating high-resolution images from low-resolution inputs, bilinear
interpolation is used as part of the process to estimate the high-resolution pixel values, often as a
preliminary step before applying more advanced algorithms.
3D to 2D Projections: Orthographic Projection and Parallel Projection (Paraperspective):
3D to 2D projections are essential in computer graphics, computer vision, and various fields requiring
visualization of three-dimensional objects on two-dimensional surfaces. Two common types of projections are
orthographic and parallel (paraperspective) projections.
Orthographic Projection
Orthographic projection is a type of parallel projection where the projection lines are perpendicular to the
projection plane. This means that objects are projected onto the plane without any perspective distortion,
preserving their relative dimensions and shapes.
Characteristics:
Projection lines are parallel and orthogonal (perpendicular) to the projection plane.
No perspective distortion: objects maintain their size and shape regardless of their distance from the
projection plane.
Typically used for technical drawings, engineering designs, and CAD applications.
Mathematical Representation:
For an object point P(x,y,z) in 3D space, its orthographic projection P′(x′,y′) on the 2D plane is given by:
x′=x
y′=y
Example 1
Find the orthographic projections of the point P(3,4,5) onto the XY, XZ, and YZ planes.
Solution
Orthographic projection onto the XY, XZ, and YZ planes involves projecting the 3D coordinates onto each of
these planes by ignoring one of the coordinates in each case.
XY plane: (x′,y′)=(x,y)=(3,4)
XZ plane: (x′,z′)=(x,z)=(3,5)
YZ plane: (y′,z′)=(y,z)=(4,5)
Example 2
Given an object with the following vertices:
A(1,2,3)
B(4,5,6)
C(7,8,9)
D(2,3,1)
Find the orthographic projections of these vertices onto the XY, XZ, and YZ planes.
Solution
To find the orthographic projections onto the XY, XZ, and YZ planes, we will project each vertex onto the
respective planes by ignoring one of the coordinates.
A(1,2,3) → AXY(1,2)
B(4,5,6) → BXY(4,5)
C(7,8,9) → CXY(7,8)
D(2,3,1) → DXY(2,3)
A(1,2,3) → AXZ(1,3)
B(4,5,6) → BXZ(4,6)
C(7,8,9) → CXZ(7,9)
D(2,3,1) → DXZ(2,1)
A(1,2,3) → AYZ(2,3)
B(4,5,6) → BYZ(5,6)
C(7,8,9) → CYZ(8,9)
D(2,3,1) → DYZ(3,1)
Example 3
Problem Statement
The vertices of a cube are given as:
A(1,1,1)
B(1,1,−1)
C(1,−1,1)
D(1,−1,−1)
E(−1,1,1)
F(−1,1,−1)
G(−1,−1,1)
H(−1,−1,−1)
Find the orthographic projections of these vertices onto the XY, XZ, and YZ planes.
Solution
A(1,1,1) → AXY(1,1)
B(1,1,−1) → BXY(1,1)
C(1,−1,1) → CXY(1,−1)
D(1,−1,−1) → DXY(1,−1)
E(−1,1,1) → EXY(−1,1)
F(−1,1,−1)→ FXY(−1,1)
G(−1,−1,1) → GXY(−1,−1)
H(−1,−1,−1) → HXY(−1,−1)
A(1,1,1) → AXZ(1,1)
B(1,1,−1) → BXZ(1,−1)
C(1,−1,1) → CXZ(1,1)
D(1,−1,−1) → DXZ(1,−1)
E(−1,1,1) → EXZ(−1,1)
F(−1,1,−1) → FXZ(−1,−1)
G(−1,−1,1) → GXZ(−1,1)
H(−1,−1,−1) → HXZ(−1,−1)
A(1,1,1) → AYZ(1,1)
B(1,1,−1) → BYZ(1,−1)
C(1,−1,1) → CYZ(−1,1)
D(1,−1,−1) → DYZ(−1,−1)
E(−1,1,1) → EYZ(1,1)
F(−1,1,−1) → FYZ(1,−1)
G(−1,−1,1) → GYZ(−1,1)
H(−1,−1,−1) → HYZ(−1,−1)
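A tiny NumPy helper that produces the three orthographic projections by dropping one coordinate, checked against the first example and vertex A of Example 2.

import numpy as np

def ortho_projections(P):
    X, Y, Z = P
    # Orthographic projection onto each principal plane simply drops one coordinate.
    return {"XY": (X, Y), "XZ": (X, Z), "YZ": (Y, Z)}

print(ortho_projections(np.array([3.0, 4.0, 5.0])))   # XY:(3,4)  XZ:(3,5)  YZ:(4,5)
print(ortho_projections(np.array([1.0, 2.0, 3.0])))   # vertex A of Example 2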
Paraperspective (Parallel) Projection
Characteristics:
Projection lines are parallel but not perpendicular to the projection plane.
Some perspective effects are present, depending on the distance of objects from the projection plane.
Often used in technical illustrations where some depth cues are needed without full perspective
distortion.
Mathematical Representation:
For a point P(x,y,z) in 3D space, its paraperspective projection P′(x′,y′) on the 2D plane can be represented by
adjusting the orthographic projection equations to include a scaling factor based on depth z:
Projection Matrix:
Example 1
Find the paraperspective projection of the point P(4,3,2) onto the XY plane, given d=5.
Solution
In a paraperspective projection, the projection lines are parallel but not necessarily orthogonal to the projection
plane. The projection equations introduce some perspective effects while maintaining the parallel nature of the
projection lines.
The paraperspective projection of a point P(x,y,z) onto the XY plane can be calculated using the following
equations:
Given the point P(4,3,2) and d=5, we will use these equations to find the projected coordinates (x′,y′).
Example 2
Find the paraperspective projection of the point P(6,8,10) onto the XY plane, given d=10.
Solution
The paraperspective projection of a point P(x,y,z) onto the XY plane can be calculated using the following
equations:
Given the point P(6,8,10) and d=10, we will use these equations to find the projected coordinates (x′,y′).
Step-by-Step Calculation
Pinhole Camera Model
1. Pinhole:
o The pinhole is a small aperture through which light enters. It is effectively a point, meaning it
has no width, height, or depth.
o Light rays from objects in the scene pass through this pinhole and form an inverted image on the
opposite side of the camera (the image plane).
2. Image Plane:
o The image plane is the surface where the image is formed after the light passes through the
pinhole.
o In a real camera, this would be the film or sensor. In the model, it's a flat 2D plane.
4. Coordinate Systems:
o World Coordinates (X, Y, Z): These represent the 3D positions of objects in the scene.
o Camera Coordinates: The origin of this coordinate system is at the pinhole, with the Z-axis
pointing outward through the pinhole, the X-axis horizontal, and the Y-axis vertical.
o Image Coordinates (u, v): These are the 2D coordinates on the image plane where the 3D point
is projected.
Mathematical Formulation
1. Coordinate Systems
World Coordinate System (X, Y, Z): The coordinate system used to describe the position of objects in
the real world.
Camera Coordinate System (X', Y', Z'): The coordinate system with its origin at the pinhole, and the
Z'-axis pointing in the direction of the camera's optical axis.
Image Plane Coordinate System (u, v): The 2D coordinate system on the image plane where the 3D
points are projected.
2. Projection of 3D Points onto the Image Plane
Given a point (X,Y,Z) in the world coordinate system, and assuming the camera is positioned at the origin of
the world coordinate system, the point is first represented in the camera coordinate system (X′,Y′,Z′).
For simplicity, if the camera is aligned with the world coordinates, we have:
X′=X,Y′=Y,Z′=Z
The pinhole camera model projects the 3D point onto the 2D image plane using the following projection
equations:
where:
f is the focal length of the camera (the distance from the pinhole to the image plane).
(u,v) are the coordinates of the projected point on the image plane.
3. Homogeneous Coordinates
To facilitate transformations, especially when dealing with translations and rotations, the model often uses
homogeneous coordinates.
The relationship between the 3D point in world coordinates and its 2D projection on the image plane can be
expressed using a camera projection matrix P.
where:
K is the intrinsic matrix of the camera, containing the camera's internal parameters.
R is the rotation matrix that aligns the world coordinates with the camera coordinates.
t is the translation vector that describes the camera's position in the world coordinates.
Intrinsic Matrix K
where fx and fy are the focal lengths of the camera expressed in pixels, (cx, cy) is the principal point (the
intersection of the optical axis with the image plane), and the skew term is typically zero.
The extrinsic matrix combines rotation and translation to map the world coordinates to the camera coordinates:
Combining everything, the full projection equation for the pinhole camera model is:
6. Simplified Model (Orthographic Projection)
For cases where the variation in depth across the scene is small compared to its distance from the camera, a
simplified orthographic (scaled-orthographic) projection can be used:
This model is idealized, assuming a perfect pinhole and no lens distortions, but it's a powerful starting point for
understanding how cameras capture images.
Example 1:
A pinhole camera has a focal length of 50 mm. An object is located at (X,Y,Z)=(200 mm,100 mm,1000 mm) in
the world coordinate system. Find the coordinates of the projection of this object on the image plane (u,v).
Solution:
Given: f = 50 mm, (X, Y, Z) = (200 mm, 100 mm, 1000 mm)
u = f·X/Z = (50 × 200)/1000 = 10 mm
v = f·Y/Z = (50 × 100)/1000 = 5 mm
So, the coordinates of the projection on the image plane are (u,v)=(10 mm,5 mm)
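A minimal pinhole-projection sketch, u = f·X/Z and v = f·Y/Z; it reproduces Example 1 and also evaluates Example 4 (those Example 4 values are computed by the code, not quoted from the notes).

def pinhole_project(X, Y, Z, f):
    # Ideal pinhole model: image coordinates scale with focal length over depth.
    return f * X / Z, f * Y / Z

print(pinhole_project(200, 100, 1000, 50))    # Example 1 -> (10.0, 5.0) mm
print(pinhole_project(400, 200, 1200, 50))    # Example 4 with f = 50 mm
print(pinhole_project(400, 200, 1200, 100))   # Example 4 with f = 100 mm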
Example 2:
An object appears on the image plane at coordinates (u,v)=(15 mm,10 mm). If the focal length of the camera is
75 mm, and the object is known to be 300 mm in the X direction from the camera center, find the distance Z of
the object from the camera along the Z-axis.
Solution
Given:
So, the distance Z of the object from the camera is 1500 mm.
Example 3:
A square object of side 100 mm is located 800 mm away from a pinhole camera with a focal length of 40 mm.
What will be the size of the object's image on the image plane?
Solution:
Given:
To find the size of the image, we use the ratio of similar triangles formed by the object and its image:
So, the size of the object's image on the image plane will be 5 mm×5 mm.
Example 4
An object is located at (X,Y,Z)=(400 mm,200 mm,1200 mm). Find the image coordinates for this object when
the camera's focal length is 50 mm and 100mm.
Solution:
Example 5:
A camera has a focal length f=35mm. An object is located at (X,Y,Z)=(200 mm,150 mm,1000 mm) in the world
coordinate system. Determine the coordinates (u,v) of the projection of this object on the image plane.
Solution:
Given:
Example 6:
An object is projected onto the image plane at coordinates (u,v)=(20 mm,10 mm). The camera's focal length is
f=50mm, and the object’s X-coordinate is known to be 400 mm. Find the distance Z of the object from the
camera.
Solution:
Given:
Rearrange to find Z:
Solution:
Given:
Using the projection equation, the size of the image on the image plane is scaled by the factor f/Z:
The size of the image on the image plane is 3.125 mm×1.5625 mm.
Example 8:
Problem:
A camera has the following intrinsic parameters:
A point in the world is located at (X,Y,Z)=(400 mm,200 mm,1000 mm). Find the coordinates (u,v) on the image
plane.
Solution:
Given:
Intrinsic matrix K
World coordinates: (X,Y,Z)=(400 mm,200 mm,1000 mm)
Example 9
A camera has the following intrinsic matrix:
A 3D point in the world is located at (X,Y,Z)=(300 mm,200 mm,1500 mm). Determine the image coordinates
(u,v) of the projection of this point on the image plane.
Solution:
Given:
Intrinsic matrix K
World coordinates: (X,Y,Z)=(300 mm,200 mm,1500 mm)
Next, use the intrinsic matrix to project the point onto the image plane:
Calculate:
Example 10
Given the intrinsic matrix K:
and a point projected onto the image plane at (u,v)=(560,460). Assume the object lies on the plane Z=1000 mm.
Find the corresponding world coordinates (X,Y,Z) of the point.
Solution:
Given:
Intrinsic matrix K
Image coordinates (u,v)=(560,460)
Distance Z=1000 mm
We can reconstruct the world coordinates using the inverse of the intrinsic matrix:
Calculate:
Example 11
A camera has an intrinsic matrix given by:
If the focal length is doubled, and a 3D point in the world at (X,Y,Z)=(500 mm,300 mm,2000 mm)is projected
onto the image plane, what are the new image coordinates (u,v)?
Solution:
Given:
Calculate:
The new image coordinates of the point after doubling the focal length are (u,v)=(1390,930).
Example 12
Suppose a 3D point (X,Y,Z)=(100 mm,150 mm,1000 mm)is projected onto the image plane at (u,v)=(220,270).
The camera has a principal point at (cx,cy)=(200,250). Find the focal length f of the camera.
Solution:
Given:
Solving for f:
The first equation gives a focal length of 200 mm, and the second gives approximately 133.33 mm. The
inconsistency can arise due to noise or inaccuracies in measurement, so we could average them or examine
other factors to decide which is correct.
Camera Intrinsic
The camera intrinsic matrix is a fundamental concept in computer vision and photogrammetry, essential for
understanding how a 3D point in the world is projected onto a 2D image plane. This matrix encapsulates the
internal parameters of a camera that relate the 3D coordinates of a point in space to the corresponding 2D pixel
coordinates in the image.
fx and fy: The focal lengths of the camera in pixels along the x and y axes. These values represent the
scaling factors between the physical size of the sensor and the pixel dimensions.
cx and cy: The coordinates of the principal point (the intersection of the optical axis with the image
plane) in pixels. These values indicate the offset of the image center from the origin of the pixel grid.
The last row [0,0,1] is a homogeneous coordinate term used in projective geometry.
Mathematical Formulation
If we have a 3D point in the world coordinates Xw=[X,Y,Z,1]T, the corresponding point in the camera
coordinate system Xc=[Xc,Yc,Zc,1]T is found by applying an extrinsic transformation (rotation and translation).
The 2D pixel coordinates x=[u,v,1]T on the image plane are then obtained by projecting the 3D point using the
intrinsic matrix:
Where Zc is the depth or distance from the camera. This equation can be expanded to:
u = fx · (Xc/Zc) + cx
v = fy · (Yc/Zc) + cy
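A NumPy sketch of projection through an intrinsic matrix; the K values used here are hypothetical placeholders for illustration, not the ones from the examples below.

import numpy as np

# Hypothetical intrinsics: fx = fy = 800 px, principal point (cx, cy) = (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_point(K, Xc):
    x = K @ np.asarray(Xc, dtype=float)   # homogeneous image coordinates
    return x[:2] / x[2]                   # divide by the depth Zc

print(project_point(K, [200.0, 150.0, 1000.0]))   # -> (u, v) = (480, 360)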
Real-World Application
Example 1
Given the intrinsic matrix:
and a 3D point in the camera coordinate system (Xc,Yc,Zc)=(200,150,1000) find the corresponding image
coordinates (u,v).
Solution:
Example 2
Given a calibrated camera with an intrinsic matrix:
A point is observed in two images with pixel coordinates p1=(350,280) and p2=(450,280). Assume the camera's
motion between the two images is purely horizontal (along the x-axis) with a baseline distance of 100 units. The
Z-coordinate of the point in the first image is known to be 1000 units. Calculate the 3D coordinates of the point
in the camera's coordinate system.
Solution:
1. Unproject the points: First, we compute the normalized coordinates by subtracting the principal point
and dividing by the focal length:
For p1=(350,280):
For p2=(450,280):
2.Depth calculation: Since the camera motion is purely horizontal, we can compute the depth Z using the
disparity d=x2−x1 and the baseline B=100 units:
Using the normalized coordinates from the first image, the 3D coordinates are:
Thus, the 3D coordinates of the point in the camera's coordinate system are:
Xc=(30000,40000,1000000) units
Example 3
Given the intrinsic matrix K.
Where fx and fy are the focal lengths along the x and y axes, cx and cy are the coordinates of the principal point,
and s is the skew factor (typically 0 for most cameras).
Assume you have the following 3D world coordinates and corresponding image coordinates (in pixels) obtained
from a pinhole camera:
Calculate the intrinsic parameters fx, fy, cx, and cy using these points.
Solution:
Example 4
Given a camera with the intrinsic matrix:
A 3D point in the camera's coordinate system is (Xc,Yc,Zc)=(400,300,1000). Calculate the image coordinates
(u,v).
Solution:
Example 5
A calibrated camera has an intrinsic matrix:
A point in the image has pixel coordinates (u,v)=(750,550). Determine the direction of the corresponding ray in
the camera coordinate system (i.e., find the unit vector in the direction of the 3D point corresponding to this
pixel).
Solution:
Magnitude=
The direction of the corresponding ray is approximately (0.123,0.123,0.983) in the camera coordinate
system.
Example 6
You are given the following 3D world points and their corresponding 2D image points in pixels obtained from a
camera:
Solution:
This result contradicts the given v2=500, indicating possible measurement errors.
Example 7
A 3D object is placed at Z = 1000 units from the camera. The image size of the object along the x-axis is 100
pixels. Estimate the actual size of the object along the x-axis.
Solution:
2. Calculate the actual size in the world: The actual size X is given by:
The actual size of the object along the x-axis is 125 units.
Image Sensing Pipeline
1. Photon Capture
Light Capture: Light from a scene enters the camera lens and reaches the image sensor, which consists
of millions of photodiodes (pixels).
Photon to Electron Conversion: The photodiodes convert incoming photons into electrical charges
(electrons). The amount of charge generated is proportional to the intensity of light hitting each pixel.
Charge Accumulation: The electrons generated in each pixel are accumulated over the exposure time.
Readout: The accumulated charge is read out from the pixels, usually row by row, to produce an analog
signal representing the light intensity for each pixel.
Quantization: The analog signal is converted into a digital signal by the Analog-to-Digital Converter
(ADC). This step quantizes the continuous analog signal into discrete digital values, typically 8-bit, 10-
bit, or 12-bit per channel.
Bayer Filter Pattern: Most image sensors use a Bayer filter array to capture color information. Each
pixel is filtered to record either red, green, or blue light, leading to a raw image where each pixel
contains data for only one color channel.
Raw Image Output: The output is a raw image file (often in formats like RAW, DNG) that needs
further processing to produce a full-color image.
5. Demosaicing
Interpolation: Since each pixel in the raw image contains only one color component, demosaicing is
performed to interpolate the missing color components at each pixel. This creates a full-color image by
estimating the missing red, green, or blue values based on the surrounding pixels.
6. White Balance
Color Correction: White balance adjusts the colors in the image to ensure that whites appear white
under different lighting conditions. This step compensates for color temperature variations in the light
source.
Gain Adjustment: The gains for the red, green, and blue channels are adjusted to achieve a neutral
color balance.
7. Noise Reduction
Denoising Algorithms: Noise reduction algorithms are applied to reduce sensor noise, which can be
caused by low light, high ISO settings, or long exposure times. Techniques like temporal averaging,
spatial filtering, or wavelet-based denoising are used.
8. Color Correction
Color Space Conversion: The image is converted from the camera's color space to a standard color
space (e.g., sRGB, Adobe RGB) to ensure accurate color reproduction.
Gamma Correction: Gamma correction adjusts the image brightness by mapping the linear intensity
values to a nonlinear curve, making the image look more natural on display devices.
9. Image Enhancement
Sharpening: Edge enhancement or sharpening filters are applied to increase the perceived sharpness of
the image.
Contrast Adjustment: Contrast is adjusted to enhance the difference between light and dark areas of
the image.
Saturation Adjustment: The saturation of colors can be increased or decreased to make the image
more vibrant or subdued.
Image Compression: The processed image is typically compressed to reduce file size. This can be done
using lossy compression (e.g., JPEG) or lossless compression (e.g., PNG).
File Storage/Output: The final image is stored in a specific file format (e.g., JPEG, TIFF) or sent for
display or further processing.
Additional Adjustments: In some cases, additional post-processing steps may be applied, such as
cropping, resizing, or applying special effects.
Image Export: The final image is exported or shared in the desired format and resolution
Example 1:
A 4x4 pixel sensor with a Bayer filter array captures the following raw values:
Solution:
2. Demosaicing:
Red:
Green:
Blue:
Example 2:
An image sensor captures an image under a light source with a color temperature of 3500K. The RGB values
for a white object are recorded as (180,150,140). Adjust the white balance so that the object appears white (i.e.,
equal R, G, and B values).
Solution:
The white balance adjustment can be done by scaling the RGB values to equalize them. We need to find scale
factors rg, gg, and bg such that the scaled values become equal.
1. Compute the Gain Factors: Take the target value to be the mean of the recorded channels,
(180 + 150 + 140)/3 ≈ 156.7, so rg ≈ 156.7/180 ≈ 0.87, gg ≈ 156.7/150 ≈ 1.04, and bg ≈ 156.7/140 ≈ 1.12.
2. Apply Gain Factors: R′ = 180 × 0.87 ≈ 157, G′ = 150 × 1.04 = 156, B′ = 140 × 1.12 ≈ 157.
The white-balanced RGB values are (157,156,157), which are close to neutral gray, indicating a
successful white balance adjustment.
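A grey-world-style white-balance sketch in NumPy; scaling each channel toward the channel mean reproduces the near-neutral result above (small differences come from rounding the gains).

import numpy as np

rgb = np.array([180.0, 150.0, 140.0])   # recorded values for the white object
target = rgb.mean()                     # ~156.7
gains = target / rgb                    # per-channel gain factors
balanced = rgb * gains
print(np.round(gains, 2), np.round(balanced))   # gains ~ (0.87, 1.04, 1.12), result ~ (157, 157, 157)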
Example 3
You capture an image in low light conditions, leading to noticeable noise. Assume you apply a 3x3 Gaussian
filter with the following kernel to reduce the noise:
Calculate the new intensity of the central pixel after applying the filter.
Solution:
1. Apply the filter to the central pixel:
The new intensity value at the center pixel (110) is calculated by taking the weighted sum of the surrounding
pixels, where the weights are given by the Gaussian kernel.
Example 4
A camera sensor captures an image under mixed lighting conditions, leading to a color cast. The RGB values
for a pixel are recorded as [200,150,100]. To correct the color, a color correction matrix is applied:
Example 5
A grayscale image is corrupted by salt-and-pepper noise. Consider the following 3x3 pixel neighborhood
around a pixel in the noisy image:
Apply a median filter to calculate the new intensity of the central pixel.
Solution:
1. List all the pixel values in the 3x3 neighborhood:
The pixel values are:
{255,0,255,0,125,255,255,0,0}
2. Sort the pixel values:
After sorting, the values are:
{0,0,0,0,125,255,255,255,255}
3. Find the median value:
The median value is the middle value in the sorted list. Since we have 9 values, the median is the 5th value:
Median=125
After applying the median filter, the new intensity of the central pixel is 125. This process helps in reducing
salt-and-pepper noise by replacing the central pixel with the median value of its neighborhood.
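A NumPy check of the median-filter step on the listed 3x3 neighbourhood.

import numpy as np

neighborhood = np.array([[255,   0, 255],
                         [  0, 125, 255],
                         [255,   0,   0]])   # noisy 3x3 window, centre pixel = 125
print(np.median(neighborhood))               # -> 125.0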
Example 6
Given an RGB image, the red channel has the following histogram distribution (simplified):
[0, 1, 1, 2, 1, 2, 1, 1, 0, 1]
Calculate the equalized histogram for the red channel.
Solution:
1. Compute the cumulative distribution function (CDF):
The cumulative sum of the histogram values is calculated as follows:
CDF=[0,1,2,4,5,7,8,9,9,10]
2. Normalize the CDF:
Normalize the CDF so that it ranges from 0 to 255 (assuming 8-bit grayscale):
Substituting the values:
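Since the substituted values are not shown above, the following NumPy sketch computes the CDF and one common normalisation to the 0-255 range; the exact normalisation convention intended in the notes may differ slightly.

import numpy as np

hist = np.array([0, 1, 1, 2, 1, 2, 1, 1, 0, 1])
cdf = np.cumsum(hist)                                    # -> [0 1 2 4 5 7 8 9 9 10]
equalized = np.round(cdf / cdf[-1] * 255).astype(int)    # scale the CDF to [0, 255]
print(cdf)
print(equalized)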
Example 7
An image is captured with a camera lens that introduces barrel distortion. The distortion is modeled by the
following equation for the distorted radius rd in terms of the undistorted radius ru:
rd = ru × (1 + k1 × ru² + k2 × ru⁴)
Given k1 = 0.1 and k2 = 0.01, and an undistorted pixel position at (xu, yu) = (50, 40), calculate the distorted
position (xd, yd).
Solution:
1. Calculate the undistorted radius ru:
The distorted pixel position is approximately (8420550,6736440). The values indicate a significant
displacement, showing the impact of severe barrel distortion.
Example 8
A digital camera captures an image with a color cast due to incorrect white balance. The average RGB values
across the entire image are [180,140,120]. Adjust the white balance using the Grey World Assumption, which
assumes the average value of the colors should be the same.
Solution:
1. Calculate the scaling factors for each channel:
Assume the average RGB values should all equal the mean of the average values:
Sampling:
Definition: Sampling refers to the process of converting a continuous signal (such as a real-world image) into a
discrete signal (a digital image) by measuring the signal at regular intervals (sampling points). In the context of
images, sampling involves capturing the intensity of light at specific pixel locations to create a digital
representation of the scene.
Key Concepts:
Sampling Rate: The number of samples (or pixels) taken per unit area (in images) or per unit time (in
signals).
Nyquist Rate: The minimum sampling rate required to accurately capture a signal without introducing
errors. It should be at least twice the highest frequency present in the signal.
Example:Imagine you have an analog image of a scene. To digitize this image, you place a grid over it, and
each intersection of the grid lines becomes a pixel. The value assigned to each pixel is the average
color/intensity of the area it covers in the scene.
If the grid is too coarse (low sampling rate), fine details in the image might be lost. This leads us to the issue of
aliasing.
Aliasing:
Definition: Aliasing is an effect that occurs when a signal is sampled at a rate lower than the Nyquist rate,
resulting in a distortion where different signals become indistinguishable (or "aliased"). In images, this can
cause visual artifacts such as moiré patterns, jagged edges, and false patterns that do not exist in the original
scene.
Key Concepts:
Under-Sampling: When the sampling rate is too low, high-frequency details are not captured correctly,
causing them to appear as lower frequencies (aliases).
Moiré Patterns: A common aliasing artifact in images, where fine repetitive patterns (like a striped
shirt) appear to have strange, wavy patterns due to inadequate sampling.
Anti-Aliasing: Techniques used to reduce or prevent aliasing. These might involve pre-filtering
(blurring) the image before sampling or using higher sampling rates.
Example: Consider photographing a striped pattern, like a tight grid on fabric. If the camera's sensor resolution
isn't high enough (i.e., the sampling rate is too low), the stripes might not appear correctly. Instead, you'll see a
moiré pattern, which is an interference pattern created by the interaction between the grid on the fabric and the
grid of pixels on the camera sensor.
Aliasing occurs when the sampling rate is insufficient to capture the signal's details accurately. To
avoid aliasing, it's important to sample at a rate higher than the Nyquist rate or to use anti-aliasing
techniques.
Digital Image Capture: Cameras need to sample the scene at a high enough resolution to avoid
aliasing. If a camera with a low-resolution sensor captures a scene with fine details, aliasing can occur,
resulting in misleading visual artifacts.
Image Resizing: When reducing the size of an image, if the sampling is done without accounting for
aliasing, the resized image can lose detail or show false patterns. Anti-aliasing (low-pass) filters are
usually applied before downsampling to prevent these issues; a minimal sketch follows this list.
Rendering in Graphics: When rendering 3D models or textures, aliasing can create jagged edges and
patterns. Techniques like supersampling and multisampling are used in graphics engines to mitigate
these effects.
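A minimal sketch of that effect, assuming NumPy and SciPy are available: a near-Nyquist stripe pattern is downsampled once naively and once after a Gaussian pre-filter (the frequencies, sizes and sigma are illustrative choices, not from the notes):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

x = np.arange(256)
stripes = 0.5 + 0.5 * np.sin(2 * np.pi * 0.45 * x)     # 0.45 cycles/pixel, near Nyquist
image = np.tile(stripes, (256, 1))                     # 256x256 striped image

naive = image[::4, ::4]                                # decimate by 4 with no pre-filter
smoothed = gaussian_filter(image, sigma=2.0)[::4, ::4] # blur first, then decimate

# After decimation the Nyquist frequency drops to 0.125 cycles/pixel, so the
# 0.45 cycles/pixel stripes alias to a false coarse pattern in `naive`,
# while the pre-filtered version is nearly uniform grey.
print(naive.std(), smoothed.std())   # large vs ~0
```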
Note:
Sampling is about converting continuous data into discrete data by taking samples at regular intervals.
Aliasing is the distortion that happens when sampling is inadequate, leading to artifacts and false patterns.
Understanding both is crucial in fields like computer vision, where accurate representation and processing of
visual data are required.
Example 1:
Consider a 1D sinusoidal signal f(t)=sin(2π×10t), where t is time in seconds. The signal is sampled at a rate of
15 samples per second. Determine if aliasing will occur, and if so, what the aliased frequency will be.
Solution:
According to the Nyquist theorem, the sampling rate should be at least twice the highest frequency
present in the signal to avoid aliasing. Therefore, the Nyquist rate is 2×10=20Hz.
Since the sampling rate fs = 15 Hz is less than the Nyquist rate of 20 Hz, aliasing will occur.
The aliased frequency is given by fa = |f − n×fs|, using the integer n that brings fa below fs/2.
For n = 1: fa = |10 − 1×15| = |10 − 15| = 5 Hz.
Aliasing will occur, and the signal will appear to have a frequency of 5 Hz instead of the original 10 Hz.
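This can be checked numerically; the sketch below (NumPy, illustrative values only) shows that a 10 Hz sine sampled at 15 Hz produces exactly the same samples as a 5 Hz sine, up to sign:

```python
import numpy as np

fs = 15.0
n = np.arange(30)                    # two seconds of samples
t = n / fs
original = np.sin(2 * np.pi * 10 * t)   # 10 Hz signal
alias = np.sin(2 * np.pi * 5 * t)       # 5 Hz signal
print(np.allclose(original, -alias))    # True: indistinguishable at the sample points
```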
Example 2:
An image with a sinusoidal pattern of frequency 50 cycles per millimeter is captured by a digital camera with a
sensor that samples at a rate of 80 pixels per millimeter. Will aliasing occur? If so, determine the frequency of the
aliased pattern.
Solution:
1. Nyquist Rate:
o The Nyquist rate for sampling the sinusoidal pattern without aliasing is 2×50 = 100 samples
(pixels) per millimeter.
o The sampling rate of 80 pixels per millimeter is less than the Nyquist rate of 100 per millimeter,
so aliasing will occur.
2. Aliased Frequency:
o Using fa = |f − n×fs| with n = 1: fa = |50 − 1×80| = 30 cycles per millimeter.
Result:
Aliasing occurs, and the 50 cycles-per-millimeter pattern will appear as a coarser pattern at 30 cycles per
millimeter.
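The same fold-into-baseband calculation can be written as a small helper (hypothetical, not from the notes); it reproduces this example as well as Examples 1 and 6:

```python
def aliased_frequency(f, fs):
    """Fold a frequency into the observable band [0, fs/2]: the frequency a
    sinusoid at f appears to have when sampled at rate fs."""
    f = f % fs                # remove whole multiples of the sampling rate
    return min(f, fs - f)     # fold the remainder into [0, fs/2]

print(aliased_frequency(50, 80))      # 30  (this example, cycles/mm)
print(aliased_frequency(10, 15))      # 5   (Example 1, Hz)
print(aliased_frequency(1200, 1000))  # 200 (Example 6, Hz)
```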
Example 3:
An image with a resolution of 1024x1024 pixels represents a checkerboard pattern with 64 black and 64 white
squares along each axis. The image is downsampled by a factor of 8 to a resolution of 128x128 pixels.
Determine if aliasing will occur and describe the appearance of the downsampled image.
Solution:
1. Calculate the Pattern Frequency:
o The checkerboard has 64 cycles (black-white pairs) across 1024 pixels, so its frequency is
64/1024 = 0.0625 cycles per pixel; each square is 8 pixels wide.
2. Calculate the Sampling Rate After Downsampling:
o After downsampling by a factor of 8, the new resolution is 128x128 pixels, i.e. one sample for
every 8 original pixels (a sampling rate of 1/8 = 0.125 samples per original pixel).
3. Apply the Nyquist Criterion:
o The Nyquist frequency for this sampling rate is 0.125/2 = 0.0625 cycles per original pixel, which
is exactly the pattern's frequency. Equivalently, in the downsampled image the pattern runs at
0.5 cycles per pixel, right at the Nyquist limit, with each square covering a single pixel.
4. Appearance:
o This is the critical-sampling (borderline) case. Because each sample here lands inside one
square, the checkerboard survives as a one-pixel-per-square pattern, but there is no safety
margin: any pre-filtering, an averaging window that straddles square boundaries, or a slight
change in scale would blend the squares into grey or create spurious patterns.
Result:
The downsampled image still represents the checkerboard, now with one pixel per square, but the sampling sits
exactly at the Nyquist limit, so the result is fragile and should be treated as the borderline aliasing case.
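A 1-D sketch of this borderline case (NumPy, illustrative only): one row of the 1024-pixel checkerboard, downsampled by 8 by plain decimation and by a half-square-shifted box average:

```python
import numpy as np

row = np.tile(np.repeat([0.0, 1.0], 8), 64)   # 128 squares of 8 px each -> 1024 px

decimated = row[::8]                          # point sampling: one value per square
print(decimated[:8])                          # [0. 1. 0. 1. 0. 1. 0. 1.]  pattern kept

# Box-average with the averaging windows shifted by half a square:
shifted_avg = np.roll(row, 4).reshape(128, 8).mean(axis=1)
print(shifted_avg[:8])                        # [0.5 0.5 ...]  squares blend into grey
```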
Example 4:
Consider a square wave signal with a fundamental frequency of 5 Hz. The signal is sampled at 8 Hz. Determine
if aliasing will occur, and if so, what will be the observed aliased frequency.
Solution:
1. Signal Content:
o A square wave consists of a fundamental frequency (5 Hz in this case) and its odd harmonics (15
Hz, 25 Hz, etc.).
2. Sampling Rate:
o The sampling rate is 8 Hz, so the Nyquist frequency is 4 Hz. The Nyquist rate for the
fundamental alone would be 2×5 = 10 Hz, so even the fundamental is under-sampled, and the
harmonics are under-sampled even more severely.
3. Aliasing:
o The 5 Hz fundamental exceeds the 4 Hz Nyquist frequency and aliases to fa = |5 − 1×8| = 3 Hz.
o The 15 Hz harmonic aliases to fa = |15 − 2×8| = 1 Hz (higher odd harmonics also fold onto
1 Hz and 3 Hz).
4. Observed Signal:
o The sampled signal will have components at 3 Hz (aliased fundamental) and 1 Hz (aliased
harmonic), not at the original 5 Hz.
Result:
Aliasing occurs, and the observed signal will have frequencies at 3 Hz and 1 Hz.
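A quick numerical check of this result (a sketch using NumPy; the 0.1 rad phase offset is only there to avoid sampling the square wave exactly at its zero crossings):

```python
import numpy as np

fs, duration = 8, 8                                  # 8 Hz for 8 s -> 64 samples
t = np.arange(fs * duration) / fs
square = np.sign(np.sin(2 * np.pi * 5 * t + 0.1))    # 5 Hz square wave

spectrum = np.abs(np.fft.rfft(square))
freqs = np.fft.rfftfreq(square.size, d=1 / fs)
print(freqs[spectrum > 0.2 * spectrum.max()])        # [1. 3.] -- only the aliased components
```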
Example 5:
An image with a 256x256 checkerboard pattern is sampled at a resolution of 128x128 pixels. The checkerboard
pattern consists of alternating black and white squares, with each square spanning 16 pixels in the original
image. Determine if aliasing will occur in the down sampled image and describe the resulting pattern.
Solution:
1. Original Frequency:
o One cycle of the pattern is a black square plus a white square, i.e. 2×16 = 32 pixels, so the
original frequency is 1/32 cycles per pixel.
2. Sampling After Downsampling:
o The new resolution is 128x128 pixels (downsampling by a factor of 2). Each square now covers
8 pixels, so the pattern runs at 1/16 cycles per pixel in the downsampled image.
3. Nyquist Criterion:
o The Nyquist frequency of the downsampled grid is 1/2 cycles per pixel. The pattern's frequency
of 1/16 cycles per pixel is well below this, so the fundamental of the checkerboard does not alias.
4. Aliasing Check:
o The only components above the Nyquist frequency are the very high odd harmonics that make
the square edges perfectly sharp. If the image is decimated without a pre-filter, those harmonics
can fold back and make the edges look slightly ragged, but no false large-scale patterns appear; a
mild blur before downsampling removes even this edge artifact.
Result:
The checkerboard itself does not alias: the downsampled image shows the same pattern with 8-pixel squares. At
most, the sharp square edges may look slightly ragged if no anti-aliasing filter is used.
Example 6:
An audio signal consists of two tones, one at 300 Hz and the other at 1200 Hz. The signal is sampled at 1000
Hz. Determine the frequencies of the aliased tones.
Solution:
1. Nyquist Frequency:
o The Nyquist frequency for the given sampling rate is 1000/2 = 500 Hz.
2. 300 Hz Tone:
o Since 300 Hz is less than the Nyquist frequency, it does not alias. The observed frequency is 300
Hz.
3. 1200 Hz Tone:
o 1200 Hz is greater than the Nyquist frequency and will alias. The aliased frequency is:
fa = |1200 − n×1000| = |1200 − 1×1000| = 200 Hz.
Result:
The signal will contain frequencies at 300 Hz (unaltered) and 200 Hz (aliased from 1200 Hz).
Example 7:
A 512x512 pixel image with a diagonal line running from the top-left to the bottom-right is rotated by 45
degrees and then resampled to a 256x256 resolution. Determine the potential aliasing effects and how the line
will appear in the resampled image.
Solution:
1. Image Rotation:
o Rotating the image by 45 degrees changes the orientation of the diagonal line. This alters the
spatial frequency of the line relative to the image axes.
2. Resampling:
o Down sampling the image to 256x256 reduces the effective sampling rate by half.
3. Aliasing Analysis:
o Although the line crosses the image only once, its thin, sharp profile contains spatial frequencies
right up to the Nyquist limit of the original 512x512 grid. Rotating it by 45 degrees resamples it
onto the pixel grid, so its edges generally no longer align with pixel rows and columns.
o Downsampling to 256x256 halves the Nyquist limit, so these high-frequency components can no
longer be represented. Without an anti-aliasing (low-pass) filter they fold back as aliasing, which
can make the line appear jagged, broken into dashes, or accompanied by false patterns.
4. Resulting Appearance:
o The diagonal line in the resampled image may appear jagged or broken, with potential aliasing
artifacts depending on the rotation and the downsampling process.
Example 8:
A video camera captures frames at 24 frames per second (fps) of a rotating fan blade with a frequency of 10
revolutions per second (rps). Will aliasing occur? If so, what will be the apparent speed and direction of the
blade in the video?
Solution:
1. Nyquist Criterion:
o To avoid aliasing, the frame rate should be at least twice the rotation speed, i.e. 2×10 = 20 fps.
The capture rate of 24 fps satisfies this, so the 10 rps rotation lies below the Nyquist frequency of
24/2 = 12 Hz.
2. Apparent Motion:
o Between consecutive frames the blade advances (10/24)×360° = 150°, which is less than 180°, so
the blade appears to rotate forward at its true speed of 10 rps; the motion merely looks choppy
(stroboscopic) because only 2.4 frames are captured per revolution.
o The formula fa = |10 − n×24| = 14 rps applies only to rotation rates above the Nyquist limit, so it
does not describe this case. If, for example, the blade spun at 14 rps (above the 12 Hz limit), it
would appear to rotate at |14 − 24| = 10 rps in the reverse direction, which is the familiar
wagon-wheel effect.
Result:
No true aliasing occurs for a 10 rps blade filmed at 24 fps: the blade appears to rotate at 10 rps in the correct
direction, although the motion looks jerky. Reverse rotation (the wagon-wheel effect) would only appear if the
rotation rate exceeded 12 rps.
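A small sketch (hypothetical helper, not from the notes) that folds the per-frame rotation into the range a viewer would perceive; a negative result means apparent reverse rotation:

```python
def apparent_rps(true_rps, fps):
    """Fold the per-frame advance into [-180, 180) degrees and convert back
    to revolutions per second; negative values mean apparent reverse spin."""
    step = (true_rps / fps) * 360.0           # degrees advanced between frames
    folded = (step + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    return folded * fps / 360.0

print(apparent_rps(10, 24))   # 10.0  -> below the 12 rps limit, looks correct
print(apparent_rps(14, 24))   # -10.0 -> appears to spin backwards at 10 rps
print(apparent_rps(20, 24))   # -4.0  -> classic wagon-wheel reversal
```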