CV Notes
Image rectification is an equivalent (and more commonly used) alternative to perfect
camera coplanarity. Even with high-precision equipment, image rectification is usually
performed in software, because maintaining perfect coplanarity between the cameras is
often impractical.
Image rectification can only be performed with two images at a time and simultaneous
rectification of more than two images is generally impossible.
The need for image rectification arises in scenarios where the captured images do not
conform to the ideal conditions for computer vision algorithms. Common situations
include images captured from different viewpoints, varying camera angles, or scenes with
significant perspective distortion.
Stereo Rectification: http://www.sci.utah.edu/~gerig/CS6320-2013/Materials/CS6320-CV-F2012-Rectification.pdf
In the context of stereo vision, image rectification is often performed to ensure that
corresponding points in the left and right images lie on the same scanline, simplifying the
process of finding disparities and reconstructing 3D information. This is crucial for
applications like stereo matching and depth perception.
It can also be defined as the process of resampling pairs of stereo images taken from
widely differing viewpoints in order to produce a pair of “matched epipolar projections”.
These are projections in which the epipolar lines run parallel with the x-axis and match up
between views, and consequently disparities between the images are in the x-direction
only.
The aim of projective rectification is to remove the projective distortion in the
perspective image of a plane to the extent that similarity properties (angles, ratios of
lengths) can be measured on the original plane. The projective distortion can be
completely removed by specifying the position of four reference points on the plane (a
total of 8 degrees of freedom), and explicitly computing the transformation mapping the
reference points to their images.
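As a rough sketch of this idea, the homography that maps four chosen reference points to their desired positions can be estimated and applied with OpenCV; the image path and point coordinates below are illustrative placeholders, not values from these notes.

import cv2
import numpy as np

# Four image points of a planar rectangle seen under perspective (illustrative).
src = np.float32([[120, 80], [430, 95], [450, 360], [100, 340]])
# Where those reference points should land in the rectified, fronto-parallel view.
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

H = cv2.getPerspectiveTransform(src, dst)      # 3x3 homography (8 degrees of freedom)

img = cv2.imread("plane_view.jpg")             # placeholder input image
rectified = cv2.warpPerspective(img, H, (400, 300))
cv2.imwrite("rectified.jpg", rectified)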
In affine rectification, a projective transformation maps the line at infinity 𝑙∞ of a
Euclidean plane 𝜋1 to a finite line l in the image plane 𝜋2; affine rectification applies the
projective transformation that maps l back to 𝑙∞, recovering affine properties such as
parallelism.
Stereo rectification involves re-projecting image planes onto a common plane parallel to
the line between camera centres. For this we need two homographies (3×3 transforms),
one for each input image's re-projection.
Steps in stereo rectification (a code sketch follows the list)
1. Rotate the right camera by R (aligns camera coordinate system orientation only)
2. Rotate (rectify) the left camera so that the epipole is at infinity
3. Rotate (rectify) the right camera so that the epipole is at infinity
4. Adjust the scale
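A minimal OpenCV sketch of these steps, assuming the intrinsics, distortion coefficients, and the rotation/translation between the cameras have already been obtained from a stereo calibration; all numeric values below are illustrative only.

import cv2
import numpy as np

image_size = (640, 480)                          # (width, height)
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])                  # shared intrinsics (illustrative)
D = np.zeros(5)                                  # assume no lens distortion
R = np.eye(3)                                    # rotation between the two cameras
T = np.array([[-0.1], [0.0], [0.0]])             # 10 cm baseline (illustrative)

# Rectifying rotations R1, R2 and new projection matrices P1, P2 for each camera.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K, D, K, D, image_size, R, T)

# One remapping (re-projection) per input image.
map1x, map1y = cv2.initUndistortRectifyMap(K, D, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K, D, R2, P2, image_size, cv2.CV_32FC1)

left_img = np.zeros((480, 640, 3), np.uint8)     # stand-ins for captured frames
right_img = np.zeros((480, 640, 3), np.uint8)
left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
# Corresponding points now lie on the same scanline in left_rect and right_rect.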
-----------------------------------------------------------------------------------------------------------------
Q2: Components of a Vision System
A vision system in computer vision typically consists of various components that work
together to capture, process, and interpret visual information. Here are the key
components of a vision system:
Image Acquisition:
Cameras: The primary input device for capturing visual information. Cameras
capture images or video frames from the real world.
Pre-processing:
Operations such as noise reduction, resizing, and normalization that prepare raw
images for the later stages.
Depth Perception:
Using multiple cameras or images to infer the depth or three-dimensional structure
of a scene.
Motion Analysis:
Optical Flow: Analysing the apparent motion of objects in a sequence of images.
Object Tracking: Following the movement of specific objects across frames.
Scene Understanding:
Semantic Segmentation: Assigning semantic labels to different regions in an
image, such as identifying road, sky, or objects.
Contextual Understanding: Interpreting the overall context of a scene.
Decision Making:
Classification and Recognition Results: Utilizing the information extracted from
the vision system to make decisions or trigger actions.
Integration with Other Systems: Incorporating vision system outputs into larger
systems or processes.
User Interface:
Visualization: Presenting the results of the vision system in a human-readable
format.
Interaction: Allowing users to interact with the vision system, provide feedback,
or adjust parameters.
Feedback and Learning:
Feedback Mechanisms: Iteratively improving the vision system based on user
feedback or performance evaluation.
Machine Learning: Utilizing learning algorithms to adapt and improve the
system's performance over time.
These components collectively enable a vision system to interpret and understand visual
information, making it a crucial technology in applications such as autonomous vehicles,
robotics, medical imaging, and more.
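A minimal sketch of how the acquisition, pre-processing, analysis, and decision stages above might be chained in code; the file name, blur kernel, and Canny thresholds are illustrative placeholders, not values from these notes.

import cv2

frame = cv2.imread("scene.jpg")                          # image acquisition (placeholder path)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)           # pre-processing
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(gray, 50, 150)                         # low-level analysis
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Detected {len(contours)} candidate objects")     # decision / output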
-----------------------------------------------------------------------------------------------------------------
Q3: Weak Perspective Projection and Orthographic Projection in Affine Transform
https://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect5.pdf
Weak perspective projection and orthographic projection are two different types of
projections used in computer vision and computer graphics. Both are affine camera
models, which means they preserve straight lines and parallelism, unlike full perspective
projection. Here's a brief comparison between weak perspective projection and
orthographic projection in affine models:
Definition:
Weak Perspective Projection: Also known as scaled orthographic projection, weak
perspective projection is an approximation of perspective projection that assumes
the depth variation within the scene is small compared to its average distance from
the camera, so every point can be scaled by the same magnification.
Orthographic Projection: In orthographic projection, parallel lines remain parallel,
and there is no foreshortening. This projection is often used in engineering and
technical drawings where accurate representation of distances is crucial.
Equations:
Weak Perspective Projection: In a weak perspective projection, the projection of a
3D point (X, Y, Z) to a 2D point (x, y) is given by: x = s·X, y = s·Y, where
s = f/Z0 is a common scale factor determined by the focal length f and the
average scene depth Z0.
Orthographic Projection: In orthographic projection, the projection of a 3D point
(X, Y, Z) to a 2D point (x, y) is given by: x = X, y = Y
Depth Information:
Weak Perspective Projection: Weak perspective projection replaces each point's
individual depth with a single average depth Z0, so relative depth differences within
the scene are ignored; this is valid when the scene is not too deep compared with its
distance from the camera.
Orthographic Projection: Orthographic projection ignores depth information, and
all objects, regardless of their distance from the camera, are projected onto the
image plane with the same size.
Applications:
Weak Perspective Projection: Commonly used in applications where the depth
differences are not significant and a simplified perspective model is sufficient. It is
often used in computer graphics for rendering distant objects.
Orthographic Projection: Commonly used in engineering and technical drawings
where accurate representation of distances and parallelism is essential. It is also
used in some computer vision applications, especially when depth information is
not critical.
When a scene’s relief is small relative to its average distance from the camera, the
magnification can be taken to be constant. This projection model is called weak
perspective, or scaled orthography. When it is a priori known that the camera will always
remain at a roughly constant distance from the scene, we can go further and normalize
the image coordinates so that m = −1. This is orthographic projection, defined by x = X, y
= Y, with all light rays parallel to the k axis and orthogonal to the image plane π. Although
weak-perspective projection is an acceptable model for many imaging conditions,
assuming pure orthographic projection is usually unrealistic.
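A small numeric sketch contrasting the two models; the focal length, reference depth, and 3D points below are made up for illustration.

import numpy as np

points = np.array([[1.0, 2.0, 10.0],
                   [1.0, 2.0, 10.5]])        # small depth relief around Z0
f, Z0 = 1.0, 10.0
s = f / Z0                                   # common weak-perspective scale

weak = s * points[:, :2]                     # x = s*X, y = s*Y (per-point Z ignored)
ortho = points[:, :2].copy()                 # x = X,   y = Y   (unit magnification)
persp = f * points[:, :2] / points[:, 2:3]   # full perspective, for comparison
print(weak, ortho, persp, sep="\n")
-----------------------------------------------------------------------------------------------------------------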
Q4: State any 4 limitations of thick lens
A thick lens is a lens whose thickness is not negligible compared with its focal length
(or with the radii of curvature of its surfaces), so the thin-lens approximation does not
hold. While thick lenses are commonly used in various optical systems, they come with
certain limitations. Here are four limitations of thick lenses:
Spherical Aberration:
Thick lenses, particularly those with significant curvature, can suffer from spherical
aberration. Spherical aberration occurs because rays passing through different parts of
the lens have different focal points, leading to blurred or distorted images.
Chromatic Aberration:
Chromatic aberration is another issue with thick lenses. It occurs because different
wavelengths of light are refracted by different amounts, causing colours to focus at
different points. This can result in colour fringing and reduced image quality.
Coma:
Coma is an optical aberration where off-axis points are imaged not as points but as comet-
shaped blurs. Thick lenses, especially if not properly designed or if used off-axis, can
exhibit coma. This can degrade the image quality, particularly in the peripheral areas of
the lens.
Distortion:
Thick lenses may introduce distortion, which is a deviation from the true shape or
proportion of objects. There are two main types of distortion: barrel distortion, where
straight lines appear to be curved outward, and pincushion distortion, where straight
lines appear to be curved inward. Distortion can be problematic in applications where
accurate representation of shapes is crucial, such as in photography or medical imaging.
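Lens distortion of this kind is usually corrected in software once the camera has been calibrated. A minimal OpenCV sketch, where the camera matrix and distortion coefficients are illustrative placeholders rather than real calibration results:

import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                    # intrinsic matrix (illustrative)
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])      # k1, k2, p1, p2, k3 (illustrative)

img = cv2.imread("distorted.jpg")                  # placeholder input image
undistorted = cv2.undistort(img, K, dist)          # removes barrel/pincushion distortion
cv2.imwrite("undistorted.jpg", undistorted)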
-----------------------------------------------------------------------------------------------------------------
Q5: Design Cycle of pattern recognition system
Design cycle: Collect Data → Choose Features → Choose Model → Train Classifier →
Evaluate Classifier (and iterate back to data collection as needed).
Data Collection and Pre-processing:
Gather relevant data for training and testing the system. This may involve
acquiring labelled datasets for training the model and additional datasets for
evaluating its performance.
Pre-process the data to ensure quality and consistency. This may include cleaning,
normalization, feature extraction, and handling missing or noisy data.
Feature Selection and Extraction:
Identify relevant features that characterize the patterns in the data. Feature
selection involves choosing the most informative features, while feature
extraction transforms the data to a more suitable representation.
Consider domain knowledge and statistical methods to guide feature selection and
extraction.
Desirable properties of features: simple to extract, invariant to irrelevant
transformations, insensitive to noise.
Model Selection:
Choose an appropriate pattern recognition model or algorithm based on the
nature of the problem and the characteristics of the data.
Common approaches include machine learning algorithms such as support vector
machines, neural networks, k-nearest neighbours, or traditional statistical
methods.
Considerations when choosing a model: domain dependence, availability of prior
information, definition of design criteria, parametric vs non-parametric approaches,
handling of missing features, computational complexity.
Training the Model:
Use the labelled training dataset to train the chosen model. The model learns to
recognize patterns by adjusting its parameters based on the provided examples.
Optimize model parameters to achieve the best performance on the training data.
Learning paradigms: supervised, unsupervised, and reinforcement learning.
Testing and Evaluation:
Assess the performance of the pattern recognition system on an independent
testing dataset. Evaluate metrics such as accuracy, precision, recall, F1 score, and
confusion matrix to quantify its performance.
Iteratively refine the model or system based on the evaluation results.
Watch for problems of generalisation and overfitting (a code sketch of the cycle follows).
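A minimal scikit-learn sketch of the whole cycle on a toy dataset; the digits data, SVM classifier, and split ratio are chosen purely for illustration.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X, y = load_digits(return_X_y=True)              # collect data (pixel features)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
clf = SVC(kernel="rbf", gamma=0.001)             # choose model
clf.fit(X_train, y_train)                        # train classifier
print(classification_report(y_test, clf.predict(X_test)))   # evaluate classifier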
-----------------------------------------------------------------------------------------------------------------
Q6: Pose
Pose pertains to the orientation and position of an object or a camera in three-dimensional
space. The pose of an object or a camera describes its spatial configuration, including its
position (location) and orientation (rotation) relative to a reference frame.
Pose can be defined as the position and orientation of an object present in an image in the
world coordinates.
Object Pose:
The pose of an object refers to its spatial position and orientation. It is often described
using a coordinate system defined by a reference point and reference axes. The position
is given by the X, Y, and Z coordinates, and the orientation is described in terms of
rotations around these axes (roll, pitch, and yaw). Object pose is crucial in applications
such as robotic manipulation, augmented reality, and computer-aided design.
Camera Pose:
The pose of a camera refers to its position and orientation in a three-dimensional space.
Similar to object pose, camera pose is defined by its position and orientation relative to a
reference frame. In computer vision, camera pose estimation is essential for tasks like
structure from motion, simultaneous localization and mapping (SLAM), and augmented
reality. Knowing the camera pose allows for the accurate mapping of 3D points in the
scene to 2D points in the image.
Pose Consistency:
Pose consistency means that different groups of features on a rigid object will all report
the same pose for the object.
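A small sketch of how a pose is commonly represented in code: a 4×4 rigid-body transform combining a rotation R and a translation t. The yaw angle, translation, and test point are illustrative values.

import numpy as np

yaw = np.deg2rad(30.0)                     # rotation about the Z axis (illustrative)
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
t = np.array([1.0, 2.0, 0.5])              # position in world coordinates

T = np.eye(4)                              # 4x4 pose matrix [R | t]
T[:3, :3] = R
T[:3, 3] = t

p_obj = np.array([0.1, 0.0, 0.0, 1.0])     # point in the object frame (homogeneous)
p_world = T @ p_obj                        # same point expressed in world coordinates
print(p_world[:3])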
-----------------------------------------------------------------------------------------------------------------
Q7: Distance Measures
Distance measures, also known as similarity or dissimilarity measures, quantify the
difference or similarity between two objects, patterns, or data points. A similarity
measure quantifies how alike two objects are: its value is higher when the objects are
more alike, and it usually lies in [0, 1]. A dissimilarity measure is a numerical measure of
how different two data objects are: its value is lower when the objects are alike.
Properties of Similarity:
Maximum Similarity:
s(p, q) = 1 if and only if p = q
Symmetry:
s(p, q) = s(q, p) for all p, q
Properties of distance/dissimilarity:
Non-negativity (Positive Definiteness):
d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 if and only if p = q.
Symmetry:
d(p, q) = d(q, p) for all p and q.
Triangle Inequality:
For any three objects x, y, and z, the distance between x and z should be no greater
than the sum of the distances between x and y and between y and z:
d(x, z) ≤ d(x, y) + d(y, z).
Examples of Dissimilarity Measures:
1. Euclidean Distance: d(p, q) = √(Σ_{i=1}^{n} (p_i − q_i)²)
2. Minkowski Distance: d(p, q) = (Σ_{i=1}^{n} |p_i − q_i|^r)^{1/r}, which reduces to the
Manhattan distance for r = 1 and the Euclidean distance for r = 2.
Examples of Similarity Measures:
1. Cosine Similarity: s(p, q) = (p · q) / (‖p‖ ‖q‖)
2. Jaccard Similarity: J(A, B) = |A ∩ B| / |A ∪ B|
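A quick numeric sketch of these measures; the vectors and sets below are made up for illustration.

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([2.0, 2.0, 1.0])

minkowski = np.sum(np.abs(p - q) ** 3) ** (1 / 3)           # Minkowski with r = 3
euclidean = np.linalg.norm(p - q)                           # Minkowski with r = 2
cosine = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

A, B = {1, 2, 3}, {2, 3, 4}
jaccard = len(A & B) / len(A | B)

print(minkowski, euclidean, cosine, jaccard)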
K-Means Clustering (an algorithm built on the Euclidean distance measure). After
initialising k centroids (for example with randomly chosen data points), it alternates the
following steps:
Assignment Step:
Each data point is assigned to the cluster whose centroid has the minimum Euclidean
distance.
Update Step:
Recalculate the centroids of the clusters based on the assigned data points.
C_j = (1 / |S_j|) Σ_{x_i ∈ S_j} x_i, i.e. the new centroid is the mean of the points
currently assigned to cluster j.
Convergence Check:
Check for convergence by comparing the new centroids to the previous centroids. If the
centroids do not change significantly or a predefined number of iterations is reached, the
algorithm stops.
Iteration:
Repeat the assignment, update, and convergence-check steps until convergence.
Objective Function:
The objective function being minimized is the sum of squared distances between data
points and their assigned cluster centroids:
J = Σ_{i=1}^{n} ‖x_i − c_{j(i)}‖², where c_{j(i)} is the centroid of the cluster to which
x_i is assigned.
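A minimal NumPy sketch of the k-means loop described above (assignment, update, convergence check, and objective); the synthetic data, number of clusters, and random seed are illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])   # toy data
k = 2
centroids = X[rng.choice(len(X), k, replace=False)]                     # initialisation

for _ in range(100):
    # Assignment step: nearest centroid by Euclidean distance.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centroids, centroids):    # convergence check
        break
    centroids = new_centroids

J = np.sum((X - centroids[labels]) ** 2)         # objective function
print(centroids, J)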