CV Notes

Q1: Image Rectification

Image rectification in computer vision refers to the process of transforming images so
that they adhere to a specific geometric configuration, typically a rectilinear (straight
lines remain straight) or projective (collinear points remain collinear) transformation.
The goal of image rectification is to correct distortions caused by factors such as camera
perspective, lens distortion, and scene geometry, making it easier to perform subsequent
tasks such as object recognition, stereo matching, or measurements.
Image rectification is a transformation process used to project images onto a common
image plane. This process has several degrees of freedom and there are many strategies
for transforming images to the common plane. Image rectification is used in computer
stereo vision to simplify the problem of finding matching points between images (i.e.
the correspondence problem), and in geographic information systems to merge images
taken from multiple perspectives into a common map coordinate system.
The rectification process involves applying a transformation to the original image to
correct these distortions and bring the image into a desired geometric configuration. This
transformation can be based on known camera parameters, such as intrinsic and
extrinsic camera matrices, or through feature matching and homography estimation.
Image rectification warps both images such that they appear as if they have been taken
with only a horizontal displacement and as a consequence all epipolar lines are
horizontal, which slightly simplifies the stereo matching process. Note, however, that
rectification does not fundamentally change the stereo matching process: matching still
searches along epipolar lines, which are slanted before rectification and horizontal after it.

Image rectification is also an equivalent (and more often used) alternative to arranging
the cameras to be perfectly coplanar. Even with high-precision equipment, image
rectification is usually performed because it may be impractical to maintain perfect
coplanarity between cameras.

Image rectification can only be performed with two images at a time and simultaneous
rectification of more than two images is generally impossible.

The need for image rectification arises in scenarios where the captured images do not
conform to the ideal conditions for computer vision algorithms. Common situations
include images captured from different viewpoints, varying camera angles, or scenes with
significant perspective distortion.
Stereo Rectification: http://www.sci.utah.edu/~gerig/CS6320-2013/Materials/CS6320-CV-F2012-Rectification.pdf
In the context of stereo vision, image rectification is often performed to ensure that
corresponding points in the left and right images lie on the same scanline, simplifying the
process of finding disparities and reconstructing 3D information. This is crucial for
applications like stereo matching and depth perception.
It can also be defined as the process of resampling pairs of stereo images taken from
widely differing viewpoints in order to produce a pair of “matched epipolar projections”.
These are projections in which the epipolar lines run parallel with the x-axis and match up
between views, and consequently disparities between the images are in the x-direction
only.
The aim of projective rectification is to remove the projective distortion in the
perspective image of a plane to the extent that similarity properties (angles, ratios of
lengths) could be measured on the original plane. The projective distortion can be
completely removed by specifying the position of four reference points on the plane (a
total of 8 degrees of freedom), and explicitly computing the transformation mapping the
reference points to their images.
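For illustration, a minimal sketch of this four-point approach using OpenCV's getPerspectiveTransform and warpPerspective; the point coordinates, output size, and the synthetic input image are invented for the example:

import cv2
import numpy as np

# Four reference points as they appear in the distorted image (src) and
# their desired positions on the rectified plane (dst). Four point
# correspondences fix the 8 degrees of freedom of the homography.
src = np.float32([[105, 212], [512, 198], [530, 470], [90, 480]])
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

H = cv2.getPerspectiveTransform(src, dst)      # 3x3 projective transform

img = np.full((600, 640, 3), 255, np.uint8)    # stand-in for the real photograph
rectified = cv2.warpPerspective(img, H, (400, 300))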
In affine rectification, the projective imaging process has mapped the line at infinity 𝑙∞ of
the Euclidean plane 𝜋1 to a finite line 𝑙 in the image plane 𝜋2; affine rectification applies a
projective transformation that maps 𝑙 back to 𝑙∞, restoring affine properties such as
parallelism and ratios of lengths on parallel lines.
Stereo rectification involves re-projecting image planes onto a common plane parallel to
the line between camera centres. For this we need two homographies (3x3 transform),
one for each input image re-projection.
Steps in stereo rectification
1. Rotate the right camera by R (aligns camera coordinate system orientation only)
2. Rotate (rectify) the left camera so that the epipole is at infinity
3. Rotate (rectify) the right camera so that the epipole is at infinity
4. Adjust the scale
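A hedged sketch of how these steps are commonly carried out with OpenCV; the calibration values below (K1, D1, K2, D2, R, T) are placeholders that would normally come from a prior stereo calibration, and the blank frames stand in for real captured images:

import cv2
import numpy as np

image_size = (640, 480)
K1 = K2 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
D1 = D2 = np.zeros(5)                     # no lens distortion assumed
R = np.eye(3)                             # relative rotation of the right camera
T = np.array([[-0.1], [0.0], [0.0]])      # 10 cm horizontal baseline

# Compute the rectifying rotations R1, R2 and new projection matrices P1, P2
# that send both epipoles to infinity (steps 1-3 above).
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)

# Build pixel remapping tables and warp both images so that corresponding
# points end up on the same horizontal scanline.
map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)

left_img = np.zeros((480, 640, 3), np.uint8)
right_img = np.zeros((480, 640, 3), np.uint8)
left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)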
-----------------------------------------------------------------------------------------------------------------
Q2: Components of a Vision System
A vision system in computer vision typically consists of various components that work
together to capture, process, and interpret visual information. Here are the key
components of a vision system:
Image Acquisition:
 Cameras: The primary input device for capturing visual information. Cameras
capture images or video frames from the real world.
Pre-processing:
 Image Rectification: Corrects distortions in images caused by factors such as
camera perspective or lens distortion.
 Image Enhancement: Improves the quality of images by adjusting contrast,
brightness, and other parameters.
 Noise Reduction: Removes unwanted noise from images to improve the accuracy
of subsequent processing.
Feature Extraction:
 Detecting Key Points: Identifying distinctive points or regions in an image that can
be used for further analysis.
 Feature Descriptors: Extracting relevant information about the detected features
to enable matching and recognition.
Object Recognition:
 Object Detection: Identifying and locating objects in an image or video stream.
 Object Classification: Assigning a label or category to recognized objects.
Segmentation:
 Image Segmentation: Dividing an image into meaningful regions or segments
based on colour, texture, or other criteria.
Stereo Vision:

 Depth Perception: Using multiple cameras or images to infer the depth or three-
dimensional structure of a scene.
Motion Analysis:
 Optical Flow: Analysing the apparent motion of objects in a sequence of images.
 Object Tracking: Following the movement of specific objects across frames.
Scene Understanding:
 Semantic Segmentation: Assigning semantic labels to different regions in an
image, such as identifying road, sky, or objects.
 Contextual Understanding: Interpreting the overall context of a scene.
Decision Making:
 Classification and Recognition Results: Utilizing the information extracted from
the vision system to make decisions or trigger actions.
 Integration with Other Systems: Incorporating vision system outputs into larger
systems or processes.
User Interface:
 Visualization: Presenting the results of the vision system in a human-readable
format.
 Interaction: Allowing users to interact with the vision system, provide feedback,
or adjust parameters.
Feedback and Learning:
 Feedback Mechanisms: Iteratively improving the vision system based on user
feedback or performance evaluation.
 Machine Learning: Utilizing learning algorithms to adapt and improve the
system's performance over time.
These components collectively enable a vision system to interpret and understand visual
information, making it a crucial technology in applications such as autonomous vehicles,
robotics, medical imaging, and more.
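As a rough illustration of how the early stages of such a pipeline might be chained together (a random image stands in for the acquisition stage, and OpenCV is assumed to be available):

import cv2
import numpy as np

# Acquisition: a random frame stands in for a camera capture.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

# Pre-processing: grayscale conversion, noise reduction, enhancement.
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
denoised = cv2.GaussianBlur(gray, (5, 5), 0)
enhanced = cv2.equalizeHist(denoised)

# Feature extraction: detect keypoints and compute descriptors with ORB.
orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(enhanced, None)

# User interface / visualization: draw the detected keypoints.
output = cv2.drawKeypoints(frame, keypoints, None, color=(0, 255, 0))
cv2.imwrite("keypoints.jpg", output)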
-----------------------------------------------------------------------------------------------------------------
Q3: Weak Perspective Projective and Orthographic Projection in Affine Transform
https://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect5.pdf
Weak perspective projection and orthographic projection are two different types of
projections used in computer vision and computer graphics. Both are affine camera
models, which preserve straight lines and parallelism, though not angles or absolute
lengths. Here's a brief comparison between weak perspective projection and
orthographic projection in affine models:
Definition:
 Weak Perspective Projection: Also known as scaled orthographic projection, weak
perspective projection is a simplified perspective model that assumes the scene's
depth variation is small relative to its average distance from the camera, so that
depth variations can be neglected.
 Orthographic Projection: In orthographic projection, parallel lines remain parallel,
and there is no foreshortening. This projection is often used in engineering and
technical drawings where accurate representation of distances is crucial.
Equations:
 Weak Perspective Projection: In weak perspective projection, the projection of a
3D point (X, Y, Z) to a 2D point (x, y) is given by: x = m⋅X, y = m⋅Y, where
m = f/Z₀ is a constant magnification determined by the average scene depth Z₀.
 Orthographic Projection: In orthographic projection, the projection of a 3D point
(X,Y,Z) to a 2D point (x,y) is given by: x=X,y=Y
Depth Information:
 Weak Perspective Projection: Depth enters only through the constant magnification
m = f/Z₀; the model assumes the depth variation within the scene is small compared
to the average distance Z₀ from the camera.
 Orthographic Projection: Orthographic projection ignores depth information, and
all objects, regardless of their distance from the camera, are projected onto the
image plane with the same size.
Applications:
 Weak Perspective Projection: Commonly used in applications where the depth
differences are not significant and a simplified perspective model is sufficient. It is
often used in computer graphics for rendering distant objects.
 Orthographic Projection: Commonly used in engineering and technical drawings
where accurate representation of distances and parallelism is essential. It is also
used in some computer vision applications, especially when depth information is
not critical.
When a scene’s relief is small relative to its average distance from the camera, the
magnification can be taken to be constant. This projection model is called weak
perspective, or scaled orthography. When it is a priori known that the camera will always
remain at a roughly constant distance from the scene, we can go further and normalize
the image coordinates so that m = −1. This is orthographic projection, defined by x = X, y
= Y, with all light rays parallel to the k axis and orthogonal to the image plane π. Although
weak-perspective projection is an acceptable model for many imaging conditions,
assuming pure orthographic projection is usually unrealistic.
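A small numpy sketch contrasting the three models; the focal length and point coordinates are invented for the example:

import numpy as np

def perspective(P, f=1.0):
    # Full perspective: x = f*X/Z, y = f*Y/Z (depends on each point's depth).
    X, Y, Z = P.T
    return np.stack([f * X / Z, f * Y / Z], axis=1)

def weak_perspective(P, f=1.0):
    # Scaled orthography: one constant magnification m = f/Z0,
    # where Z0 is the average scene depth.
    m = f / P[:, 2].mean()
    return m * P[:, :2]

def orthographic(P):
    # Drop the depth coordinate entirely: x = X, y = Y.
    return P[:, :2].copy()

# Points on a shallow object roughly 10 units from the camera.
pts = np.array([[1.0, 0.5, 10.0], [1.0, 0.5, 10.2], [-0.8, 0.3, 9.9]])
print(perspective(pts))       # small differences caused by depth variation
print(weak_perspective(pts))  # constant scaling, depth variation ignored
print(orthographic(pts))      # no scaling at all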
Q4: State any 4 limitations of thick lens
Thick lenses, especially in the context of optics, refer to lenses that have a significant
thickness compared to their focal length. While thick lenses are commonly used in
various optical systems, they come with certain limitations. Here are four limitations of
thick lenses:
Spherical Aberration:
Thick lenses, particularly those with significant curvature, can suffer from spherical
aberration. Spherical aberration occurs because rays passing through different parts of
the lens have different focal points, leading to blurred or distorted images.
Chromatic Aberration:
Chromatic aberration is another issue with thick lenses. It occurs because different
wavelengths of light are refracted by different amounts, causing colours to focus at
different points. This can result in colour fringing and reduced image quality.
Coma:
Coma is an optical aberration where off-axis points are imaged not as points but as comet-
shaped blurs. Thick lenses, especially if not properly designed or if used off-axis, can
exhibit coma. This can degrade the image quality, particularly in the peripheral areas of
the lens.
Distortion:
Thick lenses may introduce distortion, which is a deviation from the true shape or
proportion of objects. There are two main types of distortion: barrel distortion, where
straight lines appear to be curved outward, and pincushion distortion, where straight
lines appear to be curved inward. Distortion can be problematic in applications where
accurate representation of shapes is crucial, such as in photography or medical imaging.
-----------------------------------------------------------------------------------------------------------------
Q5: Design Cycle of pattern recognition system

(Design cycle: Collect Data → Choose Features → Choose Model → Train Classifier → Evaluate Classifier, iterating as needed.)
Data Collection and Pre-processing:
 Gather relevant data for training and testing the system. This may involve
acquiring labelled datasets for training the model and additional datasets for
evaluating its performance.
 Pre-process the data to ensure quality and consistency. This may include cleaning,
normalization, feature extraction, and handling missing or noisy data.
Feature Selection and Extraction:

 Identify relevant features that characterize the patterns in the data. Feature
selection involves choosing the most informative features, while feature
extraction transforms the data to a more suitable representation.
 Consider domain knowledge and statistical methods to guide feature selection and
extraction.
 Good features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.
Model Selection:
 Choose an appropriate pattern recognition model or algorithm based on the
nature of the problem and the characteristics of the data.
 Common approaches include machine learning algorithms such as support vector
machines, neural networks, k-nearest neighbours, or traditional statistical
methods.
 Considerations include domain dependence, availability of prior information,
definition of design criteria, parametric vs. non-parametric approaches, handling of
missing features, and computational complexity.
Training the Model:
 Use the labelled training dataset to train the chosen model. The model learns to
recognize patterns by adjusting its parameters based on the provided examples.
 Optimize model parameters to achieve the best performance on the training data.
 Training paradigms include supervised, unsupervised, and reinforcement learning.
Testing and Evaluation:
 Assess the performance of the pattern recognition system on an independent
testing dataset. Evaluate metrics such as accuracy, precision, recall, F1 score, and
confusion matrix to quantify its performance.
 Iteratively refine the model or system based on the evaluation results.
 Watch for problems of generalisation and overfitting.
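A minimal sketch of one pass through this design cycle, assuming scikit-learn is available and using its built-in iris dataset as a stand-in for real collected data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Collect data: a built-in dataset replaces real data collection.
X, y = load_iris(return_X_y=True)

# Hold out an independent testing set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Choose a model (here a support vector machine) and train it.
clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

# Evaluate the classifier on unseen data.
y_pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))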
-----------------------------------------------------------------------------------------------------------------
Q6: Pose
Pose pertains to the orientation and position of an object or a camera in three-dimensional
space. The pose of an object or a camera describes its spatial configuration, including its
position (location) and orientation (rotation) relative to a reference frame.
Pose can also be defined as the position and orientation of an object present in an image,
expressed in world coordinates.
Object Pose:
The pose of an object refers to its spatial position and orientation. It is often described
using a coordinate system defined by a reference point and reference axes. The position
is given by the X, Y, and Z coordinates, and the orientation is described in terms of
rotations around these axes (roll, pitch, and yaw). Object pose is crucial in applications
such as robotic manipulation, augmented reality, and computer-aided design.
Camera Pose:
The pose of a camera refers to its position and orientation in a three-dimensional space.
Similar to object pose, camera pose is defined by its position and orientation relative to a
reference frame. In computer vision, camera pose estimation is essential for tasks like
structure from motion, simultaneous localization and mapping (SLAM), and augmented
reality. Knowing the camera pose allows for the accurate mapping of 3D points in the
scene to 2D points in the image.
Pose Consistency:
Pose consistency means that different groups of features on a rigid object will all report
the same pose for the object.
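As an illustrative sketch (the angles, translation, and test point are made up), a pose can be represented as a 4x4 rigid-body transform built from roll/pitch/yaw angles and a translation:

import numpy as np

def pose_matrix(roll, pitch, yaw, tx, ty, tz):
    # Build a 4x4 pose from roll/pitch/yaw (radians) and a translation,
    # using the Z-Y-X (yaw-pitch-roll) rotation convention.
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx      # orientation
    T[:3, 3] = [tx, ty, tz]       # position
    return T

# Object rotated 30 degrees in yaw and placed 2 m in front of the reference frame.
T = pose_matrix(0.0, 0.0, np.deg2rad(30), 0.0, 0.0, 2.0)
point_in_object = np.array([0.5, 0.0, 0.0, 1.0])   # homogeneous coordinates
point_in_world = T @ point_in_object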
-----------------------------------------------------------------------------------------------------------------
Q7: Distance Measures
Distance measures, also known as similarity or dissimilarity measures, quantify the
difference or similarity between two objects, patterns, or data points. Similarity measures
quantify how alike two objects are; their values are higher when the objects are alike and
usually lie in [0, 1]. Dissimilarity measures quantify how different two data objects are;
their values are lower when the objects are alike.
Properties of Similarity:
Maximum Similarity:
 s(p, q) = 1 if and only if p = q
Symmetry:
 s(p, q) = s(q, p) for all p, q

Properties of distance/dissimilarity:
Non-negativity (Positive Definiteness):
 The distance or dissimilarity between two objects should always be non-negative.
That is, d(x, y) ≥ 0 for any pair of objects x and y, and d(x, y) = 0 if and only if x is
identical to y.
Symmetry:
 The order in which the objects are considered should not affect the distance. In
other words, d(x,y)=d(y,x) for any pair of objects x and y.
Triangle Inequality:

 For any three objects x, y, and z, the distance between x and z should be no greater
than the sum of the distances between x and y and y and z. Mathematically, this is
expressed as d(x,z)≤d(x,y)+d(y,z).
Examples of Dissimilarity Measures:

1. Euclidean Distance: d(x, y) = √( ∑ᵢ (xᵢ − yᵢ)² )

2. Minkowski Distance: d(x, y) = ( ∑ᵢ |xᵢ − yᵢ|ᵖ )^(1/p); p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.

3. Weighted Mean Variance (WMV): compares two feature distributions through their means and standard deviations, with each difference normalised by the spread of that statistic over the whole database.

Others include KL divergence, Earth Mover's Distance, etc.

Examples of Similarity Measures:

1. Cosine Similarity: s(x, y) = (x · y) / (‖x‖ ‖y‖)

2. Jaccard Similarity: J(A, B) = |A ∩ B| / |A ∪ B|

3. Pearson's Correlation Coefficient: r = ∑ᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( ∑ᵢ (xᵢ − x̄)² · ∑ᵢ (yᵢ − ȳ)² )

-----------------------------------------------------------------------------------------------------------------
Q8: K-means Clustering Algorithm
Initialization:
Randomly initialize 𝐾 cluster centroids: 𝐶 = {𝑐1, 𝑐2, …, 𝑐𝐾}
Assignment Step:
Assign each data point 𝑥𝑖 to the cluster with the nearest centroid.
argmin_{𝑐𝑗 ∈ 𝐶} ‖𝑥𝑖 − 𝑐𝑗‖²

Each data point is assigned to the cluster whose centroid has the minimum Euclidean
distance.
Update Step:
Recalculate the centroids of the clusters based on the assigned data points.
𝑐𝑗 = (1 / |𝑆𝑗|) ∑_{𝑥𝑖 ∈ 𝑆𝑗} 𝑥𝑖

Where 𝑆𝑗 is the set of data points assigned to cluster j.

Convergence Check:
Check for convergence by comparing the new centroids to the previous centroids. If the
centroids do not change significantly or a predefined number of iterations is reached, the
algorithm stops.
Iteration:
Repeat the assignment, update, and convergence-check steps until convergence.
Objective Function:
The objective function being minimized is the sum of squared distances between data
points and their assigned cluster centroids:
𝐽 = ∑_{𝑖=1}^{𝑛} ‖𝑥𝑖 − 𝑐𝑗𝑖‖²

where 𝑗𝑖 is the index of the cluster to which data point 𝑥𝑖 is assigned.


Final Output: The final output of the K-means algorithm is a set of 𝐾 cluster centroids
and the assignment of each data point to a cluster.
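A compact numpy sketch of the algorithm described above (the toy data and parameter choices are arbitrary):

import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: nearest centroid by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of the points assigned to each cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
        # Convergence check: stop when the centroids barely move.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated 2D blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centroids, labels = kmeans(X, k=2)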
-----------------------------------------------------------------------------------------------------------------
Q9: Neural Network Structures for Pattern Recognition
