
An Internship Report
On

GOOGLE AI-ML VIRTUAL


INTERNSHIP
Submitted for partial fulfilment of the requirements for the award of degree of

Bachelor of Technology
in
Artificial Intelligence and Machine Learning
by
RACHAKONDA PAVANA SRI – 22BQ1A6138

Department Of Artificial Intelligence and Machine Learning


VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY
Approved by AICTE, Permanently Affiliated to JNTU, KAKINADA
Accredited by NBA & Accredited by NAAC with 'A' Grade
NAMBUR(V), PEDAKAKANI(M), GUNTUR(Dt) - 522508

Department Of Artificial Intelligence and Machine Learning

VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY


Approved by AICTE, Permanently Affiliated to JNTU, KAKINADA
Accredited by NBA & Accredited by NAAC with 'A' Grade
NAMBUR(V), PEDAKAKANI(M), GUNTUR(Dt) - 522508

DECLARATION

I, RACHAKONDA PAVANA SRI, hereby declare that the course entitled


GOOGLE AI-ML VIRTUAL INTERNSHIP done by me at Vasireddy Venkatadri
Institute of Technology is submitted for partial fulfillment of the requirements for the
award of credits in the Department of AIML. The results embodied in this report have not
been submitted to any other University for the same purpose.

Date: R.Pavana Sri – 22BQ1A6138

Place: Guntur Signature of the Candidate



VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY

Department of AIML

CERTIFICATE

This certificate attests that the following report accurately represents the work
completed by RACHAKONDA PAVANA SRI, Registration Number 22BQ1A6138, during
the academic year 2023-2024, covering the time period from December 2023 to February
2024, as part of the GOOGLE AI-ML VIRTUAL INTERNSHIP PROGRAMME.

Signature of the Internship Coordinator Signature of the HOD

Mr. K. Balakrishna Dr. K. Suresh Babu

(Asst. Prof., Department Of CSM ) (Prof., Department Of CSM )



ABSTRACT
In the ever-evolving landscape of technology, Google stands as a pioneering force,
consistently pushing boundaries in the realms of Artificial Intelligence (AI) and Machine
Learning (ML). This abstract encapsulates the immersive experience of participating in
Google's AI-ML virtual internship program, offering a glimpse into the dynamic world of
cutting-edge innovation and collaborative learning.

The internship journey begins with an introduction to Google's extensive suite of AI and ML
tools, providing interns with a comprehensive understanding of foundational concepts and
practical applications. Through a series of interactive modules, participants delve into diverse
topics ranging from neural networks and deep learning to natural language processing and
computer vision.

Central to the internship experience is hands-on project work, where interns have the
opportunity to apply their newfound knowledge to real-world challenges. Guided by
experienced mentors, interns engage in problem-solving exercises, experimentation, and
iterative development, fostering a culture of creativity and innovation.

Collaboration lies at the heart of the internship program, as interns work alongside peers from
diverse backgrounds and disciplines. Through virtual meetings, group discussions, and peer
reviews, participants exchange ideas, offer feedback, and collectively tackle complex
problems, enriching the learning experience and fostering a sense of community.

Furthermore, the internship program offers valuable insights into Google's culture of
innovation, emphasizing the importance of curiosity, continuous learning, and a growth
mindset. Interns are encouraged to explore new ideas, challenge assumptions, and embrace
failure as a stepping stone towards success.

Overall, the Google AI-ML virtual internship provides a unique opportunity for aspiring
technologists to immerse themselves in the world of AI and ML, gaining hands-on experience,
valuable skills, and insights into industry best practices. By fostering collaboration, creativity,
and a passion for innovation, the program equips interns with the tools and knowledge to drive
positive change and shape the future of technology.

LETTER OF UNDERTAKING

To
The Principal
Vasireddy Venkatadri Institute of Technology
Namburu,
Guntur.

Subject: Submission of the Internship Report on the Google AI-ML Virtual Internship on the Eduskills platform.

Dear Sir,

I am pleased to submit my internship report on “Google AI-ML Virtual Internship” as per your
instruction, in fulfilment of the requirements of the Degree of Bachelor of Technology in AIML
from Jawaharlal Nehru Technological University, Kakinada. While preparing this report, I have
tried my level best to include all the relevant information, explanations, and things I learned
from the internship courses, along with my contribution to this programme, to make the report
informative and comprehensive. It would not have been possible to complete this report without
your assistance, for which I am very thankful. Working online for two months on the Google
AI-ML Virtual Internship was amazing and a huge learning opportunity for me. It was also a
great experience to prepare this report, and I will be available for any clarification, if required.

Therefore, I pray and hope that you would be kind enough to accept my
Internship Report and oblige thereby.

Yours Obediently,
R.Pavana Sri.

ID:22BQ1A6138
EMAIL: 22BQ1A6138@vvit.net

CERTIFICATE OF INTERNSHIP

ACKNOWLEDGEMENT

We take this opportunity to express our deepest gratitude and appreciation to all
those people who made this internship work easier with words of encouragement,
motivation, discipline, and faith, by offering different places to look to expand our ideas and
helping us towards the successful completion of this internship work.

First and foremost, we express our deep gratitude to Mr. Vasireddy VidyaSagar,
Chairman, Vasireddy Venkatadri Institute of Technology for providing necessary facilities
throughout the Computer Science & Engineering program.

We express our sincere thanks to Dr. Y. Mallikarjuna Reddy, Principal,


Vasireddy Venkatadri Institute of Technology for his constant support and cooperation
throughout the Computer Science & Engineering program.

We express our sincere gratitude to Dr. K. Suresh Babu, Professor & HOD,
Information Technology, Vasireddy Venkatadri Institute of Technology for his constant
encouragement, motivation and faith by offering different places to look to expand our
ideas.

We would like to express our sincere gratitude to our VVIT INTERNSHIP I/C
Mr. Y V Subba Reddy, SPOC, and our Internship Coordinator Mr. K. Balakrishna for their
insightful advice, motivating suggestions, invaluable guidance, help and support in the
successful completion of this Internship.

We would like to take this opportunity to express our thanks to the teaching and
non-teaching staff in the Department of Computer Science & Engineering, VVIT for their
invaluable help and support.

R.Pavana Sri-22BQ1A6138

Table of Contents:
Google AI-ML Virtual Internship:

Module 1: Program neural networks with TensorFlow
1. The Hello World of machine learning
2. Introduction to computer vision
3. Introduction to convolutions
4. Convolutional Neural Networks (CNNs)
5. Complex images
6. Use CNNs with larger datasets

Module 2: Get started with object detection
1. Introduction to object detection
2. Build an object detector into your mobile app
3. Integrate an object detector using the ML Kit Object Detection API

Module 3: Go further with object detection
1. Train your own object-detection model
2. Build and deploy a custom object detection model with TensorFlow Lite

Module 4: Get started with product image search
1. Introduction to product image search on mobile
2. Build an object detector into your mobile app
3. Detect objects in images to build a visual product search: Android
4. Object detection: static images
5. Object detection: live camera

Module 5: Go further with product image search
1. Call the product search backend from the mobile app
2. Call the product search backend from the Android app
3. Build a visual product search backend using Vision API Product Search

Module 6: Go further with image classification
1. Build a flower recognizer
2. Create a custom model for your image classifier
3. Integrate a custom model into your app

Edu Skills with VVIT:



UNIT-1: Program neural networks with TensorFlow

Data Preparation: Collect and preprocess the dataset to ensure it's suitable for training.
Model Definition: Use TensorFlow's Keras API to construct the neural network architecture.
Model Compilation: Compile the model, specifying the optimizer and loss function.
Model Training: Train the compiled model using the prepared dataset.
Model Evaluation: Assess the model's performance on a separate validation or test dataset.
Model Deployment: Deploy the trained model for inference on new data, potentially using TensorFlow Serving or TensorFlow Lite for mobile applications.

Module 1: The Hello World of Machine Learning


Machine learning (ML) is a branch of artificial intelligence that involves training algorithms to learn
from data and make predictions or decisions. Here, we'll walk through a simple project to introduce
the basic concepts and workflow of creating a machine learning model.
Basic Concepts
Data Collection: Gathering relevant data for training the model.
Data Preprocessing: Cleaning and preparing data, including handling missing values and
normalizing features.
Model Selection: Choosing an appropriate algorithm for the task.
Training the Model: Feeding training data to the algorithm to learn patterns.
Evaluation: Assessing model performance using testing data.
Prediction: Using the trained model to make predictions on new data.


Example Project: Predicting Housing Prices

Step 1: Data Collection

For this example, we use a dataset with information about houses, such as price, size, number of bedrooms, and location.


python
import pandas as pd
# Load dataset
data = pd.read_csv('housing_data.csv')
print(data.head())

Step 2: Data Preprocessing


Clean the data by handling missing values and normalizing features
python
# Handle missing values
data = data.dropna()

# Normalize features
data['normalized_size'] = (data['size'] - data['size'].mean()) / data['size'].std()

# Select features and target variable ('location' is assumed to be numerically encoded)
X = data[['normalized_size', 'bedrooms', 'location']]
y = data['price']

Step 3: Model Selection


Select a linear regression model for this task.
python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize model
model = LinearRegression()

Step 4: Training the Model
Train the linear regression model using the training data.
python
# Train model
model.fit(X_train, y_train)

Step 5: Evaluation
Evaluate the model's performance using the testing data.

from sklearn.metrics import mean_squared_error


# Predict on test data
y_pred = model.predict(X_test)
# Calculate mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

Step 6: Prediction
Use the trained model to predict prices on new data.
python
# New data
new_data = pd.DataFrame({'normalized_size': [0.5], 'bedrooms': [3], 'location': [2]})
# Predict price
predicted_price = model.predict(new_data)
print(f'Predicted Price: {predicted_price[0]}')

Conclusion

This simple project introduces the essential steps in a machine learning workflow: collecting
data, preprocessing it, selecting and training a model, evaluating the model's performance,
and making predictions. As you progress, you'll encounter more complex algorithms, larger
datasets, and advanced techniques to enhance model performance.


Module 2: Introduction to Computer Vision

Introduction to Computer Vision


Computer Vision is a field of artificial intelligence that enables computers to interpret and
make decisions based on visual data from the world. It combines techniques from computer
science, mathematics, and engineering to process and analyze images and videos.

What is Computer Vision?


Computer Vision involves the automatic extraction, analysis, and understanding of useful
information from a single image or a sequence of images. It includes methods for acquiring,
processing, analyzing, and understanding digital images to enable machines to perform tasks
that typically require human vision.

Key Applications
 Image Recognition: Identifying objects, people, places, and actions in images. Used in
social media tagging, content moderation, and photo organization.
 Object Detection: Locating objects within an image and drawing bounding boxes around
them. Essential for autonomous driving and surveillance systems.
 Image Segmentation: Partitioning an image into meaningful segments to simplify
analysis. Used in medical imaging and scene understanding.
 Face Recognition: Identifying or verifying a person from a digital image. Used in
security systems, smartphones, and social media.
 Optical Character Recognition (OCR): Converting different types of documents
into editable and searchable data. Used in digitizing printed texts.
 Augmented Reality (AR): Overlaying digital content on the real world. Used in gaming,

navigation, and interactive learning.


Key Concepts and Techniques

Image Processing
Image processing involves manipulating pixel data to enhance or extract information.

Common techniques include:

 Filtering: Removing noise or enhancing features using convolutional filters.

 Edge Detection: Identifying the boundaries within images using algorithms like Canny
or Sobel.
 Thresholding: Converting grayscale images to binary images by setting a threshold value.

python
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Load image in grayscale
image = cv2.imread('example.jpg', 0)

# Apply Canny edge detection
edges = cv2.Canny(image, 100, 200)

# Display edges
plt.imshow(edges, cmap='gray')
plt.title('Edge Image')
plt.show()

Feature Extraction
Features are distinctive elements in an image. Techniques include:

 SIFT (Scale-Invariant Feature Transform): Detecting and describing local features.

 HOG (Histogram of Oriented Gradients): Describing the structure or the shape of an object.
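As a hedged illustration of these two techniques (assuming OpenCV 4.4+, where SIFT is available in the main module, and scikit-image for HOG; the image file name is a placeholder):

python
import cv2
from skimage.feature import hog

# Load a grayscale image (file name is illustrative)
image = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)

# SIFT: detect keypoints and compute 128-dimensional local descriptors
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
print(f'SIFT keypoints: {len(keypoints)}')

# HOG: describe overall structure/shape with histograms of gradient orientations
hog_features = hog(image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))
print(f'HOG feature vector length: {len(hog_features)}')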

Machine Learning in Computer Vision


Machine learning techniques, especially deep learning, have revolutionized computer
vision. Convolutional Neural Networks (CNNs) are particularly powerful for image-related
tasks.


Convolutional Neural Networks (CNNs)


CNNs are designed to automatically and adaptively learn spatial hierarchies of features from
input images. They consist of layers like convolutional layers, pooling layers, and fully
connected layers.
python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Create a simple CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Training and Evaluation


Training involves feeding labeled images to the model, allowing it to learn the features and
patterns. Evaluation is done using metrics like accuracy, precision, recall, and F1 score on a
separate test dataset.

Transfer Learning
Using pre-trained models like VGG16, ResNet, or Inception, and fine-tuning them for
specific tasks can save time and resources.

python

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Load the pre-trained VGG16 convolutional base (without its top classification layers)
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Add custom layers on top of VGG16
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid')  # Assuming binary classification
])

# Freeze the layers of VGG16 so only the new layers are trained
for layer in base_model.layers:
    layer.trainable = False

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Conclusion

Computer Vision is a rapidly evolving field with a wide range of applications impacting
various industries. Understanding the basic concepts and techniques is crucial for
developing solutions that enable machines to interpret and act on visual information.


Module 3: Introduction To Convolutions


Convolutions are a fundamental operation in many fields, particularly in image processing
and deep learning. They help in extracting features from data by applying a filter (or kernel)
across the input data to produce a feature map.
What is a Convolution?
A convolution is a mathematical operation that combines two functions to produce a third
function. It merges two sets of information: an input (e.g., an image) and a filter (or kernel),
producing an output that highlights specific features of the input.
Convolution in Image Processing
In the context of image processing, a convolution involves sliding a filter over the input
image, performing element-wise multiplication and summing the results to produce a single
output value.
How Convolutions Work
 Input Image: A grid of pixel values, usually represented in a 2D array for grayscale images
or 3D array for color images.
 Filter (Kernel): A smaller grid of numbers used to detect features like edges, textures, or
patterns.
 Stride: The number of pixels the filter moves across the image. Strides can be adjusted to
control the overlap of the filter applications.
 Padding: Adding extra pixels around the border of the image to control the output size.
Common padding methods include "valid" (no padding) and "same" (padding to keep the
output size the same as the input size).


Mathematical Representation
If \( I \) is the input image and \( K \) is the kernel, the convolution operation \( I * K \) at a
specific location is calculated as:
\[ (I * K)(x, y) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} I(x+i, y+j) \cdot K(i, j) \]

where \( (x, y) \) is the position in the input image, and \( (i, j) \) are positions in the kernel,
assuming a square kernel of size \( (2k+1) \times (2k+1) \).
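A direct NumPy translation of this summation is sketched below. It is a minimal illustration; note that, as written (with the kernel not flipped), the operation is what deep learning libraries usually call cross-correlation, and the helper name convolve2d is our own.

python
import numpy as np

def convolve2d(I, K):
    """Slide kernel K over image I at every valid position (no padding, stride 1),
    multiplying element-wise and summing, as in the formula above."""
    kh, kw = K.shape
    out_h = I.shape[0] - kh + 1
    out_w = I.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            # Element-wise multiply the kernel with the patch under it and sum
            output[x, y] = np.sum(I[x:x + kh, y:y + kw] * K)
    return output

# Example: horizontal edge filter applied to a small random "image"
K = np.array([[-1, -1, -1],
              [0, 0, 0],
              [1, 1, 1]])
image = np.random.rand(8, 8)
print(convolve2d(image, K).shape)  # (6, 6)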
Example: Edge Detection

Consider a simple 3x3 edge detection filter:

\[ K = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \]

When this filter is convolved with an image, it emphasizes horizontal edges.
python
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Load grayscale image
image = cv2.imread('example.jpg', cv2.IMREAD_GRAYSCALE)

# Define edge detection filter
kernel = np.array([[-1, -1, -1],
                   [0, 0, 0],
                   [1, 1, 1]])

# Apply convolution
output = cv2.filter2D(image, -1, kernel)

# Display the result
plt.imshow(output, cmap='gray')
plt.title('Edge Detection')
plt.show()

Convolutions in Deep Learning
Convolutions are the cornerstone of Convolutional Neural Networks (CNNs), which are
widely used for image recognition and processing.

Convolutional Layers
In a CNN, convolutional layers apply multiple filters to the input image, each producing a
separate feature map. These feature maps are then combined to create a deeper understanding
of the image content.
Key Components
 Filters/Kernels: Learnable parameters that are optimized during the training process to
detect specific features.
 Activation Functions: Non-linear functions (like ReLU) applied after convolution
to introduce non-linearity.
 Pooling Layers: Reduce the spatial dimensions of feature maps, retaining essential
information and reducing computation.
Example of a Simple CNN

python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Create a simple CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


Benefits of Convolutions in Deep Learning


 Parameter Sharing: Reduces the number of parameters, making the model more efficient.

 Translation Invariance: Detects features regardless of their position in the image.

 Hierarchical Feature Learning: Captures low-level features in initial layers and high-
level features in deeper layers.

Conclusion

Convolutions are a powerful tool for feature extraction in both image processing and deep
learning. Understanding how they work and how to implement them is crucial for developing
effective computer vision models.


Module 4: Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are a class of deep neural networks particularly well-
suited for tasks involving images, such as image classification, object detection, and image
segmentation. They have revolutionized the field of computer vision by automatically
learning and extracting hierarchical features from images, reducing the need for manual
feature engineering.

Architecture of CNNs

Convolutional Layers
The core building blocks of CNNs are convolutional layers. Each convolutional layer consists
of multiple filters (also known as kernels) that slide across the input image, performing
convolution operations. Convolution involves element-wise multiplication of the filter weights
with the input pixels at each position and summing the results to produce feature maps.
These feature maps capture various patterns and features present in the input images,
such as edges, textures, and shapes.
Convolutional layers are characterized by parameters such as the number of filters, filter size
(width and height), stride, and padding. The number of filters determines the depth of the
output volume, controlling the number of features extracted. Stride refers to the number of
pixels the filter moves across the input image, influencing the spatial dimensions of the output
feature maps. Padding involves adding extra pixels around the input image to control the
output size and preserve spatial information.

Pooling Layers
Pooling layers are used to reduce the spatial dimensions of the feature maps while retaining
the most important information. The two most common types of pooling operations are max
pooling and average pooling. In max pooling, the maximum value within each pool (typically
a small window) is retained, while in average pooling, the average value is computed.
Pooling helps in controlling overfitting by reducing the number of parameters and
computational complexity of the network. It also provides translation invariance, making the
network less sensitive to small variations in the position of features within the input image.

Fully Connected Layers

After several convolutional and pooling layers, the extracted features are flattened and fed

into one or more fully connected layers. These layers perform classification or regression
tasks based on the learned features. Fully connected layers connect every neuron in one layer
to every neuron in the next layer, enabling complex non-linear mappings between features
and output classes.
Activation functions like ReLU (Rectified Linear Unit) are typically applied after each layer
to introduce non-linearity in the network. ReLU replaces all negative values in the feature
maps with zero, allowing the network to learn complex decision boundaries and improve
training convergence.

Training CNNs

Convolutional Filters
During the training process, CNNs learn the parameters of convolutional filters through
backpropagation and gradient descent. These filters start as random weights and are optimized
to detect specific features present in the input images during training. Features learned in the
early layers are simple, like edges and textures, while deeper layers learn more complex
features like object parts and configurations.
The learning process involves minimizing a loss function, which measures the difference
between the predicted outputs and the ground truth labels. Optimization algorithms like
Stochastic Gradient Descent (SGD), Adam, or RMSprop are used to update the filter weights
iteratively, moving them towards values that minimize the loss.


Data Augmentation
To prevent overfitting and improve generalization, data augmentation techniques are often
applied during training. These techniques involve randomly applying transformations such as
rotation, scaling, translation, and flipping to the input images, increasing the diversity of the
training data.
Data augmentation helps the model learn to generalize better to unseen examples and reduces
the risk of overfitting by exposing the network to a wider range of variations present in the
real-world data.
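One common way to express this in TensorFlow is with Keras preprocessing layers, sketched below. This is a minimal example under stated assumptions: the specific transformations, their ranges, and the placeholder model around them are illustrative choices, not a required recipe.

python
import tensorflow as tf

# Data augmentation pipeline applied to input images during training
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),      # random horizontal flipping
    tf.keras.layers.RandomRotation(0.1),           # rotate by up to ±10% of a full turn
    tf.keras.layers.RandomZoom(0.1),               # random zoom in/out
    tf.keras.layers.RandomTranslation(0.1, 0.1),   # random height/width shifts
])

# Placing the augmentation layers at the front of a model means they are
# active during training and behave as a no-op at inference time.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    data_augmentation,
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax')
])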

Transfer Learning
Transfer learning is a technique where pre-trained CNN models, trained on large datasets like
ImageNet, are fine-tuned for specific tasks. By leveraging the knowledge learned from these
large datasets, transfer learning can significantly reduce training time and the amount of
labeled data required for training.
In transfer learning, the pre-trained CNN model serves as a feature extractor, with its
convolutional layers frozen to preserve the learned representations. Additional layers, such
as fully connected layers, are added on top of the pre-trained model and trained on the target
task-specific dataset.

Applications of CNNs

Image Classification
One of the primary applications of CNNs is image classification, where the goal is to classify
images into predefined categories or labels. CNNs can automatically learn to distinguish between
different objects, animals, and scenes present in images with high accuracy. Applications
include identifying objects in photographs, classifying diseases from medical images, and
recognizing handwritten digits in postal codes.

Object Detection
CNNs can also perform object detection tasks, where the goal is to localize and classify
multiple objects within an image. Object detection involves drawing bounding boxes around
objects of interest and assigning class labels to each bounding box. Applications include
autonomous driving, surveillance systems, and counting objects in retail settings.


Image Segmentation
Image segmentation involves partitioning an image into meaningful segments or regions.
CNNs can perform pixel-wise classification, assigning each pixel to a specific class or
category. This allows for more detailed analysis of images and precise localization of
objects. Applications include medical image analysis, autonomous robots, and scene
understanding.

Image Generation
CNNs can also generate new images based on learned patterns and features. Generative
models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs)
learn to generate realistic images by capturing the underlying distribution of the training data.
Applications include generating art, creating realistic avatars, and enhancing image quality.

Benefits of CNNs
Parameter Sharing
CNNs use parameter sharing to reduce the number of parameters and computational
complexity of the network. By sharing weights across spatial locations within the same
feature map, CNNs can learn translational invariance, making them robust to shifts and
translations in the input data.

Hierarchical Feature Learning


CNNs learn hierarchical representations of features, starting from simple low-level features
like edges and textures in the early layers to more complex high-level features like object
parts and configurations in the deeper layers. This hierarchical feature learning enables
CNNs to capture intricate patterns and relationships within images and achieve state-of-the-
art performance on various computer vision tasks.

Translation Invariance
CNNs exhibit translation invariance, meaning they can recognize features and objects
regardless of their position or orientation within the input image. This property makes CNNs
robust to small variations and distortions in the input data, enhancing their ability to
generalize to unseen examples and real-world scenarios.


Conclusion

Convolutional Neural Networks have revolutionized the field of computer vision, enabling
machines to automatically learn and extract features from images. Their hierarchical
architecture, combined with data-driven learning, makes them powerful tools for a wide range
of image-related tasks, from image classification and object detection to image segmentation
and generation. As CNNs continue to evolve, they are expected to play an increasingly
significant role in various applications, driving innovation and advancements in computer
vision and artificial intelligence.


Module 5: Complex Images

Understanding Complex Images


Complex images are rich visual representations that contain diverse content, intricate
structures, and detailed information. These images can be found in various contexts,
including natural scenes, satellite imagery, medical scans, microscopic views, and digital art.
Understanding complex images involves analyzing their components, extracting meaningful
features, and interpreting the content they convey.

Characteristics of Complex Images

Diverse Content
Complex images often contain a wide range of objects, textures, colors, and patterns. They
may depict scenes with multiple interacting elements, such as landscapes with trees,
buildings, people, and animals, or intricate biological structures like cells and tissues.

Intricate Structures
Complex images can exhibit complex spatial arrangements and hierarchical structures. They
may contain overlapping objects, occlusions, shadows, reflections, and intricate geometries.
Understanding the relationships between different components and structures is essential for
comprehending the overall scene.

Detailed Information
Complex images may contain fine-grained details and subtle variations that convey
important information. These details can be critical for tasks such as object recognition,
classification, segmentation, and analysis. Extracting and processing these details require
sophisticated algorithms and techniques.

Challenges in Analyzing Complex Images


Scale

Complex images can vary significantly in scale, from microscopic views of cells and
molecules to macroscopic views of landscapes and cityscapes. Analyzing images at different
scales requires methods for multi-scale processing and feature extraction.


Noise and Artifacts


Complex images may contain noise, distortions, and artifacts introduced during acquisition or
processing. Removing or mitigating these unwanted elements is essential for accurate
analysis and interpretation of the image content.

Occlusions and Ambiguities


Objects in complex images may be partially occluded or obscured by other elements, leading
to ambiguities in interpretation. Resolving occlusions and disambiguating overlapping
objects require advanced techniques for scene understanding and object reconstruction.

Techniques for Analyzing Complex Images

Deep Learning
Deep learning techniques, particularly Convolutional Neural Networks (CNNs), have shown
remarkable success in analyzing complex images. CNNs can automatically learn hierarchical
representations of features from raw pixel data, enabling tasks such as image classification,
object detection, and image segmentation.

Feature Extraction
Feature extraction methods are used to identify and extract relevant information from
complex images. These methods may involve handcrafted features, such as texture
descriptors, edge detectors, and keypoints, or learned features extracted from pre-trained
CNN models.

Image Fusion
Image fusion techniques combine information from multiple images or modalities to create a
single, enhanced representation. Fusion methods may include techniques for combining
visible and infrared imagery, multi-sensor fusion, and fusion of images acquired at different
resolutions.


Spatial Analysis
Spatial analysis techniques involve analyzing the spatial distribution and relationships of
objects within complex images. These techniques may include spatial statistics, object-based
image analysis, and spatial modeling for understanding patterns and structures within the
image.

Applications of Complex Images


Remote Sensing
In remote sensing, complex images captured by satellites and aerial platforms are used for
various applications, including environmental monitoring, land use classification, disaster
management, and urban planning.

Biomedical Imaging
In biomedical imaging, complex images obtained from medical scans, such as MRI, CT,
and microscopy, are used for diagnosis, treatment planning, and research in fields such
as radiology, oncology, neurology, and pathology.

Art and Design


In art and design, complex images are created and analyzed for aesthetic purposes, creative
expression, and visual communication. Digital art, graphic design, and multimedia
productions often involve complex images with intricate compositions and visual effects.

Conclusion
Complex images are rich visual representations that contain diverse content, intricate
structures, and detailed information. Analyzing and understanding these images
require sophisticated techniques from computer vision, image processing, and machine
learning. By leveraging advanced algorithms and methodologies, researchers and
practitioners can extract valuable insights and knowledge from complex images, driving
advancements in various fields, from science and medicine to art and design.


Module 6: Use CNNs with Larger Datasets

Leveraging Convolutional Neural Networks (CNNs) with Large Datasets


Convolutional Neural Networks (CNNs) have emerged as a powerful tool for various computer vision
tasks, ranging from image classification to object detection and image segmentation. With the
availability of larger datasets and advances in hardware and software infrastructure, CNNs have been
increasingly employed to tackle complex problems and achieve state-of-the-art performance. In this
exploration, we delve into the utilization of CNNs with larger datasets, examining the benefits,
challenges, techniques, and applications associated with this approach.

Benefits of Using CNNs with Larger Datasets

Enhanced Generalization

Training CNNs with larger datasets allows the models to learn more diverse and representative
features from the data. This exposure to a wide range of examples helps the models generalize
better to unseen data and real-world scenarios, resulting in improved performance and robustness.

Increased Model Capacity


Larger datasets provide more training examples, enabling the training of deeper and more complex
CNN architectures. With increased model capacity, CNNs can capture intricate patterns and
relationships within the data, leading to higher accuracy and better performance on challenging tasks.

Improved Regularization
Larger datasets offer more opportunities for regularization techniques such as dropout, batch
normalization, and data augmentation. These techniques help prevent overfitting by introducing
noise, perturbations, and variations during training, leading to more stable and generalizable models.


Challenges in Utilizing Larger Datasets with CNNs


Computational Resources
Training CNNs with larger datasets requires significant computational resources, including
high-performance GPUs or specialized hardware accelerators. Managing the computational
infrastructure and scaling training processes can be challenging, especially for organizations with
limited resources.

Data Management
Handling and preprocessing large datasets, including storage, retrieval, and preprocessing, can
be complex and resource-intensive. Efficient data pipelines, storage solutions, and preprocessing
techniques are essential for managing and utilizing large datasets effectively.

Labeling and Annotation


Labeling and annotating large datasets with ground truth labels can be labor-intensive and
time-consuming. Manual annotation efforts may require human annotators or crowdsourcing
platforms, introducing potential errors and inconsistencies in the labeled data.

Techniques for Training CNNs with Larger Datasets

Distributed Training

Distributed training techniques parallelize the training process across multiple GPUs or distributed
computing clusters, allowing for faster training and scalability with larger datasets. Frameworks like
TensorFlow and PyTorch provide built-in support for distributed training, enabling seamless
integration with existing CNN architectures.
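As a hedged sketch of what this looks like in TensorFlow, a Keras model can be trained synchronously across all local GPUs with MirroredStrategy; the tiny model and the MNIST stand-in dataset below are illustrative assumptions only.

python
import tensorflow as tf

# Synchronous data-parallel training across all GPUs visible on this machine
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Model creation and compilation must happen inside the strategy scope
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# MNIST is used here purely as a stand-in for a larger dataset
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
train_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train / 255.0, y_train)).batch(256)

model.fit(train_dataset, epochs=5)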

Transfer Learning
Transfer learning leverages pre-trained CNN models trained on large datasets, such as ImageNet,
and fine-tunes them on target tasks with specific datasets. By transferring knowledge learned from the
large dataset to the target task, transfer learning accelerates training, reduces data requirements,
and improves generalization.

Semi-Supervised Learning
Semi-supervised learning combines labeled and unlabeled data to train CNN models, leveraging
the abundance of unlabeled data available in larger datasets. Techniques such as self-training,
consistency regularization, and pseudo-labeling enable CNNs to learn from both labeled and
unlabeled examples, improving performance and robustness.

Active Learning
Active learning strategies intelligently select informative examples from the larger dataset for
annotation, reducing the labeling effort while maximizing the performance of the CNN model.
Techniques such as uncertainty sampling, query by committee, and Bayesian optimization guide the
selection of data points for annotation, focusing on regions of the feature space where the model
is uncertain or likely to improve.
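One common variant, uncertainty sampling, can be sketched as follows. This is a minimal illustration: the helper select_most_uncertain and the random stand-in probabilities are hypothetical, not part of any specific library.

python
import numpy as np

def select_most_uncertain(probabilities, n_queries=100):
    """Uncertainty sampling: pick the unlabeled examples whose predicted class
    probabilities have the highest entropy (i.e. where the model is least sure)."""
    entropy = -np.sum(probabilities * np.log(probabilities + 1e-12), axis=1)
    return np.argsort(entropy)[-n_queries:]

# In practice 'probabilities' would come from model.predict(unlabeled_images);
# here random softmax-like scores are used purely as a stand-in.
rng = np.random.default_rng(0)
scores = rng.random((1000, 10))
probabilities = scores / scores.sum(axis=1, keepdims=True)
query_indices = select_most_uncertain(probabilities, n_queries=100)
print(query_indices[:5])  # indices of examples to send for annotation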

Applications of CNNs with Larger Datasets

Image Classification

CNNs trained with larger datasets excel at image classification tasks, accurately categorizing images
into predefined classes or labels. Applications include identifying objects, animals, and scenes in
photographs, medical image analysis, and quality control in manufacturing processes.

Object Detection
CNNs with larger datasets enable precise object detection and localization within images. By
leveraging diverse examples, CNNs can detect and classify multiple objects simultaneously,
facilitating applications such as autonomous driving, surveillance systems, and inventory management.

Image Segmentation
CNNs trained on large datasets perform pixel-wise segmentation of images, partitioning them
into semantically meaningful regions or objects. This enables applications such as medical
image segmentation, autonomous robots, and environmental monitoring.

Generative Modeling
CNNs can be used for generative modeling tasks, generating new images with realistic features learned
from the large dataset. Generative Adversarial Networks (GANs) and Variational Autoencoders
(VAEs) trained on large datasets produce high-quality images for applications such as art generation,
image synthesis, and data augmentation.

Future Directions
As CNNs continue to advance and datasets grow in size and complexity, several directions hold promise
for further exploration and innovation:
 Weakly Supervised Learning: Techniques for training CNNs with weak supervision, such as
image-level labels or partial annotations, can enable the utilization of even larger datasets with
minimal labeling effort.
 Self-Supervised Learning: Self-supervised learning methods, which learn representations from data
without explicit supervision, offer opportunities for leveraging the abundance of unlabeled data in
larger datasets.
 Domain Adaptation: Techniques for domain adaptation and transfer learning across different
datasets and domains can facilitate the transfer of knowledge learned from one dataset to another,
even when they exhibit domain shifts or variations.

Conclusion
Leveraging Convolutional Neural Networks with larger datasets offers numerous benefits,
including enhanced generalization, increased model capacity, and improved regularization. While
challenges such as computational resources, data management, and labeling persist, techniques
such as distributed training, transfer learning, semi-supervised learning, and active learning
make training CNNs at this scale practical and effective.


UNIT 2: Get Started with Object Detection

Understand object detection principles and its applications in image processing.
Choose a deep learning framework like TensorFlow or PyTorch for object detection tasks.
Prepare a labeled dataset with diverse images containing objects of interest.
Select an appropriate object detection model architecture, such as SSD or Faster R-CNN.
Train the chosen model on your dataset, adjusting parameters for optimal performance.
Evaluate the trained model's accuracy and deploy it for real-world applications.

Module 1: Introduction to Object Detection


Object detection is a computer vision task that involves identifying and locating objects within
an image or video frame. Unlike image classification, which categorizes entire images into
predefined classes, object detection provides more detailed information by detecting multiple
objects and drawing bounding boxes around them. In this exploration, we delve into the
principles, techniques, applications, and challenges of object detection.

Principles of Object Detection


Localization
Object detection begins with localizing objects within an image, which involves determining
their spatial extent or position. This is typically represented by bounding boxes, which
enclose the objects' boundaries and indicate their locations.

Classification
Once objects are localized, they are classified into predefined categories or classes.
Classification assigns a label to each object, indicating what it is or what category it belongs
to. Common object categories include people, animals, vehicles, and everyday objects.

Detection
Object detection combines localization and classification to detect and identify multiple objects

within an image simultaneously. It involves predicting bounding boxes and class labels for all
objects present in the image, enabling comprehensive scene understanding.


Techniques for Object Detection

Traditional Methods
Traditional object detection methods relied on handcrafted features, such as Histogram of
Oriented Gradients (HOG), Haar-like features, and Scale-Invariant Feature Transform (SIFT).
These methods often used sliding window approaches combined with machine learning
classifiers, such as Support Vector Machines (SVMs) or AdaBoost, to detect objects at
different scales and positions within an image.

Convolutional Neural Networks (CNNs)


Convolutional Neural Networks (CNNs) have revolutionized object detection by
automatically learning hierarchical features from raw pixel data. CNN-based object detection
approaches typically involve two stages: region proposal and object classification. Region
proposal methods, such as Selective Search or Region Proposal Networks (RPNs), generate
candidate object bounding boxes, which are then classified and refined using CNN-based
classifiers.

Single Shot Detectors (SSDs)


Single Shot Detectors (SSDs) are one-stage object detection models that predict object bounding
boxes and class probabilities directly from feature maps. SSDs use a single CNN to perform both
region proposal and classification, making them fast and efficient. They achieve real-time
performance by predicting bounding box offsets and class scores at multiple scales and aspect
ratios within a single network.

Faster R-CNN and its Variants


Faster R-CNN is a two-stage object detection framework that combines region proposal and
object classification into a single unified model. It uses a Region Proposal Network (RPN) to
generate candidate object proposals, which are then refined and classified by a CNN-based
classifier. Faster R-CNN achieves state-of-the-art performance by introducing region-based
convolutional networks and region-wise ROI (Region of Interest) pooling.

Applications of Object Detection


Autonomous Vehicles
Object detection plays a crucial role in autonomous vehicles, enabling them to perceive and
understand their surroundings. Object detection systems detect and classify vehicles,
pedestrians, cyclists, traffic signs, and other objects to inform decision-making processes such
as navigation, collision avoidance, and lane keeping.

Surveillance and Security


In surveillance and security systems, object detection is used to monitor and analyze video
feeds for suspicious activities, intrusions, or threats. Object detection algorithms can detect
and track people, vehicles, and objects of interest in real-time, alerting security personnel to
potential security breaches or anomalies.

Retail and Inventory Management


Object detection is employed in retail environments for inventory management, product
recognition, and customer behavior analysis. Retailers use object detection systems to track
products on shelves, monitor inventory levels, and analyze customer interactions to optimize
store layouts and marketing strategies.

Medical Imaging
In medical imaging, object detection aids in the diagnosis and treatment of diseases by detecting
and localizing anatomical structures, abnormalities, and pathologies. Object detection
algorithms analyze medical scans such as X-rays, MRIs, and CT scans to identify tumors,
lesions, fractures, and other medical conditions, assisting radiologists and clinicians in patient
care.


Challenges in Object Detection


Scale Variation
Objects in images can vary significantly in scale, pose, orientation, and aspect ratio. Detecting
objects at different scales and accurately localizing them under scale variations is challenging
and requires robust feature representations and multi-scale analysis techniques.

Occlusions and Clutter


Objects in real-world scenes may be partially occluded by other objects or obscured by
cluttered backgrounds. Object detection algorithms must handle occlusions and clutter to
accurately detect and localize objects, distinguishing between foreground objects and
background noise.

Real-Time Performance
In applications requiring real-time processing, such as autonomous driving and surveillance
systems, object detection algorithms must operate with low latency and high throughput.
Achieving real-time performance while maintaining accuracy and reliability is a significant
challenge that requires efficient algorithms and hardware acceleration.

Data Annotation and Labeling


Annotating training data with accurate bounding box annotations and class labels is labor-
intensive and time-consuming. Manual annotation efforts may require expert annotators or
crowdsourcing platforms, introducing potential errors and inconsistencies in the labeled data.

Conclusion
Object detection is a fundamental computer vision task that involves identifying and locating
objects within images or video frames. With the advent of deep learning and convolutional
neural networks, object detection has seen significant advancements, enabling real-time
performance and state-of-the-art accuracy on various applications ranging from autonomous
vehicles to medical imaging. Despite challenges such as scale variation, occlusions, and real-
time processing requirements, object detection continues to drive innovation and impact
diverse domains, shaping the future of computer vision and artificial intelligence.


Module 2: Build an Object Detector into a Mobile App

Adding an object detection capability to your application can enhance its functionality by
enabling it to identify and locate objects within images or video streams. In this guide, we'll
explore the steps involved in integrating an object detector into your application, covering
key concepts, implementation considerations, and practical examples.

Understanding Object Detection

Object Detection Techniques


Object detection involves identifying and localizing objects within images or video frames.
Traditional object detection methods relied on handcrafted features and machine learning
classifiers, while modern approaches leverage deep learning and convolutional neural
networks (CNNs) to automatically learn hierarchical representations from raw pixel data.

Key Components of Object Detection


 Localization: Determining the spatial extent or position of objects within the
image, typically represented by bounding boxes.
 Classification: Assigning class labels to the detected objects, indicating what
they are or what category they belong to.
 Detection: Combining localization and classification to detect and identify
multiple objects within the image simultaneously, enabling comprehensive scene
understanding.

Choosing an Object Detection Model

Pre-Trained Models
Several pre-trained object detection models are available, trained on large datasets such as
COCO (Common Objects in Context) or ImageNet. These models offer a wide range of
architectures and performance levels, making them suitable for various applications and
deployment scenarios.

Model Selection Criteria


When choosing an object detection model for your application, consider factors such as
accuracy, speed, model size, and compatibility with your deployment platform. Evaluate the
trade-offs between model complexity and performance to select the most suitable model for
your requirements.

Implementing Object Detection in Your App


Integration Options
You can integrate object detection into your application using different approaches,
depending on your requirements and constraints:
 Using Pre-Trained Models: Load pre-trained object detection models into your
application and use them to perform inference on input images or video frames.
 Custom Training: Train your own object detection model using labeled training data
specific to your application domain. This approach provides flexibility and customization but
requires sufficient labeled data and computational resources.

Frameworks and Libraries


Several deep learning frameworks and libraries offer object detection functionalities, making
it easier to integrate object detection into your application. Popular options include
TensorFlow, PyTorch, and OpenCV, which provide pre-trained models, inference APIs, and
utilities for model deployment.

Model Inference
Performing inference with an object detection model involves feeding input images or video
frames into the model and obtaining predictions for detected objects and their bounding
boxes. Most frameworks provide APIs or functions for loading pre-trained models and
performing inference efficiently.

Deployment Considerations
Hardware Requirements
Consider the hardware requirements for running the object detection model in your
application. Deep learning models, especially larger ones, may require GPUs or specialized
hardware accelerators for optimal performance and speed.

Performance Optimization
Optimize your application's performance by employing techniques such as model
quantization, pruning, and compression to reduce the model size and computational overhead.
Additionally, consider using hardware acceleration and parallelization to speed up inference
on resource-constrained devices.
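As a hedged example of one such optimization, a trained Keras model can be converted to TensorFlow Lite with post-training quantization enabled. The small placeholder model below stands in for a real detector so the snippet is self-contained; it is a sketch, not a prescribed pipeline.

python
import tensorflow as tf

# Small placeholder model standing in for an already-trained detector
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Convert to TensorFlow Lite with default post-training quantization,
# which typically shrinks the model and speeds up on-device inference
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('detector_quantized.tflite', 'wb') as f:
    f.write(tflite_model)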

Scalability and Maintenance
Design your application with scalability and maintainability in mind, especially if you plan to
deploy it to multiple devices or platforms. Use containerization and cloud services to
streamline deployment and management processes, and ensure compatibility with future
updates and improvements.

Practical Example: Integrating Object Detection into a Mobile App

Step 1: Choose a Pre-Trained Model


Select a pre-trained object detection model compatible with mobile deployment, such as
MobileNet SSD or YOLO (You Only Look Once), optimized for speed and efficiency
on mobile devices.

Step 2: Integrate the Model into Your App


Use a deep learning framework such as TensorFlow Lite or PyTorch Mobile to load the
pre- trained model into your mobile app and perform inference on input images or video
frames.

Step 3: Process Inference Results


Process the inference results to extract detected objects and their bounding boxes, along with
their corresponding class labels and confidence scores. Visualize the detected objects by
drawing bounding boxes and labels on the input images or video frames.
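A minimal Python sketch of this inference-and-postprocessing step with the TensorFlow Lite interpreter is shown below. The model path, the input dtype, and the order of the output tensors (boxes, classes, scores) are assumptions that depend on the specific detection model used.

python
import numpy as np
import tensorflow as tf

# Load a TFLite object detection model (file name is illustrative)
interpreter = tf.lite.Interpreter(model_path='detector.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare one input frame resized to the model's expected input size
# (a random uint8 array is used as a stand-in image; some models expect float32)
height, width = input_details[0]['shape'][1:3]
frame = np.random.randint(0, 255, (1, height, width, 3), dtype=np.uint8)

# Run inference
interpreter.set_tensor(input_details[0]['index'], frame)
interpreter.invoke()

# Typical SSD-style outputs: bounding boxes, class ids, confidence scores
# (the exact output ordering and names vary between models)
boxes = interpreter.get_tensor(output_details[0]['index'])
classes = interpreter.get_tensor(output_details[1]['index'])
scores = interpreter.get_tensor(output_details[2]['index'])

# Keep only confident detections for drawing boxes and labels
for box, cls, score in zip(boxes[0], classes[0], scores[0]):
    if score > 0.5:
        print(f'class={int(cls)} score={score:.2f} box={box}')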

Step 4: Deploy and Test Your App

Deploy your mobile app with integrated object detection capabilities to app stores or test
devices. Evaluate its performance, accuracy, and user experience under different scenarios
and usage conditions.

Conclusion
Integrating an object detector into your application can significantly enhance its functionality
and utility, enabling it to automatically identify and locate objects within images or video
streams. By understanding the key concepts, choosing appropriate models, implementing
efficient inference pipelines, and considering deployment considerations, you can
successfully integrate object detection into your app and unlock a wide range of possibilities
for various domains and applications.


Module 3: Integrate an object detector using ML Kit Object Detection API

Adding an object detection capability to your application can enhance its functionality by
enabling it to identify and locate objects within images or video streams. In this guide, we'll
explore the steps involved in integrating an object detector into your application, covering
key concepts, implementation considerations, and practical examples.
Understanding Object Detection
Object Detection Techniques
Object detection involves identifying and localizing objects within images or video frames.
Traditional object detection methods relied on handcrafted features and machine learning
classifiers, while modern approaches leverage deep learning and convolutional neural
networks (CNNs) to automatically learn hierarchical representations from raw pixel data.

Key Components of Object Detection


 Localization: Determining the spatial extent or position of objects within the
image, typically represented by bounding boxes.
 Classification: Assigning class labels to the detected objects, indicating what
they are or what category they belong to.
 Detection: Combining localization and classification to detect and identify
multiple objects within the image simultaneously, enabling comprehensive scene
understanding.
Choosing an Object Detection Model

Pre-Trained Models
Several pre-trained object detection models are available, trained on large datasets such as
COCO (Common Objects in Context) or ImageNet. These models offer a wide range of
architectures and performance levels, making them suitable for various applications and
deployment scenarios.

Model Selection Criteria

When choosing an object detection model for your application, consider factors such as
accuracy, speed, model size, and compatibility with your deployment platform. Evaluate the
trade-offs between model complexity and performance to select the most suitable model for
your requirements.

Implementing Object Detection in Your App


Integration Options
You can integrate object detection into your application using different approaches,
depending on your requirements and constraints:
 Using Pre-Trained Models: Load pre-trained object detection models into your
application and use them to perform inference on input images or video frames.
 Custom Training: Train your own object detection model using labeled training data
specific to your application domain. This approach provides flexibility and customization but
requires sufficient labeled data and computational resources.

Frameworks and Libraries

Several deep learning frameworks and libraries offer object detection functionalities, making
it easier to integrate object detection into your application. Popular options include
TensorFlow, PyTorch, and OpenCV, which provide pre-trained models, inference APIs, and
utilities for model deployment.


Model Inference
Performing inference with an object detection model involves feeding input images or video
frames into the model and obtaining predictions for detected objects and their bounding
boxes. Most frameworks provide APIs or functions for loading pre-trained models and
performing inference efficiently.
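As a concrete illustration of such an inference call (using TensorFlow rather than ML Kit, and assuming a detector exported as a SavedModel by the TensorFlow Object Detection API, whose outputs follow the detection_boxes/detection_scores/detection_classes convention), the following minimal sketch loads the model and runs it on one image; the file paths are placeholders.

python

import tensorflow as tf

# Load a detector exported as a SavedModel (placeholder path).
detect_fn = tf.saved_model.load('exported_model/saved_model')

# Read one test image and add a batch dimension.
image = tf.io.decode_jpeg(tf.io.read_file('test.jpg'), channels=3)
input_tensor = tf.expand_dims(tf.cast(image, tf.uint8), axis=0)

# Run inference; the result is a dictionary of batched tensors.
detections = detect_fn(input_tensor)
boxes = detections['detection_boxes'][0].numpy()      # normalized [ymin, xmin, ymax, xmax]
scores = detections['detection_scores'][0].numpy()    # confidence per detection
classes = detections['detection_classes'][0].numpy()  # class indices

# Keep only confident detections.
for box, score, cls in zip(boxes, scores, classes):
    if score >= 0.5:
        print(f'class={int(cls)} score={score:.2f} box={box}')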

Deployment Considerations
Hardware Requirements
Consider the hardware requirements for running the object detection model in your
application. Deep learning models, especially larger ones, may require GPUs or specialized
hardware accelerators for optimal performance and speed.

Performance Optimization
Optimize your application's performance by employing techniques such as model
quantization, pruning, and compression to reduce the model size and computational overhead.
Additionally, consider using hardware acceleration and parallelization to speed up inference
on resource-constrained devices.

Scalability and Maintenance


Design your application with scalability and maintainability in mind, especially if you plan to
deploy it to multiple devices or platforms. Use containerization and cloud services to
streamline deployment and management processes, and ensure compatibility with future
updates and improvements.

Practical Example: Integrating Object Detection into a Mobile App

Step 1: Choose a Pre-Trained Model


Select a pre-trained object detection model compatible with mobile deployment, such as
MobileNet SSD or YOLO (You Only Look Once), optimized for speed and efficiency
on mobile devices.


Step 2: Integrate the Model into Your App


Use a deep learning framework such as TensorFlow Lite or PyTorch Mobile to load the pre-
trained model into your mobile app and perform inference on input images or video frames.

Step 3: Process Inference Results


Process the inference results to extract detected objects and their bounding boxes, along with
their corresponding class labels and confidence scores. Visualize the detected objects by
drawing bounding boxes and labels on the input images or video frames.
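The same post-processing can be prototyped on a desktop before porting it to the mobile code. The sketch below is illustrative only: it assumes detection results are already available as normalized [ymin, xmin, ymax, xmax] boxes with class indices and confidence scores, and the labels list is a placeholder.

python

import cv2

def draw_detections(image_bgr, boxes, classes, scores, labels, min_score=0.5):
    """Draw normalized [ymin, xmin, ymax, xmax] boxes onto a BGR image."""
    h, w = image_bgr.shape[:2]
    for box, cls, score in zip(boxes, classes, scores):
        if score < min_score:
            continue  # drop low-confidence detections
        ymin, xmin, ymax, xmax = box
        p1 = (int(xmin * w), int(ymin * h))
        p2 = (int(xmax * w), int(ymax * h))
        cv2.rectangle(image_bgr, p1, p2, (0, 255, 0), 2)
        caption = '{}: {:.2f}'.format(labels[int(cls)], score)
        cv2.putText(image_bgr, caption, (p1[0], p1[1] - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return image_bgr

# Example usage (placeholder values):
# image = cv2.imread('test.jpg')
# out = draw_detections(image, boxes, classes, scores, labels=['background', 'person'])
# cv2.imwrite('detections.jpg', out)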

Step 4: Deploy and Test Your App


Deploy your mobile app with integrated object detection capabilities to app stores or test
devices. Evaluate its performance, accuracy, and user experience under different scenarios
and usage conditions.

Conclusion
Integrating an object detector into your application can significantly enhance its functionality
and utility, enabling it to automatically identify and locate objects within images or video
streams. By understanding the key concepts, choosing appropriate models, implementing efficient inference pipelines, and addressing deployment considerations, you can
successfully integrate object detection into your app and unlock a wide range of possibilities
for various domains and applications.

UNIT:3 Go Further With Object Detection

To deepen your expertise in object detection, delve into advanced architectures such as EfficientDet
and YOLOv4, which offer improved speed and accuracy. Experiment with sophisticated data
augmentation methods to enhance model robustness and generalization capabilities. Explore
transfer learning by fine-tuning pre-trained models on specialized datasets or custom
domains, leveraging the knowledge learned from large-scale datasets. Additionally, investigate
emerging techniques like one-shot learning and meta-learning for object detection tasks with
limited labeled data. Stay abreast of the latest research and methodologies through academic
literature and community forums to continually refine and advance your object detection skills.

Module 1: Train Your Own Object-Detection Model


Training your own object detection model allows you to create a customized solution tailored
to specific requirements and datasets. This guide will walk you through the steps involved in
training an object detection model, covering data preparation, model selection, training, and
evaluation. We'll use TensorFlow and its Object Detection API as an example framework.

Understanding Object Detection


Object detection involves identifying and locating objects within images. Unlike simple
classification tasks, object detection provides both class labels and bounding box
coordinates for each detected object. Modern object detection models use deep learning
techniques, particularly convolutional neural networks (CNNs), to achieve high accuracy.

Steps to Train Your Own Object Detection Model


 Data Preparation
 Collecting Data
Gather a dataset of images containing the objects you want to detect. Ensure that your
dataset is diverse and representative of the scenarios in which your model will be used.

 Annotating Data
Annotate your images by drawing bounding boxes around the objects of interest and assigning class labels. Tools like LabelImg or RectLabel can help with the annotation process. Save the annotations in a compatible format (e.g., Pascal VOC XML or COCO JSON).


 Organizing Data
Organize your dataset into training and validation sets. A common split is 80% for training and 20% for validation (a minimal split script is sketched after the directory layout below). Structure your directories as follows:

dataset/
├── train/
│   ├── images/
│   └── annotations/
└── val/
    ├── images/
    └── annotations/
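The 80/20 split can also be scripted rather than done by hand. The sketch below is a minimal example, assuming every image has an identically named .xml annotation; the source folders ('raw/images', 'raw/annotations') are placeholders for wherever the collected data lives.

python

import os
import random
import shutil

def split_dataset(images_dir, annotations_dir, out_dir, train_ratio=0.8, seed=42):
    """Copy images and their XML annotations into train/ and val/ folders."""
    random.seed(seed)
    images = sorted(os.listdir(images_dir))
    random.shuffle(images)
    split = int(len(images) * train_ratio)

    for subset, files in (('train', images[:split]), ('val', images[split:])):
        img_out = os.path.join(out_dir, subset, 'images')
        ann_out = os.path.join(out_dir, subset, 'annotations')
        os.makedirs(img_out, exist_ok=True)
        os.makedirs(ann_out, exist_ok=True)
        for name in files:
            stem = os.path.splitext(name)[0]
            shutil.copy(os.path.join(images_dir, name), img_out)
            shutil.copy(os.path.join(annotations_dir, stem + '.xml'), ann_out)

# Example usage with placeholder source folders:
split_dataset('raw/images', 'raw/annotations', 'dataset')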
 Environment Setup
 Install TensorFlow and Dependencies
Install TensorFlow and other necessary libraries. TensorFlow 2.x is recommended.
bash

pip install tensorflow tensorflow-gpu

 Install TensorFlow Object Detection API


Clone the TensorFlow Models repository, compile the protocol buffers, and install the Object Detection API.

bash

git clone https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .
 Model Selection
Choose a pre-trained object detection model from the TensorFlow Model Zoo. Pre-trained models
provide a good starting point and can be fine-tuned on your dataset. Popular choices include SSD
(Single Shot MultiBox Detector), Faster R-CNN, and EfficientDet


 Configuration
 Create a TFRecord File
Convert your annotated data into TFRecord format, which is the standard input format for
TensorFlow Object Detection API.
Use the create_pascal_tf_record.py or create_coco_tf_record.py script provided by
TensorFlow Object Detection API to generate TFRecord files for your training and validation
datasets.

 Edit Configuration File


Modify the configuration file of the chosen pre-trained model to specify paths to your dataset,
TFRecord files, label map, and other parameters such as batch size and number of training
steps. Configuration files can be found in the samples/configs directory of the TensorFlow
Models repository.

train_input_reader: {
  label_map_path: "path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/to/train.record"
  }
}

eval_input_reader: {
  label_map_path: "path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/to/val.record"
  }
}

 Training
 Start Training
Run the model training script, specifying the path to your configuration file.
bash

python model_main_tf2.py --model_dir=training/ --pipeline_config_path=path/to/pipeline.config

Monitor the training process using TensorBoard to visualize metrics such as loss, accuracy, and mAP (mean Average Precision).
 Evaluation
Evaluate your model on the validation set to check its performance. TensorFlow Object
Detection API provides tools to compute metrics like precision, recall, and mAP.
 Export the Model
Once training is complete and you are satisfied with the model's performance, export the
trained model for deployment.
bash

python exporter_main_v2.py --input_type image_tensor --pipeline_config_path path/to/pipeline.config --trained_checkpoint_dir training/ --output_directory output/
 Deployment
Deploy the trained model to your application. TensorFlow provides various options for
deployment, including TensorFlow Serving, TensorFlow Lite for mobile and embedded
devices, and TensorFlow.js for web applications.

Conclusion
Training your own object detection model involves several steps, including data preparation,
model selection, configuration, training, evaluation, and deployment. By following these
steps, you can create a customized object detection solution tailored to your specific needs
and datasets. Leveraging TensorFlow and its Object Detection API simplifies the process,
providing tools and pre-trained models to accelerate development and achieve high
performance.
Example Code Snippet
Here’s a basic example of how to set up and train an object detection model using

TensorFlow Object Detection API:

python

import tensorflow as tf

from object_detection.utils import config_util

from object_detection.builders import model_builder

# Load pipeline config and build a detection model
pipeline_config = 'path/to/pipeline.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
detection_model = model_builder.build(model_config=model_config, is_training=True)

# Load checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore('path/to/checkpoint').expect_partial()

# Optimizer for fine-tuning (hyperparameters are illustrative)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# Train the model (simplified sketch: a full training step would use the model's
# provide_groundtruth/predict/loss methods from the Object Detection API)
def train_step(image_tensors, groundtruth_boxes_list, groundtruth_classes_list):
    with tf.GradientTape() as tape:
        # Run the model and calculate the loss
        preprocessed_images = tf.concat([image for image in image_tensors], axis=0)
        groundtruth_boxes = tf.concat(groundtruth_boxes_list, axis=0)
        groundtruth_classes = tf.concat(groundtruth_classes_list, axis=0)
        loss_dict = detection_model(preprocessed_images, groundtruth_boxes, groundtruth_classes)
        total_loss = loss_dict['Loss/total_loss']
    gradients = tape.gradient(total_loss, detection_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, detection_model.trainable_variables))
    return total_loss

# Training loop (num_epochs and train_dataset are illustrative placeholders)
num_epochs = 10
for epoch in range(num_epochs):
    for image_tensors, groundtruth_boxes_list, groundtruth_classes_list in train_dataset:
        loss = train_step(image_tensors, groundtruth_boxes_list, groundtruth_classes_list)
    print('Epoch {}: Loss {}'.format(epoch, loss))

Module 2: Build and deploy a custom object-detection model with TensorFlow Lite

Creating a custom object detection model using TensorFlow Lite allows you to deploy
efficient machine learning models on mobile and edge devices. TensorFlow Lite is designed for
speed and small footprint, making it ideal for applications where resources are limited. This
guide will cover the steps to build, train, convert, and deploy a custom object detection model using
TensorFlow Lite.
Steps to Build a Custom Object Detection Model

 Data Preparation
 Collecting Data
Gather a dataset of images that contain the objects you want to detect. Ensure the dataset is
diverse and representative of real-world scenarios.
 Annotating Data

Annotate the images by drawing bounding boxes around the objects and assigning class
labels. Tools like LabelImg or RectLabel can help with this task. Save the annotations in
a format compatible with TensorFlow, such as Pascal VOC XML or COCO JSON.

 Organizing Data
Structure your dataset into training and validation sets. A typical split is 80% for training and
20% for validation. Organize the data into directories as follows:
dataset/
├── train/
│   ├── images/
│   └── annotations/
└── val/
    ├── images/
    └── annotations/

 Model Selection
Choose a model architecture suitable for mobile deployment. EfficientDet, SSD MobileNet,
and YOLO (You Only Look Once) are popular choices due to their balance between speed
and accuracy.

 Environment Setup
 Install TensorFlow and Dependencies
Ensure you have TensorFlow 2.x installed. You can install TensorFlow using pip:
bash

pip install tensorflow


 Install TensorFlow Object Detection API
Clone the TensorFlow Models repository, compile the protocol buffers, and install the Object Detection API:

bash

git clone https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .


 Convert Data to TFRecord Format


TensorFlow Object Detection API uses TFRecord format for training. Convert your dataset
using provided scripts or custom scripts. Below is an example of converting Pascal VOC
XML annotations to TFRecord:
python

import tensorflow as tf

from object_detection.utils import dataset_util


import os
import xml.etree.ElementTree as ET

def create_tf_example(image_path, annotation_path):


with tf.io.gfile.GFile(image_path, 'rb') as fid:
encoded_image = fid.read()

image = tf.io.decode_jpeg(encoded_image)
height, width = image.shape[:2]

tree = ET.parse(annotation_path)
root = tree.getroot()

filename = root.find('filename').text.encode('utf8')
xmin = []
ymin = []
xmax = []
ymax = []
classes_text = []
classes = []

for member in root.findall('object'):
    xmin.append(float(member[4][0].text) / width)
    ymin.append(float(member[4][1].text) / height)
    xmax.append(float(member[4][2].text) / width)
    ymax.append(float(member[4][3].text) / height)
    classes_text.append(member[0].text.encode('utf8'))
    classes.append(1)  # Assuming a single class for simplicity


tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height':dataset_util.int64_feature(height),
'image/width':dataset_util.int64_feature(width),
'image/filename':dataset_util.bytes_feature(filename),
'image/source_id':dataset_util.bytes_feature(filename),
'image/encoded':dataset_util.bytes_feature(encoded_image),
'image/format':dataset_util.bytes_feature(b'jpeg'),
'image/object/bbox/xmin':dataset_util.float_list_feature(xmin),
'image/object/bbox/ymin':dataset_util.float_list_feature(ymin),
'image/object/bbox/xmax':dataset_util.float_list_feature(xmax),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
'image/object/class/text':dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))

return tf_example

def create_tfrecord(output_path, image_dir, annotations_dir):


writer = tf.io.TFRecordWriter(output_path)
for image_file in os.listdir(image_dir):

image_path = os.path.join(image_dir, image_file)

annotation_path = os.path.join(annotations_dir, os.path.splitext(image_file)[0] + '.xml')


tf_example=create_tf_example(image_path,annotation_path)
writer.write(tf_example.SerializeToString())
writer.close()

create_tfrecord('train.record', 'dataset/train/images', 'dataset/train/annotations')
create_tfrecord('val.record', 'dataset/val/images', 'dataset/val/annotations')

 Training the Model


 Set Up the Training Pipeline
Modify the configuration file for the chosen model. Update paths for the training and
validation data, label map, and other parameters such as batch size and number of steps.


train_input_reader: {
  label_map_path: "path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/to/train.record"
  }
}

eval_input_reader: {
  label_map_path: "path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/to/val.record"
  }
}

 Start Training
Run the training script with the specified configuration file.
bash

python model_main_tf2.py --model_dir=training/ --pipeline_config_path=path/to/pipeline.config

Monitor training progress using TensorBoard.


 Converting the Model to TensorFlow Lite
 Export the Trained Model

After training is complete, export the model using the exporter script:

bash
python exporter_main_v2.py --input_type image_tensor --pipeline_config_path path/to/pipeline.config --trained_checkpoint_dir training/ --output_directory exported_model/


 b. Convert the Model to TensorFlow Lite Format


Convert the TensorFlow model to TensorFlow Lite format using the TFLiteConverter Python API:
python

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('exported_model/saved_model')
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
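Before wiring model.tflite into an app, it can be sanity-checked on a desktop with the TensorFlow Lite interpreter. The sketch below is a minimal check, assuming only the model.tflite file produced above; it feeds a dummy input of whatever shape the model declares and prints the output shapes.

python

import numpy as np
import tensorflow as tf

# Load the converted model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build a dummy input matching the model's expected shape and dtype.
shape = input_details[0]['shape']
dummy = np.zeros(shape, dtype=input_details[0]['dtype'])

interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

for out in output_details:
    print(out['name'], interpreter.get_tensor(out['index']).shape)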
 Deploying the Model
 Integrate TensorFlow Lite Model into Your App
In your mobile application, load and run the TensorFlow Lite model using the TensorFlow
Lite interpreter. Below is an example for Android using Java:
java

import org.tensorflow.lite.DataType;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.support.image.TensorImage;
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer;

Interpreter tflite = new Interpreter(loadModelFile());

TensorImage inputImage = TensorImage.fromBitmap(bitmap);

// Output buffer must match the model's output shape and type.
TensorBuffer outputBuffer = TensorBuffer.createFixedSize(new int[]{1, 10}, DataType.FLOAT32);
tflite.run(inputImage.getBuffer(), outputBuffer.getBuffer().rewind());

 Optimize Performance
Optimize the model for mobile deployment by enabling GPU acceleration, using model
quantization, and reducing the model size if necessary.
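As one concrete example of the quantization mentioned above, TensorFlow Lite supports post-training quantization driven by a representative dataset. The sketch below is illustrative: load_sample_images() is a hypothetical helper you would implement to yield a few hundred preprocessed training images, and the paths are placeholders.

python

import tensorflow as tf

def representative_images():
    # Hypothetical generator: yield preprocessed training images, each as a
    # float32 batch of shape [1, height, width, 3].
    for image in load_sample_images():  # placeholder helper, not defined here
        yield [image]

converter = tf.lite.TFLiteConverter.from_saved_model('exported_model/saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_images
# Optionally force full-integer kernels for integer-only accelerators.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

quantized_model = converter.convert()
with open('model_int8.tflite', 'wb') as f:
    f.write(quantized_model)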

Conclusion
Building and deploying a custom object detection model with TensorFlow Lite involves
several steps: data preparation, model selection, training, conversion, and deployment. By
following these steps, you can create an efficient object detection model suitable for mobile
and edge applications. Leveraging TensorFlow Lite allows you to deploy powerful machine
learning models with minimal resource usage, providing a smooth and responsive user
experience.


Example Code Snippet


Below is a complete example code snippet for converting a trained TensorFlow model to
TensorFlow Lite and using it in a mobile application:

python

import tensorflow as tf

# Load the trained model

converter = tf.lite.TFLiteConverter.from_saved_model('exported_model/saved_model')

# Optimize the model for mobile deployment
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the TensorFlow Lite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)


UNIT:4 Get started with product image search

Understand Product Image Search: Familiarize yourself with the concept of product image
search, enabling users to find similar products using images.

Choose a Framework: Select a deep learning framework like TensorFlow or PyTorch, providing

tools for image similarity search tasks.

Dataset Collection: Gather or create a dataset comprising product images with corresponding

labels or metadata.

Model Selection: Choose a suitable image similarity model architecture such as Siamese Networks

or Triplet Networks.

Model Training: Train the selected model on your dataset, optimizing parameters for effective

product similarity matching.

Evaluation and Deployment: Assess the trained model's performance, and deploy it for real-
world use, integrating it into product search systems or applications.

Module 1: Introduction to product image search on mobile

Product image search on mobile devices revolutionizes the way users interact with e-
commerce platforms by allowing them to search for products using images instead of
traditional text queries. This innovative technology leverages computer vision and machine
learning to identify and match products in images to a database of products. This guide
provides an introduction to the concept, the underlying technologies, and practical steps to
implement a product image search feature in a mobile application.

Understanding Product Image Search


What is Product Image Search?
Product image search is a technology that enables users to take a photo of a product or upload
an image, and the system identifies and retrieves similar products from a database. This type
of search is particularly useful for e-commerce platforms, as it enhances the shopping
experience by allowing users to search for products visually, which is more intuitive and user-
friendly.

Why Use Product Image Search?


 Improved User Experience: Users can easily find products by taking a picture, bypassing

the need to describe the item in words.


 Increased Engagement: Visual search can lead to higher user engagement and conversion
rates.
 Accessibility: It provides an accessible way for users who may have difficulty with text-based searches.
 Competitive Edge: Implementing advanced search functionalities can give e-commerce platforms a competitive advantage.

Technologies Behind Product Image Search

Computer Vision
Computer vision is a field of artificial intelligence that trains computers to interpret and
understand the visual world. Using digital images from cameras and videos and deep learning
models, machines can accurately identify and classify objects.

Convolutional Neural Networks (CNNs)


CNNs are a type of deep learning model particularly effective for image recognition tasks.
They are designed to automatically and adaptively learn spatial hierarchies of features from
input images.


Feature Extraction
Feature extraction involves identifying key features of an image that can be used to
distinguish different objects. In product image search, features such as edges, textures, and
shapes are extracted and compared against a database of product images.

Image Matching
Image matching involves comparing the features extracted from the query image with those in
the product database. This process identifies the most similar products based on visual
similarity.
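To make the feature-extraction and matching steps concrete, here is a minimal Python sketch that uses a pre-trained MobileNetV2 (classification head removed, global average pooling) as a feature extractor and ranks catalogue images by cosine similarity. The image file names are placeholders, and a production system would compute and index the catalogue vectors ahead of time rather than on every query.

python

import numpy as np
import tensorflow as tf

# Pre-trained backbone without the classification head; 'avg' pooling maps
# each image to a single feature vector.
extractor = tf.keras.applications.MobileNetV2(
    include_top=False, pooling='avg', weights='imagenet')

def embed(path):
    """Load an image and return its L2-normalized feature vector."""
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    vec = extractor.predict(x, verbose=0)[0]
    return vec / np.linalg.norm(vec)

# Placeholder file names: one query image and a small product catalogue.
query = embed('query.jpg')
catalogue = {name: embed(name) for name in ['prod1.jpg', 'prod2.jpg', 'prod3.jpg']}

# Cosine similarity reduces to a dot product on normalized vectors.
ranked = sorted(catalogue.items(), key=lambda kv: float(np.dot(query, kv[1])), reverse=True)
for name, vec in ranked:
    print(name, round(float(np.dot(query, vec)), 3))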

Implementing Product Image Search on Mobile

Setting Up the Development Environment
To implement product image search, you'll need a development environment with the following tools:

 TensorFlow Lite: A lightweight version of TensorFlow designed for mobile and embedded devices.
 Firebase: For storing and managing the product image database.
 Android Studio/Xcode: For developing mobile applications on Android and iOS.

Building the Model

 Data Collection: Collect a dataset of product images. Ensure the dataset is diverse and
representative of the products you want to search.
 Model Training: Train a CNN model using TensorFlow to recognize and classify the
products. Pre-trained models like MobileNet can be fine-tuned with your dataset.
 Model Optimization: Convert and optimize the trained model for mobile using
TensorFlow Lite.


import tensorflow as tf

# Load the trained model

model = tf.keras.models.load_model('path/to/your_model.h5')

# Convert the model to TensorFlow Lite

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

Integrating the Model into a Mobile App


 Loading the Model: Load the TensorFlow Lite model in your mobile application.
java

// Android example

Interpreter tflite = new Interpreter(loadModelFile());

private MappedByteBuffer loadModelFile() throws IOException {


AssetFileDescriptor fileDescriptor = this.getAssets().openFd("model.tflite");
FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor());
FileChannel fileChannel = inputStream.getChannel();
long startOffset = fileDescriptor.getStartOffset();
long declaredLength = fileDescriptor.getDeclaredLength();
return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);

}
 Preprocessing Images: Preprocess the input image to match the input requirements of the
model (e.g., resizing, normalization).
java

// Example for resizing and normalizing an image

Bitmap bitmap = BitmapFactory.decodeStream(inputStream);
Bitmap resizedBitmap = Bitmap.createScaledBitmap(bitmap, 224, 224, true);
TensorImage inputImageBuffer = new TensorImage(DataType.FLOAT32);
inputImageBuffer.load(resizedBitmap);


 Running Inference: Pass the preprocessed image to the model and get the output.
java

// Running inference
float[][] output = new float[1][10]; // Adjust based on your model's output shape
tflite.run(inputImageBuffer.getBuffer(), output);

 Postprocessing Results: Postprocess the output to identify and retrieve the most similar
products from the database.
java

// Example of interpreting the output

List<Product> similarProducts = getSimilarProducts(output);

Integrating with Firebase


Use Firebase to store and manage your product image database. Firebase's real-time database
and Firestore can be used to store product metadata and images. Firebase Cloud Storage can
handle the storage of the product images.
 Uploading Product Images: Upload your product images to Firebase Cloud Storage.
java

StorageReference storageRef = FirebaseStorage.getInstance().getReference();


StorageReference imagesRef = storageRef.child("images/product_image.jpg");
imagesRef.putFile(imageUri)

.addOnSuccessListener(taskSnapshot -> {

// Image uploaded successfully

})

.addOnFailureListener(exception -> {

// Handle unsuccessful uploads

});


 Retrieving Product Metadata: Query Firebase Firestore to retrieve product metadata based
on the results from the image search model.
java

FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("products")
.whereEqualTo("product_id", productId)

.get()

.addOnCompleteListener(task -> {
    if (task.isSuccessful()) {
        for (QueryDocumentSnapshot document : task.getResult()) {
            Product product = document.toObject(Product.class);
            // Display or process the product
        }
    } else {
        // Handle error
    }
});

Conclusion
Product image search on mobile devices provides an intuitive and efficient way for users to
find products using images. By leveraging computer vision, deep learning, and tools like
TensorFlow Lite and Firebase, developers can create powerful and responsive product image
search functionalities. This guide provides a comprehensive overview and practical steps to
implement such a feature, ensuring a seamless user experience in mobile applications.


Module 2: Build an object detector into your mobile app

Building an object detector into a mobile app involves several


steps, including selecting the appropriate model, preparing the
data, training the model, converting it to a mobile-friendly
format, and integrating it into the app. This guide will
walk you through the process using TensorFlow Lite, a
lightweight library designed for mobile and embedded devices.

Steps to Build an Object Detector for Mobile

 Data Preparation
 Collecting Data

Gather a diverse set of images that contain the objects you


want to detect. Ensure your dataset is large enough to
capture various scenarios and backgrounds.
 Annotating Data

Use annotation tools like LabelImg or RectLabel to draw


bounding boxes around objects and label them. Save
annotations in a format compatible with TensorFlow
(Pascal VOC XML or COCO JSON).
 Organizing Data

Organize your dataset into training and validation sets,


typically with an 80/20 split.

dataset/
├── train/
│   ├── images/
│   └── annotations/
└── val/
    ├── images/
    └── annotations/
 Model Selection

Choose a pre-trained model architecture suitable for


mobile deployment. SSD MobileNet and EfficientDet are
popular choices due to their balance between accuracy
and speed.
 Environment Setup

 Install TensorFlow and Dependencies


Ensure TensorFlow 2.x is installed. You can install it
via pip:

bash
pip install tensorflow
 Install TensorFlow Object Detection API Clone the
TensorFlow Models repository and install the Object
Detection API:

bash
git clone https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .


 Convert Data to TFRecord Format

TensorFlow Object Detection API uses TFRecord format


for training. Convert your dataset using the provided
scripts or custom scripts.

python
import tensorflow as tf
from object_detection.utils import dataset_util
import os
import xml.etree.ElementTree as ET

def create_tf_example(image_path, annotation_path):


with tf.io.gfile.GFile(image_path, 'rb') as fid:
encoded_image = fid.read()
image = tf.io.decode_jpeg(encoded_image)
height, width = image.shape[:2]

tree = ET.parse(annotation_path)
root = tree.getroot()

filename = root.find('filename').text.encode('utf8')
xmin = []
ymin = []
xmax = []
ymax = []
classes_text = []
classes = []

for member in root.findall('object'):


xmin.append(float(member[4][0].text) / width)
ymin.append(float(member[4][1].text) / height)
xmax.append(float(member[4][2].text) / width)
ymax.append(float(member[4][3].text) / height)
classes_text.append(member[0].text.encode('utf8'))
classes.append(1)  # Assuming a single class for simplicity


tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_image),
    'image/format': dataset_util.bytes_feature(b'jpeg'),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example

def create_tfrecord(output_path, image_dir, annotations_dir):
    writer = tf.io.TFRecordWriter(output_path)
    for image_file in os.listdir(image_dir):
        image_path = os.path.join(image_dir, image_file)
        annotation_path = os.path.join(annotations_dir, os.path.splitext(image_file)[0] + '.xml')
        tf_example = create_tf_example(image_path, annotation_path)
        writer.write(tf_example.SerializeToString())
    writer.close()

create_tfrecord('train.record', 'dataset/train/images', 'dataset/train/annotations')
create_tfrecord('val.record', 'dataset/val/images', 'dataset/val/annotations')
 Training the Model

 Set Up the Training Pipeline

Modify the configuration file for the chosen model to


update paths for the training and validation data, label
map, and other parameters.

python
train_input_reader: {
label_map_path: "path/to/label_map.pbtxt"
tf_record_input_reader {
input_path: "path/to/train.record"

}
}
eval_input_reader: {
label_map_path: "path/to/label_map.pbtxt"
tf_record_input_reader {
input_path: "path/to/val.record"
}
}

 Start Training


Run the training script with the specified configuration


file.

bash
python model_main_tf2.py --model_dir=training/ --pipeline_config_path=path/to/pipeline.config

Monitor training progress using TensorBoard.

 Convert the Model to TensorFlow Lite

 Export the Trained Model

After training, export the model using the exporter script:

bash
python exporter_main_v2.py --input_type image_tensor --pipeline_config_path path/to/pipeline.config --trained_checkpoint_dir training/ --output_directory exported_model/

 Convert the Model to TensorFlow Lite


Format

Convert the TensorFlow model to TensorFlow Lite


format:

python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('exported_model/saved_model')
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)


 Integrate TensorFlow Lite Model into


Mobile App

 Loading the Model

Load the TensorFlow Lite model in your mobile


application.

java
// Android example
Interpreter tflite = new Interpreter(loadModelFile());

private MappedByteBuffer loadModelFile() throws IOException {
    AssetFileDescriptor fileDescriptor = this.getAssets().openFd("model.tflite");
    FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor());
    FileChannel fileChannel = inputStream.getChannel();
    long startOffset = fileDescriptor.getStartOffset();
    long declaredLength = fileDescriptor.getDeclaredLength();
    return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);
}

 Preprocessing Images

Preprocess the input image to match the input

requirements of the model (e.g., resizing, normalization).


java
// Example for resizing and normalizing an image
Bitmap bitmap = BitmapFactory.decodeStream(inputStream);
Bitmap resizedBitmap = Bitmap.createScaledBitmap(bitmap, 224, 224, true);
TensorImage inputImageBuffer = new TensorImage(DataType.FLOAT32);
inputImageBuffer.load(resizedBitmap);
 Running Inference

Pass the preprocessed image to the model and get the


output.

java
// Running inference
float[][] output = new float[1][10]; // Adjust based on your model's output shape
tflite.run(inputImageBuffer.getBuffer(), output);

 Postprocessing Results

Postprocess the output to identify and retrieve the objects


detected.

java
// Example of interpreting the output
List<Object> detectedObjects = getDetectedObjects(output);

 Deploying the Application

Deploy the application to a mobile device for testing.


Ensure the app correctly loads the model, processes
images, and displays detected objects.


Conclusion

Building an object detector into a mobile app involves data


preparation, model selection, training, conversion to
TensorFlow Lite, and integration into the app. By following
these steps, you can create a powerful object detection feature
for your mobile application, enhancing user experience and
providing advanced functionalities. Leveraging TensorFlow
Lite ensures that your object detection model runs efficiently
on mobile devices, offering a seamless and responsive user
experience.


Module 3: Detect objects in images to build a visual product search: Android

Visual product search enables users to find products using images instead of text. This
involves detecting objects in images, identifying them, and returning relevant search results.
The process typically includes the following steps:

Image acquisition and preprocessing


Object detection
Feature extraction

Product matching and search


Displaying results
 Image Acquisition and Preprocessing

Image Acquisition: Users capture or upload images using the app's camera or gallery
functionality.

Preprocessing: Preprocessing improves image quality and enhances detection accuracy:

Resizing: Resize images to a standard size to reduce computational load.

Normalization: Adjust brightness and contrast for consistency.

Augmentation: Apply transformations like rotation, flipping, and cropping to create more
training data and improve model robustness.
 Object Detection

Object detection involves locating objects within an image and classifying them. This can be
done using various techniques and models:

 Traditional Methods:

Haar Cascades: Uses predefined patterns to detect objects (less common due to deep
learning advancements).
 Deep Learning Models:

YOLO (You Only Look Once): Fast and accurate real-time object detection.

Implementation:
Model Selection: Choose a pre-trained model (e.g., YOLO, SSD) or train a custom model
using labeled datasets.
Integration: Use libraries like TensorFlow Lite, PyTorch Mobile, or OpenCV DNN
module to integrate the model into the Android app.
 Feature Extraction

Extract features from detected objects to create a unique representation for each product.
Techniques include:

Convolutional Neural Networks (CNNs): Extract hierarchical features from images.
SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features): Detect and describe local features in images.
Feature Vector Generation: Convert the detected object into a feature vector, a numerical
representation used for comparison.

 Product Matching and Search

Compare the feature vectors of detected objects with those in a product database to find the
best matches.


Methods:

Nearest Neighbor Search: Find the closest match in the feature space using algorithms like
KD-Trees or Ball Trees.
Cosine Similarity or Euclidean Distance: Measure similarity between feature vectors.
Database: Store feature vectors and associated product information in a searchable database (e.g., Firebase, SQLite).
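As a minimal sketch of the nearest-neighbour step (assuming product feature vectors have already been extracted with a CNN as described above), scikit-learn's NearestNeighbors can serve as a simple in-memory index; the product IDs and vectors below are synthetic placeholders.

python

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic placeholder catalogue: 1000 products with 128-d feature vectors.
rng = np.random.default_rng(0)
product_ids = ['product_{}'.format(i) for i in range(1000)]
product_vectors = rng.normal(size=(1000, 128)).astype(np.float32)

# Build a cosine-distance index over the catalogue.
index = NearestNeighbors(n_neighbors=5, metric='cosine')
index.fit(product_vectors)

# Query with the feature vector of a detected object (placeholder here).
query_vector = rng.normal(size=(1, 128)).astype(np.float32)
distances, indices = index.kneighbors(query_vector)

for dist, idx in zip(distances[0], indices[0]):
    print(product_ids[idx], 'cosine distance = {:.3f}'.format(dist))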

 Displaying Results

Present search results to users in an intuitive interface.


UI Components:
RecyclerView: Display a list of matched products with images and details.

ImageView: Show the detected object alongside search results.

TextView: Provide product descriptions, prices, and other relevant information.

Enhancements:
Sorting and Filtering: Allow users to sort and filter results based on criteria like price,
relevance, and ratings.
User Feedback: Enable users to provide feedback on the search results to improve accuracy
over time.
Example Implementation

Below is a high-level outline for implementing the visual product search on Android:
Setup Camera and Gallery Access:
java

// Request camera and storage permissions

// Open camera or gallery to capture/select an image


Preprocess the Image:

java

// Resize and normalize the image

// Apply any necessary augmentations


Integrate Object Detection Model:


java

// Load the pre-trained model (e.g., TensorFlow Lite model)

// Perform object detection on the image


Extract Features and Match Products:

java

// Extract features from detected objects using CNN

// Compare feature vectors with the product database


Display Search Results:

java

// Use RecyclerView to display matched products

// Show detected object and product details in the UI

Conclusion:

Building a visual product search app involves capturing and preprocessing images, detecting
objects using advanced deep learning models, extracting unique features, matching these
features with a product database, and displaying the results to the user. With the help of pre-
trained models and libraries like TensorFlow Lite and OpenCV, this complex task becomes
manageable, allowing for the creation of powerful and user-friendly visual search
applications on Android.

Module 4: Object detection: Static Images
Object detection is a critical technology in computer vision, allowing the identification and
localization of objects within an image. It has a wide range of applications including security,
autonomous vehicles, healthcare, and retail. This document provides an overview of the
fundamental concepts, popular methods, and practical implementation strategies for object
detection in static images.
Key Concepts

Object Detection vs. Image Classification:


Image Classification: Assigns a label to an entire image.

Object Detection: Identifies and localizes objects within an image, typically outputting
bounding boxes and class labels.
Bounding Boxes: Rectangular boxes used to specify the location of objects within an image.
Each bounding box is defined by its coordinates (x, y) of the top-left corner, width, and
height.
Intersection over Union (IoU): A metric used to evaluate the accuracy of an object detector. It
measures the overlap between the predicted bounding box and the ground truth.

Confidence Score: A probability score indicating the likelihood that a detected object belongs
to a particular class.
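As a worked example of the IoU metric defined above, the short sketch below computes IoU for two axis-aligned boxes given as (x, y, width, height); the sample boxes are arbitrary.

python

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x, y, width, height)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A predicted box overlapping a ground-truth box:
print(iou((50, 50, 100, 100), (75, 75, 100, 100)))  # ~0.39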

Popular Object Detection Methods


Traditional Methods:
Haar Cascades: Early method using Haar-like features and a cascade of classifiers for
face detection and other tasks. It’s less effective for complex objects.

Deep Learning-Based Methods:


R-CNN (Region-based Convolutional Neural Networks):
R-CNN: Extracts region proposals and classifies them using CNNs. Accurate but slow due
to multiple stages.
Fast R-CNN: Improves speed by sharing convolutional features and using ROI pooling.
Faster R-CNN: Introduces Region Proposal Networks (RPN) for generating region
proposals, significantly boosting speed and accuracy.
YOLO (You Only Look Once):
Divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
Known for real-time performance. Variants like YOLOv3 and YOLOv4 offer improvements in speed and accuracy.

SSD (Single Shot MultiBox Detector):


Detects objects in images using a single deep neural network, eliminating region proposal
steps. Balances speed and accuracy well.

EfficientDet:
Uses a compound scaling method to optimize both network width and depth, achieving
state-of-the-art accuracy with efficient use of resources.
Practical Implementation

Choosing a Framework:
Popular frameworks include TensorFlow, PyTorch, and OpenCV. TensorFlow
provides TensorFlow Object Detection API, while PyTorch offers torchvision.

Model Selection:
Pre-trained models like YOLOv3, SSD, and Faster R-CNN are available and can be
fine- tuned on specific datasets.

Dataset Preparation:
Annotate images with bounding boxes and labels using tools like LabelImg or VoTT. Split the
dataset into training and validation sets.

Training and Fine-tuning:


Load a pre-trained model and fine-tune it on the prepared dataset. This process involves
adjusting hyperparameters like learning rate, batch size, and the number of epochs.

Inference:
Use the trained model to perform object detection on new images. This involves processing
the image, running the model, and interpreting the output.


Example Code

Here’s a simplified example of using YOLOv3 with OpenCV in Python for object detection:

import cv2
import numpy as np
# Load YOLO model
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
layer_names = net.getLayerNames()
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
classes = open("coco.names").read().strip().split("\n")

image = cv2.imread("image.jpg")
height, width = image.shape[:2]
# Prepare input blob

blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), swapRB=True, crop=False)


net.setInput(blob)
# Run forward pass

outputs = net.forward(output_layers)


# Process detections
boxes, confidences, class_ids = [], [], []
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x, center_y = int(detection[0] * width), int(detection[1] * height)
            w, h = int(detection[2] * width), int(detection[3] * height)
            x, y = center_x - w // 2, center_y - h // 2
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Apply non-max suppression


indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in indices:
    i = i[0]
    x, y, w, h = boxes[i]
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    label = str(classes[class_ids[i]])
    cv2.putText(image, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# Show image
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Challenges and Considerations
Performance: Balancing accuracy and speed is crucial. Models like YOLO offer real-time
performance but may sacrifice some accuracy compared to more complex models like
Faster R-CNN.
Dataset Quality: High-quality, annotated datasets are essential for training accurate object
detection models.
Hardware Requirements: Training deep learning models can be resource-intensive, often
requiring GPUs for feasible training times.


Conclusion:
Object detection in static images is a powerful technology with broad applications. Modern
deep learning-based methods like YOLO, SSD, and Faster R-CNN provide robust solutions
for detecting objects efficiently. Implementing these techniques involves selecting the right
model and framework, preparing datasets, and fine-tuning models to achieve optimal
performance. Despite challenges, advancements in this field continue to improve the accuracy
and speed of object detection systems.

Module 5: Object Detection: Live camera

Object detection with a live camera stream is a dynamic and real-time computer vision task
that identifies and locates objects within each frame of the video. This technology has
applications in security surveillance, autonomous vehicles, augmented reality, and
interactive gaming.

Key Concepts
Real-Time Processing: Unlike static image detection, live camera object detection processes a
continuous stream of frames, requiring efficient algorithms to maintain high frame rates.
Frame Rate: The speed at which the system processes video frames, typically measured in
frames per second (FPS). Higher FPS ensures smoother and more real-time detection.
Latency: The delay between capturing the frame and displaying the detection results.
Minimizing latency is crucial for real-time applications.
Popular Object Detection Models for Live Camera

YOLO (You Only Look Once):

Fast and efficient, suitable for real-time object detection.


Processes the entire image in one go, balancing speed and accuracy.
SSD (Single Shot MultiBox Detector):

Detects objects in a single pass through the network.


Provides a good trade-off between speed and accuracy, making it ideal for real-time
applications.
MobileNet-SSD:

Lightweight version of SSD, optimized for mobile and embedded devices.


Reduces computational load, allowing for real-time detection on resource-constrained
hardware.
TensorFlow Lite and PyTorch Mobile:

Frameworks optimized for deploying deep learning models on mobile and edge devices.
Support quantization and acceleration to enhance performance.
Practical Implementation

Hardware Setup:

Use a camera (e.g., webcam, smartphone camera) to capture the video stream.
Ensure a capable processor or GPU to handle real-time processing.

Software Requirements:
Install libraries like OpenCV for video capture and display.
Use TensorFlow, PyTorch, or another deep learning framework for running the object
detection model.

Real-Time Object Detection Pipeline:

Capture Frame: Continuously capture frames from the live camera.
Preprocess Frame: Resize, normalize, and prepare the frame for model input.
Run Detection: Pass the frame through the object detection model.
Post-process Results: Extract and filter detection results (bounding boxes, class labels,
confidence scores).
Display Results: Overlay bounding boxes and labels on the frame and display the output.
Example Code


Here’s an example using YOLO with OpenCV and a live webcam feed in Python:

python
import cv2
import numpy as np

# Load YOLO model
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
layer_names = net.getLayerNames()
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
classes = open("coco.names").read().strip().split("\n")

# Initialize webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    height, width = frame.shape[:2]

    # Preprocess frame
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)

    # Run forward pass
    outputs = net.forward(output_layers)

    # Initialize lists for detected objects
    boxes, confidences, class_ids = [], [], []

    # Process detections
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                center_x, center_y = int(detection[0] * width), int(detection[1] * height)
                w, h = int(detection[2] * width), int(detection[3] * height)
                x, y = center_x - w // 2, center_y - h // 2
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Apply non-max suppression
    indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    for i in indices:
        i = i[0]
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Display frame
    cv2.imshow("Live Camera", frame)

    # Break loop on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

Enhancements
Optimization:

Model Quantization: Reduce model size and increase inference speed by quantizing the
model to lower precision (e.g., INT8).
Hardware Acceleration: Utilize GPUs or specialized hardware like TPUs for
faster processing.
Multi-Threading:
Implement multi-threading to separate frame capturing, processing, and displaying, reducing
latency and improving FPS.
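One common way to implement the multi-threading suggestion above is to decouple frame capture from inference with a small queue, so a slow model call never blocks the camera. The sketch below assumes OpenCV and Python's standard threading module, and run_detection(frame) is a placeholder for whichever detector is actually deployed.

python

import queue
import threading
import cv2

frames = queue.Queue(maxsize=2)  # small buffer: drop stale frames instead of lagging

def capture_loop(cap):
    """Read frames as fast as the camera allows, keeping only the newest."""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frames.full():
            try:
                frames.get_nowait()  # discard the stale frame
            except queue.Empty:
                pass
        frames.put(frame)

def run_detection(frame):
    # Placeholder for the real model call (YOLO, SSD, TensorFlow Lite, ...).
    return []

cap = cv2.VideoCapture(0)
threading.Thread(target=capture_loop, args=(cap,), daemon=True).start()

while True:
    frame = frames.get()
    detections = run_detection(frame)  # inference runs on the main thread
    cv2.imshow('Live Camera', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()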
Edge Computing:
Deploy models on edge devices (e.g., NVIDIA Jetson, Google Coral) for real-time detection
without relying on cloud resources.

Challenges and Considerations

Performance: Maintaining high FPS while ensuring accurate detections requires balancing
computational load and model complexity.
Lighting and Motion: Variations in lighting and rapid object movement can affect detection
accuracy. Implementing adaptive thresholding and motion stabilization can mitigate these
issues.

Scalability: Handling multiple camera streams or higher resolution frames requires scalable
solutions and more powerful hardware.

Conclusion
Real-time object detection with a live camera involves capturing video frames, processing
them through efficient object detection models, and displaying results instantly. Using
models like YOLO or SSD and optimizing them for performance is key to achieving smooth
and accurate detections. With advancements in hardware and software, implementing real-
time object detection has become increasingly feasible, opening up numerous practical
applications.

 .


UNIT-5: Go Further With Product Image Search

 Advanced Model Architectures: Explore advanced model architectures such as attention mechanisms
or graph neural networks to improve product image search accuracy.
 Fine-Grained Features: Investigate techniques to extract fine-grained features from product
images, enhancing similarity matching capabilities.
 Cross-Modal Learning: Dive into cross-modal learning methods that leverage both image and
text data to enhance product search results.
 Large-Scale Data Collection: Gather larger and more diverse datasets to train models on a wider
range of products and variations.
 Online Learning and Personalization: Implement online learning techniques to continually
update and personalize the search experience based on user interactions and feedback.
 Integration with E-Commerce Platforms: Integrate product image search functionalities
directly into e-commerce platforms, enabling seamless product discovery for users.

Module 1: Call the product search backend from the mobile app

Integrating a product search feature in a mobile app involves setting up communication


between the app and a backend server that handles search queries and returns relevant product
information. This process includes defining the backend API, setting up network
communication in the app, handling responses, and displaying the results to the user.


Overview

Backend API Setup:


Define endpoints for search queries.
Implement search functionality on the server.
Ensure secure and efficient communication.

Mobile App Integration:


Set up network communication.
Send search queries to the backend.
Handle the responses and update the UI.
Backend API Setup

Define API Endpoints:

Search Endpoint: A typical endpoint could be /api/search.


Method: POST
Request Body: Contains the search query and any filters.
Response: Returns a list of products matching the search criteria.
Implement Search Functionality:

Use a database to store product information.

Implement search algorithms (e.g., text-based search, image-based search using machine
learning models).
Security:
Implement authentication (e.g., OAuth 2.0, JWT). Ensure data is transmitted over HTTPS.
Example Implementation (Python Flask):

python
from flask import Flask, request, jsonify

from search_engine import search_products # Assume this is a custom search module


app = Flask(__name__)

@app.route('/api/search', methods=['POST'])
def search():
    data = request.json
    query = data.get('query')
    filters = data.get('filters', {})

    results = search_products(query, filters)
    return jsonify(results)

if __name__ == '__main__':
    app.run(debug=True)
Search Engine Module (Example):

python
def search_products(query, filters):
    # Placeholder for search logic
    # Typically involves querying a database and applying filters
    products = [
        {"id": 1, "name": "Product 1", "price": 10.0},
        {"id": 2, "name": "Product 2", "price": 20.0}
    ]
    # Apply search and filters on the product list
    return products
Mobile App Integration

Set Up Network Communication:

Use libraries like Retrofit (for Android), Alamofire (for iOS), or HttpClient (for cross-platform solutions).


Send Search Queries:
Capture user input (e.g., text query, image). Construct the request and send it to
the backend API.
Handle Responses:
Parse the JSON response from the backend.
Update the UI with the search results.
Example Implementation (Android with Retrofit):


Add Dependencies (build.gradle):

gradle
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'
Define API Interface:

java
import retrofit2.Call;
import retrofit2.http.Body;
import retrofit2.http.POST;

public interface ProductService {


@POST("api/search")
Call<List<Product>> searchProducts(@Body SearchRequest request);
}
Create Models:

java
public class SearchRequest {
private String query;
private Map<String, String> filters;

// Constructor, getters, and setters


}

public class Product {


private int id;
private String name;
private double price;

// Getters and setters


}

Set Up Retrofit Instance:
Java
import retrofit2.Retrofit;
import retrofit2.converter.gson.GsonConverterFactory;

public class ApiClient {


private static final String BASE_URL = "https://example.com/";
private static Retrofit retrofit;

public static Retrofit getRetrofitInstance() {


if (retrofit == null) {
retrofit = new Retrofit.Builder()
.baseUrl(BASE_URL)
.addConverterFactory(GsonConverterFactory.create())
.build();
}
return retrofit;
}
}
Send Request and Handle Response:

java
import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;
import android.widget.Toast;

import java.util.List;

import retrofit2.Call;
import retrofit2.Callback;
import retrofit2.Response;

public class MainActivity extends AppCompatActivity {

@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);


ProductService productService = ApiClient.getRetrofitInstance().create(ProductService.class);
SearchRequest request = new SearchRequest("example query", null);

Call<List<Product>> call = productService.searchProducts(request);


call.enqueue(new Callback<List<Product>>() {
@Override
public void onResponse(Call<List<Product>> call, Response<List<Product>>
response) {
if (response.isSuccessful() && response.body() != null) {
List<Product> products = response.body();
// Update UI with the products
} else {
Toast.makeText(MainActivity.this,"No
productsfound",Toast.LENGTH_SHORT).show();
}
}

@Override
public void onFailure(Call<List<Product>> call, Throwable t) {
Toast.makeText(MainActivity.this, "Error: " + t.getMessage(),
Toast.LENGTH_SHORT).show();
}
});
}
}
Enhancements
Caching:

Implement caching mechanisms to store search results and reduce server load.
Use libraries like Room (for Android) or Core Data (for iOS) for local storage.
Pagination:

Implement pagination in the backend API and app to handle large sets of search results
efficiently.
User Feedback:


Allow users to provide feedback on search results to improve the search algorithm over time.
Error Handling:

Implement comprehensive error handling to manage network failures, server errors, and
invalid responses gracefully.
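As a minimal sketch of the pagination point above, the /api/search endpoint described earlier can accept page and page_size fields and return only the requested slice along with a total count; the in-memory PRODUCTS list is a placeholder for a real database query.

python

from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder catalogue; a real backend would query a database instead.
PRODUCTS = [{"id": i, "name": "Product {}".format(i), "price": float(i)} for i in range(1, 101)]

@app.route('/api/search', methods=['POST'])
def search():
    data = request.json or {}
    query = (data.get('query') or '').lower()
    page = int(data.get('page', 1))
    page_size = int(data.get('page_size', 20))

    matches = [p for p in PRODUCTS if query in p['name'].lower()]
    start = (page - 1) * page_size
    items = matches[start:start + page_size]

    return jsonify({'total': len(matches), 'page': page, 'items': items})

if __name__ == '__main__':
    app.run(debug=True)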

Conclusion:
Integrating a product search backend with a mobile app involves setting up a robust API on
the server side, implementing efficient network communication in the app, and ensuring
seamless interaction between the two. By using modern frameworks and best practices,
developers can create a responsive and user-friendly product search experience in mobile
applications.

Module 2: Call the product search backend from the Android app

Integrating a product search feature into an Android app requires setting up communication
between the app and a backend server. This involves creating a well-defined API on the
backend, setting up network communication in the Android app, handling the responses, and
displaying the results to the user. This document provides a step-by-step guide to achieve this.

Backend API Setup

1. Define API Endpoint:


- Search Endpoint: Typically, the endpoint is /api/search.
- Method: POST
- Request Body: Contains the search query and optional filters.
- Response: Returns a list of products matching the search criteria.

2. Example Implementation:
Using Flask, you can set up a simple backend with a search endpoint:
python
from flask import Flask, request, jsonify
from search_engine import search_products  # Assume this is a custom search module

app = Flask(__name__)

@app.route('/api/search', methods=['POST'])
def search():
    data = request.json
    query = data.get('query')
    filters = data.get('filters', {})
    results = search_products(query, filters)
    return jsonify(results)

if __name__ == '__main__':
    app.run(debug=True)

In the search_engine.py module:
python
def search_products(query, filters):
    # Placeholder for search logic
    products = [
        {"id": 1, "name": "Product 1", "price": 10.0},
        {"id": 2, "name": "Product 2", "price": 20.0}
    ]
    # Apply search and filters on the product list
    return products
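
To make the request/response contract of this endpoint concrete, here is a minimal client-side sketch using the Python requests library; the local URL, port, and filter values are assumptions for illustration only.
python
import requests

# Hypothetical local address for the Flask server started above.
SEARCH_URL = "http://127.0.0.1:5000/api/search"

payload = {
    "query": "running shoes",          # free-text search query
    "filters": {"category": "sports"}  # optional key/value filters
}

response = requests.post(SEARCH_URL, json=payload)
response.raise_for_status()

# The backend returns a JSON list of product objects.
for product in response.json():
    print(product["id"], product["name"], product["price"])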

Android App Integration

 Set Up Network Communication:


Use Retrofit, a type-safe HTTP client for Android, to handle network operations.

 Add Dependencies:
Add Retrofit and Gson dependencies to the build.gradle file:
gradle
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'

 Define API Interface:


Create an interface that defines the API endpoint and request method:
java
import retrofit2.Call;
import retrofit2.http.Body;
import retrofit2.http.POST;

public interface ProductService {


@POST("api/search")
Call<List<Product>> searchProducts(@Body SearchRequest request);
}

 Create Models:
Define models for the request and response:
java
public class SearchRequest {
private String query;
private Map<String, String> filters;

public SearchRequest(String query, Map<String, String> filters) {


this.query = query;
this.filters = filters;
}

// Getters and setters


}


public class Product {
    private int id;
    private String name;
    private double price;

    // Getters and setters
}

 Set Up Retrofit Instance:


Create a Retrofit instance to handle API calls:
java
import retrofit2.Retrofit;
import retrofit2.converter.gson.GsonConverterFactory;

public class ApiClient {


    private static final String BASE_URL = "https://example.com/";
    private static Retrofit retrofit;

public static Retrofit getRetrofitInstance() {


if (retrofit == null) {
retrofit = new Retrofit.Builder()
.baseUrl(BASE_URL)
.addConverterFactory(GsonConverterFactory.create())
.build();
}
return retrofit;
}
}

 Send Request and Handle Response:


Use the API interface to send a search request and handle the response:
java


import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;
import android.widget.Toast;

import java.util.List;

import retrofit2.Call;
import retrofit2.Callback;
import retrofit2.Response;

public class MainActivity extends AppCompatActivity {

@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);

        ProductService productService =
                ApiClient.getRetrofitInstance().create(ProductService.class);
SearchRequest request = new SearchRequest("example query", null);

Call<List<Product>> call = productService.searchProducts(request);


call.enqueue(new Callback<List<Product>>() {
@Override
public void onResponse(Call<List<Product>> call, Response<List<Product>>
response) {
if (response.isSuccessful() && response.body() != null) {
List<Product> products = response.body();
// Update UI with the products
} else {
Toast.makeText(MainActivity.this,"No products found",
Toast.LENGTH_SHORT).show();
}
}
@Override
public void onFailure(Call<List<Product>> call, Throwable t) {
Toast.makeText(MainActivity.this,"Error:"+t.getMessage(),
99 | P a g
22BQ1A6138

Toast.LENGTH_SHORT).show();
}
});
}
}

UI Implementation

 Design Layout:
Create a layout file (activity_main.xml) with necessary views (e.g., EditText for search
input, Button for submitting the search, RecyclerView for displaying results).

 Update UI with Search Results:


Use a RecyclerView to display the list of products. Create an adapter for the RecyclerView:
java
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.TextView;
import androidx.recyclerview.widget.RecyclerView;
import java.util.List;

public class ProductAdapter extends RecyclerView.Adapter<ProductAdapter.ProductViewHolder> {
private List<Product> productList;

public ProductAdapter(List<Product> productList) {


this.productList = productList;
}

@Override
public ProductViewHolder onCreateViewHolder(ViewGroup parent, int viewType) {
View view = LayoutInflater.from(parent.getContext()).inflate(R.layout.product_item,
parent, false);
return new ProductViewHolder(view);
}

@Override
    public void onBindViewHolder(ProductViewHolder holder, int position) {
        Product product = productList.get(position);
holder.productName.setText(product.getName());
holder.productPrice.setText(String.valueOf(product.getPrice()));
}


@Override
public int getItemCount() {
return productList.size();
}
    public static class ProductViewHolder extends RecyclerView.ViewHolder {
        TextView productName, productPrice;

public ProductViewHolder(View itemView) {


super(itemView);
            productName = itemView.findViewById(R.id.productName);
            productPrice = itemView.findViewById(R.id.productPrice);
}
}
}

 Bind Adapter to RecyclerView:


In MainActivity, bind the adapter to the RecyclerView:
java
RecyclerView recyclerView = findViewById(R.id.recyclerView);
recyclerView.setLayoutManager(new LinearLayoutManager(this));
ProductAdapter adapter = new ProductAdapter(products);
recyclerView.setAdapter(adapter);

Enhancements

 Error Handling:
Implement comprehensive error handling to manage network failures, server errors, and
invalid responses gracefully.

 Caching:
Use Room for local caching of search results to improve performance and reduce network
usage.


 Pagination:
Implement pagination to handle large sets of search results efficiently.

 Security:
Ensure secure communication by implementing authentication and using HTTPS (a minimal
authentication sketch follows after this list).
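
To illustrate the authentication point above, here is a minimal sketch of an API-key check on the Flask backend; the header name and key are assumptions for illustration, and a production system would use a proper scheme (for example, OAuth tokens) over HTTPS.
python
from flask import Flask, request, jsonify, abort

app = Flask(__name__)

# Hypothetical key; in practice this would come from configuration or a secret store.
API_KEY = "replace-with-a-real-secret"

@app.before_request
def require_api_key():
    # Reject any request that does not carry the expected header.
    if request.headers.get("X-API-Key") != API_KEY:
        abort(401)

@app.route('/api/search', methods=['POST'])
def search():
    data = request.json
    # ... perform the search as shown earlier ...
    return jsonify([])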

Conclusion :
Integrating a product search backend with an Android app involves setting up a robust API
on the server side, implementing efficient network communication in the app, and ensuring
seamless interaction between the two. By following best practices and using modern
frameworks like Retrofit, developers can create a responsive and user-friendly product search
experience in mobile applications.


Module 3: Build Visual Product search backend using Vision API Product Search

Overview

A visual product search backend allows users to search for products using images rather than text
queries. This involves using machine learning models and image processing techniques to identify
products in an image. Google's Vision API, particularly the Product Search feature, provides robust
tools to implement this functionality. This document outlines the steps to build such a backend.

Key Components

 Google Cloud Vision API Setup :


- Enable Vision API and set up a Google Cloud project.
- Create and manage product sets and products.

 Backend Server Setup :


- Implement endpoints for image upload and product search.
- Integrate with Google Cloud Vision API for image analysis.

 Database Management :
- Store product information and metadata.
- Manage product sets and indexing for efficient searching.

 Search Functionality :
- Process images to detect and extract product features.
- Match extracted features with stored product data.

 Response Handling :

- Format and send search results back to the client.


- Handle errors and edge cases gracefully.
Google Cloud Vision API Setup

 Enable Vision API :


- Go to the [Google Cloud Console](https://console.cloud.google.com/).
- Create a new project or select an existing project.
- Enable the Vision API from the API Library.


 Create and Manage Product Sets :


- Use the Vision API to create product sets and products, adding reference images for each product.
- Product sets group related products to streamline the search process.

bash
gcloud ml vision product-search product-sets create \
--location=us-west1 \
--product-set-id=my-product-set \
--product-set-display-name="My Product Set"

 Add Products to Product Sets :


- Add individual products to the product set with reference images.
- Define metadata such as labels and categories to enhance search accuracy.

bash
gcloud ml vision product-search products create \
--location=us-west1 \
--product-id=my-product \
--product-display-name="My Product" \
--product-category=homegoods

bash
gcloud ml vision product-search reference-images create \
--location=us-west1 \
--product-id=my-product \
--reference-image-id=my-ref-image \
--gcs-uri=gs://my-bucket/my-image.jpg


Backend Server Setup

 Environment Setup :
- Use a web framework like Flask (Python) or Express (Node.js) for server-side logic.
- Set up the environment to handle image uploads and API calls.
python
from flask import Flask, request, jsonify
import google.cloud.vision_v1 as vision
import os

app = Flask(__name__)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/credentials.json"
# Client used later for batch_annotate_images with the PRODUCT_SEARCH feature.
client = vision.ImageAnnotatorClient()

 Image Upload Endpoint :


- Create an endpoint to handle image uploads from the client.
- Save the uploaded images temporarily for processing.

python
@app.route('/upload', methods=['POST'])
def upload_image():
if 'image' not in request.files:
return jsonify({'error': 'No image file'}), 400
image = request.files['image']
image_path = os.path.join('uploads', image.filename)
image.save(image_path)
return jsonify({'image_path': image_path}), 200

 Product Search Endpoint :


- Create an endpoint to process the uploaded image and perform product search using Vision API.

python
@app.route('/search', methods=['POST'])
def search_product():
data = request.get_json()
image_path = data['image_path']
results = perform_product_search(image_path)
return jsonify(results), 200


def perform_product_search(image_path):
with open(image_path, 'rb') as image_file:
content = image_file.read()
image = vision.Image(content=content)

request = {
'image': image,
        'features': [{'type': vision.Feature.Type.PRODUCT_SEARCH}],
'image_context': {
'product_search_params': {
'product_set': 'projects/my-project/locations/us-west1/productSets/my-product-set',
'product_categories': ['homegoods']
}

}
}
    response = client.batch_annotate_images(requests=[request])
return parse_results(response)

def parse_results(response):
results = []
for annotation in response.responses[0].product_search_results.results:
product = annotation.product
results.append({
'name': product.display_name,
'score': annotation.score,
'image_uri': product.image_uri
})
return results


Database Management

 Store Product Information :


- Use a database (e.g., Firestore, MySQL) to store product metadata and references.
- Include fields like product ID, name, category, and reference image URIs.
python
from google.cloud import firestore

db = firestore.Client()
def add_product(product_id, display_name, category, image_uri):
doc_ref = db.collection('products').document(product_id)
doc_ref.set({
'display_name': display_name,
'category': category,
'image_uri': image_uri
})

 Manage Product Sets :


- Store information about product sets to organize products efficiently.
- Keep track of relationships between products and their respective sets.
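
As a hedged sketch of the product-set bookkeeping described above, the following stores a product-set document in Firestore and records which products belong to it; the collection and field names are assumptions for illustration.
python
from google.cloud import firestore

db = firestore.Client()

def add_product_to_set(product_set_id, display_name, product_id):
    # Create or update the product-set document, then append the product reference.
    doc_ref = db.collection('product_sets').document(product_set_id)
    doc_ref.set({'display_name': display_name}, merge=True)
    doc_ref.update({'product_ids': firestore.ArrayUnion([product_id])})
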
Search Functionality

 Image Processing :
- Convert uploaded images to a format suitable for the Vision API (see the sketch after this list).
- Handle various image formats and sizes for optimal performance.

 Feature Extraction :
- Use Vision API to extract features from the image.
- Analyze features to match them with stored product data.
 Match Products :
- Use extracted features to search for matching products in the product set.
- Rank products based on similarity scores and other criteria.
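
The image-processing step referenced above can be sketched as follows: the upload is normalized to RGB JPEG bytes of bounded size before being passed to the Vision API. The size limit and helper name are assumptions for illustration.
python
from io import BytesIO
from PIL import Image

def prepare_image_bytes(image_path, max_size=(1024, 1024)):
    # Open the upload, drop any alpha channel, and bound the dimensions.
    img = Image.open(image_path).convert('RGB')
    img.thumbnail(max_size)
    buffer = BytesIO()
    img.save(buffer, format='JPEG')
    return buffer.getvalue()  # bytes suitable for vision.Image(content=...)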

Response Handling

 Format Search Results :


- Structure the search results in a user-friendly format.
- Include product details, similarity scores, and reference images.


 Error Handling :
- Implement robust error handling for API calls and image processing.
- Provide meaningful error messages to the client.
python
@app.errorhandler(Exception)
def handle_exception(e):
response = {
'error': str(e)
}
return jsonify(response), 500

Example Application Flow

 User Uploads Image :


- The user captures or selects an image of the product they want to search.
- The image is uploaded to the backend using the /upload endpoint.

 Backend Processes Image :


- The backend saves the image and initiates a product search.
- Vision API extracts features and matches them with stored products.

 Results Returned to User :


- The matched products are sent back to the client.
- The app displays the search results to the user.

Conclusion :
Building a visual product search backend using the Google Vision API involves setting up a
comprehensive system that handles image uploads, processes images to extract features, and searches for
matching products in a pre-defined product set. By leveraging the power of the Vision API, developers can
create an efficient and accurate product search experience for users. This document provides a detailed
guide to setting up such a backend, covering everything from API integration to
database management and response handling.


UNIT 6:Go Further With Product Image Classification

 Transfer Learning: Utilize transfer learning techniques to adapt pre-trained models to specific
image classification tasks, leveraging knowledge from large datasets.

 Ensemble Learning: Combine multiple classifiers, such as CNNs and SVMs, using ensemble
methods like bagging or boosting to improve classification accuracy.

 Data Augmentation: Implement advanced data augmentation strategies, including rotation, flipping,
and color jittering, to increase the diversity of the training dataset and enhance model generalization.

 Interpretability: Explore techniques for interpreting and visualizing model predictions to gain insights
into model behavior and improve trustworthiness.

 Explainable AI: Dive into explainable AI methods to understand how models arrive at their
predictions, aiding in model debugging and decision-making processes.

 Domain-Specific Models: Develop domain-specific models tailored to specific industries or


applications, optimizing performance for specialized use cases such as medical imaging or
satellite imagery analysis.

Module 1: Build a Flower Recogniser

A flower recognizer application can identify different types of flowers using machine learning models.
This guide outlines the process of building such an application, focusing on setting up the backend,
training the model, and integrating the system with an Android app for real-time recognition.

Key Components

1. Data Collection and Preprocessing :

 Collect a dataset of flower images.


Preprocess the images for model training.

 Model Training :
Choose a suitable machine learning framework.
Train the model on the preprocessed dataset.


 Backend Setup :
Set up a server to handle image uploads and model inference.
Integrate the trained model into the server.

 Android App Integration :


Implement the Android app to capture and upload images.
Display recognition results to the user.

Data Collection and Preprocessing

Dataset :
Use a publicly available flower dataset, such as the Oxford 102 Flower Dataset, or create your own by
collecting images of various flower species.
Ensure the dataset is labeled with the correct flower species for supervised learning.

Preprocessing :
Resize images to a uniform size (e.g., 224x224 pixels) to match the input requirements of the
chosen model.
Normalize pixel values to improve model performance.

python
from PIL import Image
import numpy as np

def preprocess_image(image_path):
    image = Image.open(image_path).resize((224, 224))
    image_array = np.array(image) / 255.0  # Normalize pixel values
    return image_array

Model Training

Choose a Framework :
TensorFlow and PyTorch are popular frameworks for image classification tasks.

Model Architecture :
Use a pre-trained model like VGG16, ResNet50, or MobileNetV2 for transfer learning.
Fine-tune the model on the flower dataset.
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base_model = MobileNetV2(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(102, activation='softmax')(x)  # 102 classes for Oxford 102

model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

 Training :
- Train the model on the preprocessed dataset.
- Use data augmentation techniques to improve generalization.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest'
)
train_generator = datagen.flow_from_directory(
'path/to/train',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
model.fit(train_generator, epochs=10, steps_per_epoch=100)
model.save('flower_recognizer.h5')



Backend Setup

Environment Setup :
Use Flask or Django to set up the backend server.
Load the trained model and set up an endpoint for image prediction.
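
Since no code is shown for this step, the following is a minimal sketch (assuming the flower_recognizer.h5 model saved earlier and 224x224 preprocessing as above) of a Flask endpoint that loads the model once and serves predictions; the route name is an assumption.
python
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

# Load the model trained earlier; the path is an assumption.
model = tf.keras.models.load_model('flower_recognizer.h5')

def preprocess_image(image_file):
    image = Image.open(image_file).convert('RGB').resize((224, 224))
    return np.array(image) / 255.0  # Normalize pixel values

@app.route('/predict', methods=['POST'])
def predict():
    if 'image' not in request.files:
        return jsonify({'error': 'No image file'}), 400
    image_array = preprocess_image(request.files['image'])
    probs = model.predict(np.expand_dims(image_array, axis=0))[0]
    top = int(np.argmax(probs))
    # A real app would map the class index to the dataset's flower names.
    return jsonify({'class_index': top, 'confidence': float(probs[top])})

if __name__ == '__main__':
    app.run(debug=True)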


Android App Integration:

 Set Up Network Communication :


Use Retrofit for network operations in the Android app.
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'
import retrofit2.Retrofit;
import retrofit2.converter.gson.GsonConverterFactory;

public class ApiClient {


private static final String BASE_URL = "http://your-server-url/";
private static Retrofit retrofit;
public static Retrofit getRetrofitInstance() {
if (retrofit == null) {
retrofit = new Retrofit.Builder()
.baseUrl(BASE_URL)
.addConverterFactory(GsonConverterFactory.create())
.build();
}
return retrofit;
}
}

Conclusion :
Building a flower recognizer involves collecting and preprocessing a dataset of flower images, training a
machine learning model, setting up a backend server to handle image uploads and model inference, and
integrating this functionality with an Android app for real-time recognition.


Module 2: Create a Custom model for your image classifier


Overview
Building a custom image classifier involves several key steps: collecting and preprocessing data,
designing and training the model, and evaluating its performance. This guide will cover these steps using
TensorFlow and Keras, popular frameworks for deep learning.

Step 1: Data Collection

1. Gather Data:
- Collect a diverse and representative dataset for your classification task. Public datasets like
CIFAR- 10, ImageNet, or custom datasets specific to your problem can be used.
- Ensure the dataset is well-labeled and contains enough samples for each class to avoid bias.

2. Organize Data:
- Split the dataset into training, validation, and test sets. A common split ratio is 70% training,
20% validation, and 10% test.
python
import os
import shutil
from sklearn.model_selection import train_test_split

def split_dataset(data_dir, output_dir, split_ratio=(0.7, 0.2, 0.1)):
    class_names = os.listdir(data_dir)
    for class_name in class_names:
        class_path = os.path.join(data_dir, class_name)
        images = os.listdir(class_path)
        train, temp = train_test_split(images, test_size=1 - split_ratio[0])
        # Split the remainder between validation and test sets.
        val, test = train_test_split(temp, test_size=split_ratio[2] / (split_ratio[1] + split_ratio[2]))
        for split, split_name in zip([train, val, test], ['train', 'val', 'test']):
            split_dir = os.path.join(output_dir, split_name, class_name)
            os.makedirs(split_dir, exist_ok=True)
            for img in split:
                shutil.copy(os.path.join(class_path, img), split_dir)

split_dataset('path/to/data', 'path/to/output')

Step 2: Data Preprocessing

1. Load and Augment Data:


- Use ImageDataGenerator from Keras for data augmentation to increase the diversity of your
training set.
python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest'
)
val_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
'path/to/output/train',
target_size=(150, 150),
batch_size=32,
class_mode='categorical'
)
val_generator = val_datagen.flow_from_directory(
'path/to/output/val',
target_size=(150, 150),
batch_size=32,
    class_mode='categorical'
)

Step 3: Model Design
Define the Model:
Create a CNN architecture using Keras. For instance, you can use a simple sequential model or a
more complex architecture with multiple convolutional and pooling layers.

python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
model = Sequential([
Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
MaxPooling2D(2, 2),
Conv2D(64, (3, 3), activation='relu'),
MaxPooling2D(2, 2),
Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
Dense(512, activation='relu'),
Dropout(0.5),
    Dense(len(class_names), activation='softmax')  # class_names: the list of class labels in the dataset
])

Compile the Model:


- Compile the model with appropriate loss function, optimizer, and metrics.
python
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
Step 4: Model Training

 Train the Model:


Train the model using the training and validation generators.
history = model.fit(
    train_generator,
steps_per_epoch=train_generator.samples // train_generator.batch_size,
epochs=25,
validation_data=val_generator,
validation_steps=val_generator.samples // val_generator.batch_size
)

Save the Model:


- Save the trained model for future use.
python
model.save('flower_classifier.h5')

Step 5: Model Evaluation

Evaluate on Test Data:


Evaluate the model's performance using the test dataset.
Python
test_generator = val_datagen.flow_from_directory(
    'path/to/output/test',
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical'
)
test_loss, test_accuracy = model.evaluate(test_generator, steps=test_generator.samples //
test_generator.batch_size)
print(f'Test accuracy: {test_accuracy}')

 Visualize Training:
Plot the training and validation accuracy and loss over epochs to check for overfitting.
Python
import matplotlib.pyplot as plt
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

plt.figure(figsize=(12, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

Conclusion
Building a custom image classifier involves several crucial steps, starting from data
collection and preprocessing to model design, training, and evaluation. By following
this guide, you can create a robust image classifier tailored to your specific needs,
leveraging powerful tools and techniques from the TensorFlow and Keras
ecosystems. This comprehensive approach ensures that the model is well-optimized,
generalizes well to new data, and provides accurate and reliable predictions.


Module 3: Integrate a custom model into your app

Introduction

In today's digital landscape, integrating custom machine learning models into mobile applications has
become increasingly essential for enhancing user experiences and unlocking innovative functionalities.
This guide provides a comprehensive overview of the process, from model development to seamless
integration, empowering developers to leverage the power of AI within their apps.

Model Development
The first step in integrating a custom model into your app is developing the model itself. This involves
defining the problem statement, collecting and preprocessing data, selecting the appropriate machine
learning algorithms, and training the model. Whether it's image recognition, natural language
processing, or predictive analytics, this section outlines best practices for model development, ensuring
accuracy, efficiency, and scalability.

Model Deployment

Once the custom model is trained and evaluated, the next step is deploying it within your mobile application.
This section explores various deployment options, including cloud-based solutions, on- device deployment, and
edge computing. Developers will learn how to optimize model performance, minimize latency, and ensure
seamless integration with their app's architecture.
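
As one concrete example of the on-device deployment option mentioned above, a Keras model such as the flower classifier trained earlier can be converted to TensorFlow Lite before being bundled with the app; the file names below are assumptions.
python
import tensorflow as tf

# Load the Keras model trained in the previous module (path assumed).
model = tf.keras.models.load_model('flower_classifier.h5')

# Convert it to TensorFlow Lite for on-device inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open('flower_classifier.tflite', 'wb') as f:
    f.write(tflite_model)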

App Integration
The final stage of the process is integrating the custom model into your mobile application. This involves
incorporating the model's functionality into the app's user interface, handling input data, invoking inference
requests, and processing model outputs. From selecting the appropriate frameworks and libraries to
implementing robust error handling mechanisms, this section provides practical insights and code examples to
streamline the integration process.
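
To illustrate the inference and output-handling step in a framework-neutral way, here is a minimal Python sketch that runs the converted .tflite model with the TensorFlow Lite interpreter; on Android the equivalent call goes through the TensorFlow Lite runtime, and the random input below is only a placeholder for a preprocessed image.
python
import numpy as np
import tensorflow as tf

# Load the converted model (path assumed) and allocate tensors.
interpreter = tf.lite.Interpreter(model_path='flower_classifier.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder input with the expected shape; a real app feeds a preprocessed image here.
input_shape = input_details[0]['shape']
dummy_image = np.random.random_sample(input_shape).astype(np.float32)

interpreter.set_tensor(input_details[0]['index'], dummy_image)
interpreter.invoke()

# Process the model output: pick the highest-probability class.
probs = interpreter.get_tensor(output_details[0]['index'])[0]
print('Predicted class index:', int(np.argmax(probs)), 'confidence:', float(np.max(probs)))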

Conclusion:
By following the steps outlined in this guide, developers can successfully integrate custom machine
learning models into their mobile applications, unlocking new opportunities for innovation and
differentiation in the competitive app market. Whether you're building a personalized recommendation
system, an intelligent chatbot, or a predictive analytics tool, harnessing the power of AI has never been
more accessible or impactful.


Conclusion

The Google AI-ML virtual internship has been a transformative journey, providing invaluable
insights into the cutting-edge fields of artificial intelligence and machine learning. Through
engaging projects, collaborative discussions, and immersive learning modules, interns have
gained practical skills and hands-on experience essential for tackling real-world challenges.

This internship experience has not only deepened our understanding of AI-ML principles but
also equipped us with the ability to apply these concepts to solve complex problems. Guided by
experienced mentors and surrounded by a diverse community of peers, we have honed our
abilities to push boundaries, challenge assumptions, and drive positive change.

Moreover, the internship has fostered a culture of innovation and collaboration, emphasizing
the importance of curiosity, continuous learning, and a growth mindset. As we conclude this
journey, we carry with us not only technical skills but also a deeper appreciation for the impact
of AI and ML on society.

VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY
(An Autonomous Institution Affiliated to JNTUK, Kakinada
Approved by AICTE New Delhi - Accredited by NBA,
NAAC with ‘A’ Grade and ISO 9001:2008 Certified)
NAMBUR – 522508, Guntur, AP

SUPERVISOR EVALUATION OF INTERNSHIP RUBRIC

Student Name:
Host Organization/Company:
Internship Supervisor:
Date of Evaluation:

Note: The external assessment evaluated by the committee consisting of HoD, senior faculty,
supervisor concerned and external Examiner. There shall be no internal marks for Summer
Internship.

External Assessment:
Report Preparation: 20 Marks (40%)
Presentation & Viva-Voce: 30 Marks (60%)

Total: 50 Marks

The purpose of this assessment is to provide the student intern with constructive feedback on
his/her internship experience. This evaluation form should be completed by the internship site
supervisor or the individual who is most responsible for supervising the intern’s work assignments.

The student’s grade is partially based on your evaluation of his/her/their performance on each
of the internship dimensions identified below. Use the evaluation rubric to assess the student’s
performance on each dimension by specifying a score based on the performance ratings and
descriptors delineated in the rubric form. Candid and objective comments about the student’s
performance are also appreciated. Please add your relevant comments in the space provided
in the form.

Quality of Work: The degree to which the student’s work is thorough, accurate, and
completed in a timelymanner.

Ability to Learn: The extent to which the student asks relevant questions, seeks out additional
information from appropriate sources, understands new concepts/ideas/work assignments, and is
willing to make needed changes and improvements.

Initiative and Creativity: The degree to which the student is self-motivated, seeks out
challenges, approaches and solves problems on his/her own, and develops innovative and creative
ideas/solutions/options.
22BQ1A6138

Character Traits: The extent to which the student demonstrates a confident and positive attitude,
exhibits honesty and integrity on the job, is aware of and sensitive to ethical and diversity issues,
and behaves in an ethical and professional manner.

Dependability: The degree to which the student is reliable, follows instructions and appropriate
procedures, is attentive to detail, and requires supervision.

Attendance and Punctuality: The degree to which the student reports to work as scheduled and
on-time.

Organizational Fit: The extent to which the student understands and supports the organization’s
mission, vision, and goals; adapts to organizational norms, expectations, and culture; and
functions within appropriate authority and decision-making channels.

Response to Supervision: The degree to which the student seeks supervision, when necessary, is
receptive to constructive criticism and advice from his/her supervisor, implements suggestions
from his/her supervisor, and is willing to explore personal strengths and areas for improvement.

Supervisor Evaluation of Internship – Grading Rubric


Evaluation Dimensions | Performance Rating: Needs Improvement (1-2), Meets Expectations (3-4), Excellent (5-6) | Score

Internship Evaluation Dimensions – Grading Criteria

Quality of Work
Needs Improvement: Work was done in a careless manner and was of erratic quality; work assignments were usually late and required review; made numerous errors.
Meets Expectations: With a few minor exceptions, adequately performed most work requirements; most work assignments submitted in a timely manner; made occasional errors.
Excellent: Thoroughly and accurately performed all work requirements; submitted all work assignments on time; made few if any errors.
Comments:

Ability to Learn
Needs Improvement: Asked few if any questions and rarely sought out additional information from appropriate sources; was unable or slow to understand new concepts, ideas, and work assignments; was unable or unwilling to recognize mistakes and was not receptive to making needed changes and improvements.
Meets Expectations: In most cases, asked relevant questions and sought out additional information from appropriate sources; exhibited acceptable understanding of new concepts, ideas, and work assignments; was usually willing to take responsibility for mistakes and to make needed changes and improvements.
Excellent: Consistently asked relevant questions and sought out additional information from appropriate sources; very quickly understood new concepts, ideas, and work assignments; was always willing to take responsibility for mistakes and to make needed changes and improvements.
Comments:

Initiative and Creativity
Needs Improvement: Had little observable drive and required close supervision; showed little if any interest in meeting standards; did not seek out additional work and frequently procrastinated in completing assignments; suggested no new ideas or options.
Meets Expectations: Worked without extensive supervision; in some cases, found problems to solve and sometimes asked for additional work assignments; normally set his/her own goals and, in a few cases, tried to exceed requirements; offered some creative ideas.
Excellent: Was a self-starter; consistently sought new challenges and asked for additional work assignments; regularly approached and solved problems independently; frequently proposed innovative and creative ideas, solutions, and/or options.
Comments:

Character Traits
Needs Improvement: Regularly exhibited a negative attitude; was dishonest and/or showed a lack of integrity on several occasions; was unable to recognize and/or was insensitive to ethical and diversity issues; displayed significant lapses in ethical and professional behavior.
Meets Expectations: Except in a few minor instances, demonstrated a positive attitude; regularly exhibited honesty and integrity in the workplace; was usually aware of and sensitive to ethical and diversity issues on the job; normally behaved in an ethical and professional manner.
Excellent: Demonstrated an exceptionally positive attitude; consistently exhibited honesty and integrity in the workplace; was keenly aware of and deeply sensitive to ethical and diversity issues on the job; always behaved in an ethical and professional manner.
Comments:

Dependability
Needs Improvement: Was generally unreliable in completing work assignments; did not follow instructions and procedures promptly or accurately; was careless, and work needed constant follow-up; required close supervision.
Meets Expectations: Was generally reliable in completing tasks; normally followed instructions and procedures; was usually attentive to detail, but work had to be reviewed occasionally; functioned with moderate supervision.
Excellent: Was consistently reliable in completing work assignments; always followed instructions and procedures well; was careful and extremely attentive to detail; required little or only minimum supervision.
Comments:

Attendance and Punctuality
Needs Improvement: Was absent excessively and/or was almost always late for work.
Meets Expectations: Was never absent and almost always on time; or usually reported to work as scheduled, but was always on time; or usually reported to work as scheduled and was almost always on time.
Excellent: Always reported to work as scheduled with no absences and was always on time.
Comments:

Organizational Fit
Needs Improvement: Was unwilling or unable to understand and support the organization's mission, vision, and goals; exhibited difficulty in adapting to organizational norms, expectations, and culture; frequently seemed to disregard appropriate authority and decision-making channels.
Meets Expectations: Adequately understood and supported the organization's mission, vision, and goals; satisfactorily adapted to organizational norms, expectations, and culture; generally functioned within appropriate authority and decision-making channels.
Excellent: Completely understood and fully supported the organization's mission, vision, and goals; readily and successfully adapted to organizational norms, expectations, and culture; consistently functioned within appropriate authority and decision-making channels.
Comments:

Response to Supervision
Needs Improvement: Rarely sought supervision when necessary; was unwilling to accept constructive criticism and advice; seldom if ever implemented supervisor suggestions; was usually unwilling to explore personal strengths and areas for improvement.
Meets Expectations: On occasion, sought supervision when necessary; was generally receptive to constructive criticism and advice; implemented supervisor suggestions in most cases; was usually willing to explore personal strengths and areas for improvement.
Excellent: Actively sought supervision when necessary; was always receptive to constructive criticism and advice; successfully implemented supervisor suggestions when offered; was always willing to explore personal strengths and areas for improvement.
Comments:

Evaluator contentment: Based on the student’s overall performance, a rating will be given

Performance Rating: Good (1), Excellent (2)

Summary Performance Ratings on Internship


Evaluation Criteria Score
(From above)
Quality of Work
Ability to Learn
Initiative and Creativity
Character Traits
Dependability
Attendance and Punctuality
Organizational Fit
Response to Supervision
Evaluator contentment
Total Score

Overall Performance Evaluation of Student Intern


Outstanding Very Good Satisfactory Marginal Unsatisfactory

Comments:

Yes No
We have reviewed this evaluation with the student intern.

Date of Review
If yes, the date of review:

Comments:

Signature of Committee members:

1.
2.
3.
4.
