CV Lab 9
OBJECTIVES
Object detection is a computer vision task that uses deep neural networks to localize and classify objects in images. It has a wide range of applications, from medical imaging to self-driving cars. An object detection system returns the coordinates of the objects in an image that it has been trained to recognize. It also returns a confidence score, which shows how confident the system is that each prediction is correct. Object detection algorithms are broadly classified into two categories based on how many times the same input image is passed through a network. Two-stage detectors, such as the R-CNN family, first propose candidate regions and then classify each one. Single-stage detectors, such as YOLO, predict boxes and classes in a single pass.
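The sketch below illustrates the output described above. Each prediction is represented as a label, a box (top-left x, top-left y, width, height) and a confidence score. Only the predictions that reach a confidence threshold are kept. The `Detection` record, the `filter_detections` helper, the 0.5 threshold and the sample values are all made up for illustration. Real detectors return their own data structures.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted object (illustrative record, not a library type)."""
    label: str
    x: float           # left edge of the box, in pixels
    y: float           # top edge of the box, in pixels
    width: float       # box width, in pixels
    height: float      # box height, in pixels
    confidence: float  # score in [0, 1]


def filter_detections(detections, threshold=0.5):
    """Keep only the predictions whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up predictions standing in for a detector's output.
predictions = [
    Detection("Dog", 48.0, 240.0, 195.0, 371.0, 0.91),
    Detection("Bicycle", 8.0, 12.0, 352.0, 498.0, 0.76),
    Detection("Cat", 300.0, 50.0, 40.0, 30.0, 0.12),
]

for det in filter_detections(predictions):
    print(f"{det.label}: box=({det.x}, {det.y}, {det.width}, {det.height}) "
          f"confidence={det.confidence:.2f}")
```

Raising the threshold trades recall for precision. Fewer low-confidence boxes survive, but some true objects may be dropped.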
KerasCV is an extension of Keras for computer vision tasks. It includes models pre-trained on popular computer vision datasets, such as ImageNet, COCO, and Pascal VOC, which can be used for transfer learning. KerasCV also provides a range of visualization tools for inspecting the intermediate representations learned by the model and for visualizing the results of object detection and segmentation tasks.
DEPARTMENT OF INFORMATION TECHNOLOGY, QUEST NAWABSHAH
Computer Vision, 21 BS(AI) BATCH
YOLO model
YOLO (You Only Look Once) is a popular object detection model known for its speed and accuracy. It was introduced by Joseph Redmon et al. in 2015 and published at CVPR in 2016. It uses a single end-to-end neural network that predicts bounding boxes and class probabilities all at once. Two-stage methods that rely on a Region Proposal Network, such as Faster R-CNN, first generate candidate regions and then classify each of them. YOLO, in contrast, needs only a single forward pass over the image. Several new versions of the model have been proposed since its initial release. At the time of writing, the latest version was YOLOv10, released on May 23, 2024.
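The single-pass idea can be illustrated with the output layout of the original YOLO (v1) paper. The image is divided into an S x S grid. Each cell predicts B boxes, each as (x, y, w, h, confidence), plus C class probabilities. One forward pass therefore yields a tensor of shape S x S x (B*5 + C). With S = 7, B = 2 and C = 20 (Pascal VOC), that is 7 x 7 x 30.

The NumPy sketch below decodes such a tensor. The following parts are illustrative assumptions:
- The tensor is filled with random numbers in place of a real network.
- The function name `decode_yolo_v1` is hypothetical.
- The 0.2 threshold is arbitrary.

Later YOLO versions use a different detection head. This includes the YOLOv8 model used in this lab's code.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes (Pascal VOC)


def decode_yolo_v1(output, threshold=0.2):
    """Turn an S x S x (B*5 + C) YOLOv1-style tensor into a list of boxes.

    Each cell stores B boxes as (x, y, w, h, confidence), followed by C class
    probabilities. x and y are offsets inside the cell; w and h are fractions
    of the whole image. Returned boxes are
    (class_id, score, center_x, center_y, w, h), all in image fractions.
    """
    boxes = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                scores = conf * class_probs  # class-specific confidence per class
                cls = int(np.argmax(scores))
                if scores[cls] >= threshold:
                    cx = (col + x) / S  # cell-relative offset -> image fraction
                    cy = (row + y) / S
                    boxes.append((cls, float(scores[cls]), float(cx), float(cy),
                                  float(w), float(h)))
    return boxes


rng = np.random.default_rng(0)
fake_output = rng.random((S, S, B * 5 + C))  # stand-in for one network pass
print(fake_output.shape)  # (7, 7, 30)
detections = decode_yolo_v1(fake_output)
print(len(detections), "boxes above threshold")
```

All predictions come from the same tensor, so decoding is just a loop over grid cells. No second pass through the network is needed. In practice, non-maximum suppression is usually applied afterwards to remove overlapping duplicates.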
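The YOLO algorithm takes an image as input and uses a single deep convolutional neural network to detect the objects in it. The architecture of the CNN model that forms the backbone of YOLO is shown below.
The following code loads a YOLOv8 detector pre-trained on Pascal VOC from KerasCV, runs it on a single image, and plots the predicted bounding boxes.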
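import os

# The Keras backend must be selected before keras is imported.
os.environ["KERAS_BACKEND"] = "jax"

import keras
import keras_cv
import numpy as np
from keras_cv import visualization

# Load a YOLOv8 (medium) detector pre-trained on Pascal VOC.
# "xywh" boxes are (x, y of the top-left corner, width, height).
pretrained_model = keras_cv.models.YOLOV8Detector.from_preset(
    "yolo_v8_m_pascalvoc", bounding_box_format="xywh"
)

# Load the test image as a NumPy array (adjust the path to your own file).
image = keras.utils.load_img("E:\\object_detection.png")
image = np.array(image)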
# Display the original image.
visualization.plot_image_gallery(
    np.array([image]),
    value_range=(0, 255),
    rows=1,
    cols=1,
    scale=5,
)

# Resize to the 640x640 model input, padding so the aspect ratio is preserved.
inference_resizing = keras_cv.layers.Resizing(
    640, 640, pad_to_aspect_ratio=True, bounding_box_format="xywh"
)
image_batch = inference_resizing([image])
class_ids = [
    "Aeroplane",
    "Bicycle",
    "Bird",
    "Boat",
    "Bottle",
    "Bus",
    "Car",
    "Cat",
    "Chair",
    "Cow",
    "Dining Table",
    "Dog",
    "Horse",
    "Motorbike",
    "Person",
    "Potted Plant",
    "Sheep",
    "Sofa",
    "Train",
    "Tvmonitor",
    "Total",
]
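# Map each integer class id to its name, e.g. {0: "Aeroplane", 1: "Bicycle", ...}.
class_mapping = dict(zip(range(len(class_ids)), class_ids))

# Run inference on the resized batch.
y_pred = pretrained_model.predict(image_batch)
# y_pred is a dictionary of bounding box tensors:
# {"classes": ..., "boxes": ...}

# Draw the predicted boxes and class labels on the resized image.
visualization.plot_bounding_box_gallery(
    image_batch,
    value_range=(0, 255),
    rows=1,
    cols=1,
    y_pred=y_pred,
    scale=5,
    font_scale=0.7,
    bounding_box_format="xywh",
    class_mapping=class_mapping,
)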
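Besides plotting, the predictions can be read out as text. The sketch below uses a hand-made dictionary in place of the real `y_pred`. Its layout is an assumption and should be checked against the actual output before relying on it:
- `"boxes"` holds boxes in the "xywh" format (top-left x, top-left y, width, height).
- `"classes"` and `"confidence"` hold the class id and score for each detection slot.
- `"num_detections"` holds the number of valid slots per image; unused slots are padded.

The names `example_class_mapping`, `xywh_to_xyxy` and `describe_detections` are hypothetical helpers for illustration.

```python
import numpy as np

# Hypothetical stand-in for y_pred: one image, three detection slots,
# of which only the first two are real (num_detections = 2).
y_pred_example = {
    "boxes": np.array([[[48.0, 240.0, 195.0, 371.0],
                        [8.0, 12.0, 352.0, 498.0],
                        [-1.0, -1.0, -1.0, -1.0]]]),
    "classes": np.array([[11, 1, -1]]),
    "confidence": np.array([[0.91, 0.76, -1.0]]),
    "num_detections": np.array([2]),
}
# Subset of the lab's class_mapping (ids follow the class_ids list order).
example_class_mapping = {1: "Bicycle", 11: "Dog"}


def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def describe_detections(y_pred, class_mapping, image_index=0):
    """Return (label, confidence, xyxy box) for the valid detections of one image."""
    n = int(y_pred["num_detections"][image_index])
    results = []
    for i in range(n):
        class_id = int(y_pred["classes"][image_index][i])
        score = float(y_pred["confidence"][image_index][i])
        box = xywh_to_xyxy(y_pred["boxes"][image_index][i].tolist())
        results.append((class_mapping[class_id], score, box))
    return results


for label, score, box in describe_detections(y_pred_example, example_class_mapping):
    print(f"{label} ({score:.2f}): {box}")
```

With the real model, the same loop could be applied to `y_pred` and the full `class_mapping`. Confirm the keys and array shapes first. The boxes refer to the resized 640x640 image, not to the original file.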