DIP Project Code

The document outlines a project focused on developing a real-time sign language recognition system that detects five American Sign Language gestures using a standard laptop camera. The system employs digital image processing techniques and deep learning, specifically utilizing the YOLOv5 model for gesture classification, to enhance communication for individuals with hearing or speech impairments. The report details the methodology, challenges faced, and results, emphasizing the system's accessibility and potential for fostering inclusive communication.

Sign Language Recognition
Digital Image Processing

Submitted by:
Mudassir Alam 20224503
Manish Goutam 20224093
B.Tech. (VIth Sem)
Department of Electronics & Communication Engineering
Motilal Nehru National Institute of Technology, Prayagraj, U.P.

Table of Contents
1. Abstract
2. Introduction
3. Literature
4. Problem Definition
5. Methodology
6. Code Structure
7. Result
8. Conclusion

1. Abstract

This project presents the design, development, and evaluation of a real-time sign language recognition system capable of detecting five commonly used American Sign Language (ASL) gestures: Thank You, Yes, No, Please, and I Love You. The system is engineered to operate using only a standard laptop camera, eliminating the need for any specialized or expensive hardware and thereby enhancing its accessibility and ease of deployment. Developed as part of a Digital Image Processing course, this project leverages widely used open-source computer vision tools, specifically OpenCV, in conjunction with a pre-trained deep learning model to accurately classify static hand gestures from live video input.

The primary motivation behind this work stems from the significant communication barriers faced by individuals who are hearing or speech impaired. In many everyday situations, the inability to communicate effectively can lead to frustration, social isolation, and reduced opportunities for participation in educational, professional, or social contexts. By providing a technological solution that can interpret specific sign language gestures in real time and translate them into text, this system aims to foster more inclusive and accessible communication channels between sign language users and the broader community.

The methodology adopted in this project encompasses several key stages, including data collection, image preprocessing, adaptation of a pre-trained model, and system integration. Hand gesture data was collected using a webcam, and frames were processed to enhance image quality and isolate relevant features. The deep learning model, trained on a custom dataset of the five selected signs, utilizes convolutional neural networks (CNNs) to extract spatial features from the hand region and accurately classify the gesture. The integration of OpenCV facilitates efficient real-time video capture and processing, ensuring that the system responds promptly to user input.

In terms of system design, particular attention was given to user-friendliness and robustness. The interface provides immediate visual feedback by displaying the recognized sign as text on the screen, allowing users to verify system output in real time. The modular architecture of the codebase also allows for future expansion to include additional signs or adaptation to different sign languages.

This report provides a comprehensive overview of the project, detailing the underlying motivation, technical methodology, system architecture, implementation challenges, and evaluation results. Through rigorous testing, the system demonstrated high accuracy in recognizing the selected signs under various lighting and background conditions. Ultimately, this work highlights the potential of combining digital image processing and deep learning to create practical assistive technologies that can bridge communication gaps and promote greater inclusivity for the hearing and speech-impaired community.
2. Introduction

2.1 Background of the Project

Communication is a core component of daily human life, allowing individuals to interact, share ideas, express emotions, and convey information. For individuals with hearing or speech impairments, traditional modes of communication such as spoken language often pose significant challenges. To bridge this gap, sign language has been developed as a rich and expressive alternative communication medium. Using hand gestures, facial expressions, and body language, sign language enables users to communicate complex ideas without speaking.

Despite its effectiveness, a key barrier remains: most of the general population is not trained in sign language. As a result, sign language users often experience difficulties in everyday communication, whether in educational settings, professional environments, or social interactions. This communication gap has motivated researchers and developers to seek technological solutions that can serve as real-time interpreters, automatically recognizing and translating sign language into spoken or written language.

The growth of digital image processing, combined with advances in machine learning and computer vision, has made real-time sign language recognition more achievable than ever. By using standard webcams, open-source libraries like OpenCV, and deep learning frameworks, it is possible to create lightweight, cost-effective systems capable of identifying specific hand gestures and displaying their meanings in real time.

Sign language is a vital communication medium for individuals with speech and hearing impairments. However, the language barrier between sign language users and non-users often leads to communication difficulties. The integration of computer vision and digital image processing has opened avenues for real-time sign recognition systems. This project aims to contribute a simple yet functional model that recognizes a subset of American Sign Language (ASL) gestures and displays the corresponding words on-screen. The focus was on ease of use, minimal hardware requirements (a laptop camera), and implementation of five signs that are among the most frequently used in polite and emotional conversation.

2.2 Objective of the Project

This project was undertaken with the primary goal of developing a basic yet functional real-time sign language recognition system using digital image processing techniques. Specifically, the system is designed to detect and recognize five frequently used American Sign Language (ASL) signs:

• Thank You
• Yes
• No
• Please
• I Love You

These particular signs were chosen due to their importance in polite conversation and emotional expression, making them highly relevant for real-world communication scenarios. The objective was to use a standard laptop camera for video input and train a deep learning model capable of accurately classifying each gesture in live video frames. By limiting the scope to five signs, the focus remained on building a reliable and accessible proof-of-concept system that could later be scaled or improved.

The broader goal of the project is to contribute to the field of assistive technology and digital accessibility, offering a solution that could potentially enhance communication between deaf individuals and those unfamiliar with sign language.

2.3 Scope of the Report

This lab report documents the entire development process of the Sign Language Recognition system. It covers the technical and conceptual aspects of the project in a structured manner.
The report is divided into several key sections, each focusing on a critical aspect of the work:

• Literature Review: An overview of existing sign language recognition systems and relevant research.
• Problem Definition: A clear outline of the challenges and goals addressed by the project.
• Methodology: Detailed explanation of the tools, datasets, and algorithms used.
• Design and Implementation: Description of how the system was built, from data collection to real-time prediction.
• Results: Performance evaluation of the system based on accuracy and usability.
• Conclusion: Summary of findings and suggestions for future improvements.

By the end of this report, the reader should have a comprehensive understanding of how digital image processing was applied to recognize sign language gestures in real time, the challenges encountered during development, and the significance of the results obtained.

3. Literature

3.1 Background and Research in Sign Language Recognition

Before starting my project, I looked into how sign language recognition has been handled in the past and what approaches are commonly used. I found that early systems mostly used gloves with sensors or hardware-based tools to track hand motion. Although those systems were accurate, they required users to wear special equipment, which made them inconvenient for everyday use.

With the growth of computer vision and deep learning, I noticed that more recent research focused on vision-based systems using just a camera. These systems used machine learning models to recognize hand gestures from video or images without needing any extra devices. Since my goal was to keep things simple and easy to use, especially with a standard laptop camera, I decided to follow the same vision-based approach.

I also came across various models like CNNs, ResNet, and MobileNet being used for gesture recognition. But for real-time detection and speed, I felt that I needed something that could both detect and classify in one shot. That's when I came across YOLO, a powerful family of object detection models.

3.2 Why I Chose YOLOv5 for Gesture Recognition

Among all the object detection models I researched, YOLOv5 stood out because of its speed, simplicity, and real-time performance. I used YOLOv5 in my project mainly because I needed a model that could work with live webcam input and give fast and accurate predictions. YOLO (You Only Look Once) models are known for doing detection and classification in one go, making them ideal for tasks like this. YOLOv5, in particular, was easy to set up and work with because it is written in Python using PyTorch, which I was already somewhat familiar with through tutorials.

Another reason I chose YOLOv5 was the availability of pre-trained weights and the ability to fine-tune the model on my own dataset of hand gestures. Since I only had a few hours to work on this project, training a model from scratch wasn't possible; YOLOv5's transfer learning capabilities helped me a lot.

In my setup, I used YOLOv5 to detect hand gestures directly from the video feed. I trained the model to recognize five specific signs: "Thank You," "Yes," "No," "Please," and "I Love You." YOLOv5 handled the task well and was able to detect the gestures in real time while I performed them in front of the camera.
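To make the single-pass detection-and-classification idea concrete, the snippet below is a minimal sketch of how a fine-tuned YOLOv5 model can be loaded through torch.hub and run on one image. The weight file name (best.pt) and the sample image name are placeholders rather than the exact files used in this project.

    import torch

    # Load a custom fine-tuned YOLOv5 model via the Ultralytics hub
    # ("best.pt" is a placeholder for the trained weight file).
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

    # One forward pass returns bounding boxes, confidences, and class labels together.
    results = model('thank_you_example.jpg')   # placeholder image path
    results.print()                            # console summary of detections
    detections = results.pandas().xyxy[0]      # detections as a pandas DataFrame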
3.3 Role of Python and OpenCV in My Project

Python was the main programming language I used for the entire project. One of the biggest reasons I went with Python is the huge number of libraries and the community support available for computer vision and deep learning. It made the development process smoother and faster.

I used OpenCV for capturing the video from the laptop camera and for displaying the results. OpenCV also helped with image preprocessing tasks like resizing the frames, drawing bounding boxes, and adding the label of the detected gesture on the screen. It was very straightforward to integrate OpenCV with the YOLOv5 model.

Other Python libraries I used include:

• NumPy, for handling arrays and image data.
• PyTorch, which was used to load and run the YOLOv5 model.
• Matplotlib, which helped me visualize training performance and test results.

Thanks to Python's simplicity, I was able to build the project quickly, even though I didn't have much time or deep experience with machine learning frameworks.

3.4 Using Pre-trained Models to Save Time and Effort

Given the time constraint for this project, I knew I wouldn't be able to collect a huge dataset or train a model from scratch. That's why I decided to use the pretrained weights provided with YOLOv5 and fine-tune them using a small dataset of hand gestures that I either collected myself or sourced from open datasets online.

The idea behind using a pre-trained model is that it has already learned to detect general shapes and features (like edges, curves, and patterns) from large-scale datasets like COCO. I only had to retrain it slightly on my custom classes (the five hand gestures), which took much less time and gave decent accuracy.

Fine-tuning a pre-trained YOLOv5 model allowed me to:

• Get results fast, even with a small number of images.
• Avoid the need for powerful GPUs or long training times.
• Focus more on the application side: integrating the model with a webcam and making the predictions user-friendly.

Overall, using a pretrained model made the entire project more manageable within the short deadline I had. It also showed me how powerful transfer learning can be in real-world tasks like gesture recognition.
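As a rough sketch of what this fine-tuning step looks like in practice, the script below launches YOLOv5's standard train.py from Python, assuming it is executed from inside a cloned yolov5 repository. The hyperparameters match those listed later in Section 5.4, while the dataset file name (gestures.yaml) and folder layout are placeholders.

    import subprocess

    # Fine-tune the small YOLOv5 variant on the custom gesture dataset.
    # Assumes execution from a cloned yolov5 repository; "gestures.yaml"
    # is a placeholder that would list the train/val image folders and
    # the five class names.
    subprocess.run([
        'python', 'train.py',
        '--img', '640',              # input resolution used throughout the project
        '--batch', '16',             # batch size
        '--epochs', '50',            # training epochs
        '--data', 'gestures.yaml',   # dataset definition (placeholder name)
        '--weights', 'yolov5s.pt',   # COCO-pretrained starting checkpoint
    ], check=True)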
4. Problem Definition

4.1 Communication Barriers for the Hearing and Speech Impaired

One of the core problems I wanted to address with this project is the communication gap between sign language users and non-signers. People who are hearing or speech impaired often depend on sign language to communicate, but unfortunately, a large part of the population does not understand it. This creates a barrier in daily life situations, whether it's talking to a shopkeeper, asking for help in public, or participating in a classroom or workplace.

While professional human interpreters can help, they are not always available or affordable. I realized that a system able to recognize sign language using just a webcam would help make communication more inclusive and accessible. I wanted to explore how far I could go in building a basic version of that system using just a laptop and open-source tools.

4.2 Need for Real-Time, Easy-to-Use Recognition Systems

Another problem I faced early in the planning stage was deciding how to make the system real-time and easy to use. Many gesture recognition systems exist in academic papers or research labs, but they often require expensive hardware like depth cameras, special gloves, or high-end GPUs. I didn't have access to those things; I only had my laptop and its built-in camera.

So the challenge for me became: how do I create a gesture recognition system that works in real time, on a basic laptop, using just a webcam? I needed something that:

• Can detect and recognize hand gestures instantly.
• Doesn't require training a model from scratch.
• Can be built and tested within a few hours or a day.

This narrowed my search to real-time object detection models, and that's when I decided to use YOLOv5, which was light, fast, and could work even with a smaller dataset. My goal was to strike a balance between simplicity and functionality: just enough to show that the system works and can recognize at least a few common signs.

4.3 Defining the Scope of the Problem for This Project

Since I was short on time and resources, I decided to limit the scope of the project to only five gestures from American Sign Language:

• Thank You
• Yes
• No
• Please
• I Love You

These signs were chosen because they are among the most frequently used in daily communication and are also relatively easier to distinguish in terms of hand shape and position. By focusing on just five signs, I was able to:

• Collect and label enough data quickly.
• Train the model without requiring a large dataset.
• Keep the recognition task more manageable.

The problem, therefore, became a multi-class image classification task, where each image (or frame from a video) had to be classified into one of five categories. I also needed to make sure that the system works fast enough to give the user immediate feedback, which is critical for real-world usability.

5. Methodology

5.1 Overview of the Approach

To develop a real-time sign language recognition system, my primary goal was to ensure that the approach was simple, practical, and achievable within the available time and hardware constraints. I opted for a computer vision-based method that relies solely on a webcam feed. The project pipeline was designed around the following key stages:

1. Data Collection and Annotation
2. Model Selection and Pretrained Weights
3. Training using YOLOv5
4. Integration with Live Webcam Feed
5. Real-Time Detection and Display

Each stage played a crucial role in achieving the final outcome. Below, I explain the methodology in detail, breaking down the technical process and the reasons behind each decision.

5.2 Data Collection and Preprocessing

Since I only planned to recognize five specific gestures, I decided to either create a small custom dataset myself or source freely available hand gesture images from the web. The goal was to gather images representing the five gestures:

• Thank You
• Yes
• No
• Please
• I Love You

I captured these images using my laptop webcam under different lighting conditions and angles. For each gesture, I tried to capture around 100-150 images. Although this is a small dataset by deep learning standards, it was sufficient for a proof-of-concept when paired with transfer learning.

I used LabelImg, an open-source tool, to annotate the images. Each hand gesture was labeled with the appropriate class name, and the annotations were saved in YOLO format. This tool allowed me to draw bounding boxes around the hand performing the gesture and assign the corresponding class label.

Preprocessing steps:

• Resized images to 640x640 pixels (the input size required by YOLOv5).
• Ensured class balancing to prevent overfitting.
• Split the dataset into training (80%) and validation (20%), as sketched in the snippet after this list.
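The following is a minimal sketch of that 80/20 split, assuming the annotated images and their YOLO-format .txt labels sit side by side in a single folder. All directory names here are placeholders rather than the project's actual layout; YOLOv5 then reads the resulting train and val folders through a small dataset YAML file such as the gestures.yaml placeholder mentioned earlier.

    import random
    import shutil
    from pathlib import Path

    # Shuffle the annotated images and copy 80% into a train folder and
    # 20% into a val folder, keeping each image's YOLO-format label (.txt)
    # next to it. Paths are placeholders for the project's actual layout.
    random.seed(0)
    images = sorted(Path('dataset/all').glob('*.jpg'))
    random.shuffle(images)
    split = int(0.8 * len(images))

    for subset, files in (('train', images[:split]), ('val', images[split:])):
        out_dir = Path('dataset') / subset
        out_dir.mkdir(parents=True, exist_ok=True)
        for img in files:
            label = img.with_suffix('.txt')        # annotation written by LabelImg
            shutil.copy(img, out_dir / img.name)
            shutil.copy(label, out_dir / label.name)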
5.3 Model Selection and YOLOv5 Overview

After evaluating different object detection models, I selected YOLOv5 for its lightweight architecture, speed, and ease of deployment. YOLO (You Only Look Once) is a real-time object detection system that can predict bounding boxes and class labels in a single forward pass. YOLOv5 is implemented in Python using the PyTorch framework, which made it ideal for quick setup and integration. I used the YOLOv5s (small) variant to ensure faster inference on my laptop.

Key features of YOLOv5:

• High inference speed.
• Easy to train on custom data.
• Pretrained on the COCO dataset.
• Export support for ONNX, TorchScript, and CoreML.

By using the pretrained weights from YOLOv5, I was able to take advantage of the model's ability to recognize low-level image features, then fine-tune it on my small dataset of hand gestures.

5.4 Transfer Learning and Training Process

Transfer learning was the backbone of this project. Since I didn't have the computational power or time to train a large-scale model from scratch, I fine-tuned YOLOv5's pretrained model on my gesture dataset.

Setup:

• Model: YOLOv5s (small variant)
• Framework: PyTorch
• Epochs: 50
• Batch size: 16
• Image size: 640x640
• Optimizer: SGD
• Loss functions: classification loss, objectness loss, localization loss

I used the command-line training interface provided by YOLOv5. The model was trained using a single GPU on Google Colab, which helped reduce local resource consumption.

During training, I monitored:

• mAP (mean Average Precision), to measure detection accuracy.
• Loss curves, to track overfitting or underfitting.
• Validation images, to visually check prediction results.

5.5 Integration with Webcam and Real-Time Inference

After training the model, the next step was to integrate it with a live webcam feed so that it could detect and classify gestures in real time. I used OpenCV for this purpose.
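A minimal sketch of that OpenCV integration is shown below: it reads frames from the laptop camera, runs the fine-tuned YOLOv5 model on each frame, and displays the annotated result. The weight file name and window title are placeholders, and the confidence threshold is only an illustrative value.

    import cv2
    import torch

    # Load the fine-tuned detector ("best.pt" is a placeholder weight file).
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
    model.conf = 0.5                      # illustrative confidence threshold

    cap = cv2.VideoCapture(0)             # built-in laptop camera
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV gives BGR, the model expects RGB
        results = model(rgb)                           # detect gestures in this frame
        annotated = results.render()[0]                # frame with boxes and class labels drawn
        cv2.imshow('Sign Language Recognition',
                   cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == ord('q'):          # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()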
