Mini-Project Report (G16)
Bachelor of Technology
in
Computer Science and Engineering
by
BONU CHANDRASEKAR
(U21CS107)
BELLAPU SESHUBABU
(U21CS084)
BANOTH VIKRAM
(U21CS077)
BONDHILA KARTHIK
(U21CS106)
Chennai
18/11/24
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
3.2 System Architecture 19
3.2.1 YOLO Customization 22
3.3 Strategy Recommendation Engine 25
3.4 Algorithm Design for Move Suggestions 26
3.4.1 Hand Evaluation 26
3.4.2 Decision Making Process 26
3.4.2.1 Simulation of Possible Outcomes 26
4. IMPLEMENTATION 27-35
4.1 System Setup 27
6. CONCLUSION AND FUTURE SCOPE 42-45
6.1 Conclusion 42
6.2 Future Scope 43
7. APPENDIX 46-50
7.1 Detection Module Source Code 46
7.2 Suggestion Module Source Code 48
REFERENCES 51-52
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS/ NOTATIONS/ NOMENCLATURE
AI Artificial Intelligence
I/O Input/Output
ML Machine Learning
CHAPTER 1
INTRODUCTION
1.1 Background
The field of computer vision has seen significant advancements with the
introduction of real-time object detection algorithms such as YOLO (You Only Look
Once). These advancements have enabled real-time applications across industries like
healthcare, retail, and gaming. In card games, detecting and classifying playing cards
during live gameplay presents a novel challenge due to the variability in card orientation,
lighting, and partial occlusion.
This project leverages YOLO for detecting playing cards in real-time while integrating a
strategy recommendation engine. By analyzing detected cards, the system aims to offer
actionable suggestions to players, enhancing decision-making and improving game
outcomes.
1.2 Objective
The primary goal of this project is to design and implement a highly effective
system that can accurately detect and classify playing cards in real time. This system will
not only identify the cards but also analyze the gameplay scenario to provide intelligent
and strategic recommendations, thereby enhancing the decision-making process and
overall experience of the players. The focus is on creating a seamless and user-friendly
interface that integrates advanced computational techniques for improved gameplay
efficiency.
A significant aspect of the system is its ability to adapt to and handle variations in
card appearances. These variations include different orientations of cards (rotations), partial
obstructions (occlusions), and changes in lighting conditions, all of which are common in
real-world scenarios. Addressing these challenges is crucial to ensure the system's
reliability and robustness across diverse environments and use cases.
Moreover, the system is designed with a strong emphasis on real-time performance,
ensuring that all processes, from card detection to recommendation generation, are
executed quickly and efficiently. This optimization is aimed at enabling smooth gameplay
on standard consumer hardware without requiring high-end computational resources,
making the solution accessible to a broader audience. By achieving these objectives, the
project aspires to set a benchmark in intelligent gaming systems, combining technological
innovation with practical usability.
Playing card detection and strategy recommendation in real-time pose several challenges:
1. Dynamic environmental conditions, such as varying light levels and occlusions.
2. Accurate and fast classification to enable real-time recommendations.
3. Integration of game rules and strategies into a cohesive decision-making engine.
Applications
The system's versatility makes it valuable across various applications. In
recreational gaming, it elevates the experience for casual users by providing real-time
insights and tips, while also appealing to competitive players with its strategic tools. For
tournaments, the technology offers automated card tracking, ensuring unbiased and
accurate gameplay while maintaining transparency and fairness. Additionally, the system
has significant educational applications, helping beginners learn card values,
combinations, and strategic concepts. By simplifying complex aspects of card games, it
serves as an engaging and interactive learning platform.
Technical Advancements
The system utilizes cutting-edge technology to achieve high-speed video processing
and real-time efficiency. The use of the YOLO algorithm ensures robust detection under
diverse conditions, including challenging scenarios like varying lighting, rotated cards, or
partially hidden cards. Its design is inherently scalable, allowing for easy adaptation to non-
standard card designs or decks. Furthermore, its architecture is versatile enough to extend
beyond card games, making it applicable to other object detection tasks, such as item
recognition in logistics or retail environments.
Future Extensions
The project envisions several innovative future extensions to broaden its scope and
functionality. One such extension includes support for unique card games like Uno, Tarot,
or collectible card games, catering to a wider audience. The integration of augmented
reality (AR) could transform gameplay by creating interactive and immersive experiences,
such as overlaying virtual strategies or game progressions in real-time. Beyond cards, the
system could expand into other domains, such as tracking components in board games or
applications in robotics, such as object identification or manipulation tasks.
Research Opportunities
The system offers numerous opportunities for research and development.
Advancing the detection accuracy by incorporating state-of-the-art algorithms like
YOLOv8 or transformers could significantly enhance its robustness. Another avenue for
research is the development of advanced AI-driven strategy recommendation systems that
use deep learning and reinforcement learning to provide highly personalized and intelligent
gameplay suggestions. These innovations could redefine the boundaries of how AI supports
both casual and professional gameplay.
Commercial Potential
This technology has vast commercial potential in gaming and beyond. It could be
integrated into gaming devices, online platforms, and competitive tournaments to offer
automated gameplay tracking and assistance. Mobile applications could bring the system
to personal and social gaming, enabling users to enjoy enhanced gameplay on-the-go.
Additionally, licensing opportunities exist for educational tools, where the system could be
used to teach card game mechanics and strategies, and in competitive gaming markets,
where it could serve as a referee or adjudication tool. The technology's scalability and
adaptability make it an attractive solution for businesses looking to innovate in gaming,
education, or even other interactive entertainment industries.
1.4 Report Overview
1. Input Module
The system begins with the input module, which captures real-time video
streams from a camera. This module preprocesses the video frames by resizing and
normalizing them to ensure they are optimized for efficient analysis. This step lays
the groundwork for accurate and reliable detection in subsequent stages.
2. Detection Module
The detection module applies the trained YOLO model to each preprocessed frame, locating every visible playing card and passing the resulting bounding boxes and class predictions to the classification stage.
3. Classification Module
Once cards are detected, the classification module identifies their rank and
suit. It employs advanced techniques to handle error cases such as misclassification
or partial occlusion. By using confidence thresholds, the system minimizes
inaccuracies and maintains the reliability of the classification process, ensuring that
all detected cards are correctly identified.
4. Recommendation Module
Using the classified cards, the recommendation module evaluates the player's hand for sets and runs and generates real-time strategy suggestions, such as which cards to hold or discard (described in detail in Chapter 3).
5. Output Module
The output module ensures that the system's results are accessible and easy
to understand. It displays the detected cards with their respective labels and real-
time strategy recommendations. User-friendly feedback is provided through visual
overlays on the video feed and textual suggestions, making the system intuitive and
effective for both novice and experienced players.
6. System Workflow
Input (Video Stream) → Detection (YOLO) → Classification (Rank/Suit) →
Strategy Recommendation → Output.
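To make the input stage of this workflow concrete, the minimal sketch below (using OpenCV, as in the appendix code) captures frames from a camera and resizes and normalizes them before they are handed to the detector. The camera index and the 640 x 640 target size are illustrative assumptions rather than values fixed by the report.

import cv2

cap = cv2.VideoCapture(0)            # camera index 0 is an assumption

def preprocess(frame, size=(640, 640)):
    """Resize and normalize a frame before it is handed to the detection module."""
    resized = cv2.resize(frame, size)
    return resized.astype("float32") / 255.0   # scale pixel values to [0, 1]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_for_detector = preprocess(frame)
    # ... frame_for_detector would be passed on to the YOLO detection module ...
    cv2.imshow("Input", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()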
CHAPTER 2
LITERATURE SURVEY
2. Title: Accurate Object Detection System on HoloLens Using YOLO Algorithm
Publication Year: 2019
Author: Haythem Bahri, David Krčmařík, Jan Kočí
Published in: 2019 International Conference on Control, Artificial Intelligence,
Robotics & Optimization (ICCAIRO)
Summary:
The 2019 International Conference on Control, Artificial Intelligence,
Robotics & Optimization (ICCAIRO) was a prominent academic gathering that
focused on the latest advancements and applications in the fields of control systems,
artificial intelligence, robotics, and optimization techniques. Held in Prague, Czech
Republic, the conference provided a platform for researchers, industry
professionals, and academicians to present and discuss cutting-edge research,
innovative solutions, and emerging trends in these interconnected domains. Key
themes included advancements in machine learning algorithms, control theory
applications, robotic system design and implementation, and optimization
methodologies. Papers presented at the conference addressed diverse topics such as
intelligent control systems, autonomous robots, AI applications in industrial
automation, and computational techniques for solving complex optimization
problems. The event fostered interdisciplinary collaboration, emphasizing how AI
and robotics are transforming industries like healthcare, manufacturing, and
transportation. Through workshops, keynotes, and technical sessions, ICCAIRO
2019 served as a hub for exchanging ideas and fostering innovation, contributing
significantly to the growing body of knowledge in these critical areas of technology.
surveillance and embedded systems applications. YOLOv8 enhances its
predecessor's capabilities with improvements in multi-scale feature extraction and
prediction. It employs an updated neural network architecture designed to handle
smaller objects and cluttered backgrounds more effectively. This model also
supports flexibility in model size and configurations, catering to both high-end
applications and resource-constrained environments like embedded systems.
2.2 Object Detection Algorithm
Object detection algorithms are essential for identifying and localizing objects in
images or video streams. Various methods have been developed, each with its strengths
and weaknesses. In the context of card detection, speed and accuracy are crucial, as the
system needs to process video streams in real time. Below are some of the key object
detection algorithms:
1. Faster R-CNN
Faster R-CNN (Region-based Convolutional Neural Network) is one of the
most accurate object detection algorithms. It operates by first generating region
proposals and then using a CNN to classify and refine these proposals. However,
Faster R-CNN is slower compared to other algorithms, making it less suitable for
real-time applications like card detection. While it excels in terms of accuracy, its
computational cost limits its usability in dynamic and fast-paced environments.
2. SSD (Single Shot MultiBox Detector)
SSD is another popular object detection algorithm that improves upon
Faster R-CNN by performing detection in a single shot. It is faster than Faster R-
CNN, making it more suitable for real-time applications. However, SSD's accuracy
is generally lower than that of Faster R-CNN, particularly when it comes to
detecting small objects, like playing cards, in cluttered environments. SSD achieves
a good trade-off between speed and accuracy, but it may not offer the precision
needed in scenarios where fine details matter.
3. YOLO (You Only Look Once)
YOLO is one of the most efficient object detection algorithms, especially
for real-time applications. Unlike Faster R-CNN and SSD, which focus on region
proposals, YOLO processes the entire image in a single pass. This "single-shot"
detection approach allows YOLO to achieve high speeds, making it ideal for
detecting small objects such as playing cards in fast-moving video feeds. YOLO
achieves a good balance between speed and accuracy, which is crucial in real-time
systems where both quick processing and reliable detection are required.
YOLO’s architecture is optimized for speed and can process images at 30 frames
per second (FPS) or higher, which is vital for applications like
card detection during a game. Despite its focus on speed, YOLO maintains
reasonable accuracy, even in conditions where objects are partially occluded or
rotated.
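As an illustration of this single-pass behaviour, the short sketch below runs a YOLO model through the Ultralytics Python API (the same library loaded in the appendix) on a live camera feed and measures the achieved frame rate. The weights file name follows the appendix, and the camera index is an assumption.

import time

import cv2
from ultralytics import YOLO

model = YOLO("best.pt")              # trained card-detection weights (see Appendix 7.1)
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    start = time.time()
    results = model(frame)           # one forward pass over the whole frame
    fps = 1.0 / (time.time() - start)
    annotated = results[0].plot()    # draw the predicted boxes and labels
    cv2.putText(annotated, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("YOLO", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break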
Table 2.2.1 Comparison of Object Detection Algorithms
Crowd Management: In public spaces or large gatherings, object detection can be
used to count people, monitor crowd density, and detect any unusual behavior or
movement patterns, contributing to public safety and better management of space.
Gesture Recognition: YOLO is used in gaming systems for recognizing player
gestures and movements, providing more immersive and interactive gameplay. This
can be particularly useful in VR or motion-controlled games where player actions
are tracked to control characters or game elements.
Overall, the figure depicts the architecture of the YOLO (You Only Look Once) model, a
popular object detection algorithm.
Convolutional Layers:
The model starts with a series of convolutional layers. These layers extract
features from the input image. The number of convolutions increases as we move deeper
into the network, indicating the complexity of the features being learned.
Additional Observations:
The figure highlights the increasing number of convolutions as we move deeper
into the network. This is a common design pattern in convolutional neural networks.
The YOLO model is known for its speed and accuracy, making it suitable for real-time
object detection applications.
1. Data Diversity
Creating a comprehensive dataset is fundamental for the system's accuracy and
reliability. To account for card variations, the dataset must encompass all possible cards
from different decks, including regional styles and designs. For example, the King of Hearts
might differ not only in its artwork but also in subtle aspects like color tone or texture
between brands. Capturing these differences ensures the model can generalize across decks.
Moreover, the dataset should address orientations and angles since cards can appear in
various positions—upright, tilted, or partially obscured. These variations mimic real-world
gameplay scenarios and help the system perform consistently.
Lighting conditions are another critical factor. In real games, lighting setups can
range from dim indoor environments to natural sunlight outdoors. Shadows, reflections,
and uneven illumination can pose challenges, making it essential to include images under
different brightness and shadow levels in the dataset. Finally, background complexity must
be addressed. Cards may be placed on plain surfaces, like solid-colored tables, or on
cluttered backgrounds, such as patterned tablecloths or environments with overlapping
objects. Including such scenarios enables the model to distinguish cards from their
surroundings effectively, ensuring robust detection in complex settings.
2. Annotation
Annotation quality significantly impacts the model's training and subsequent
performance. Bounding boxes must be accurately drawn around each card to identify its
location in the image. Poor annotation, such as boxes that are too large, too small, or
misplaced, can confuse the detection algorithm, resulting in reduced accuracy during real-
time use. Alongside spatial annotations, proper card labeling is essential. Each card must
be labeled with its exact identity, such as “Ace of Spades” or “Jack of Diamonds.”
Mislabeling introduces inconsistencies in the training process, causing the model to
misclassify cards. Ensuring precise and consistent annotations across the entire dataset is
critical to achieving high detection and classification performance.
3. Data Augmentation
To improve the model's robustness and generalization, data augmentation
techniques are employed. Rotation and scaling adjustments simulate cards appearing at
different angles or sizes, which are common during gameplay. These transformations help
the model learn to recognize cards in unconventional orientations. Brightness and contrast
adjustments replicate various lighting environments, enabling the system to remain
effective under both low-light and high-glare conditions.
Noise addition simulates real-world imperfections, such as scratches, glare from
reflective surfaces, or low-resolution images. By introducing such distortions during
training, the model becomes more resilient to these challenges in real scenarios. Occlusion
handling, where parts of a card are hidden by another card or an object, is another key
augmentation technique. This prepares the system to deal with incomplete visibility,
ensuring accurate detection even when only a portion of a card is visible.
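The report does not name a specific augmentation library, so the sketch below applies rotation, brightness/contrast changes and additive noise with plain OpenCV and NumPy; the parameter ranges are illustrative choices, not values taken from the report.

import cv2
import numpy as np

def augment(image):
    """Apply rotation, brightness/contrast and noise augmentations to one card image."""
    h, w = image.shape[:2]

    # Rotation: simulate tilted cards.
    angle = np.random.uniform(-30, 30)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, M, (w, h))

    # Brightness and contrast: simulate different lighting conditions.
    alpha = np.random.uniform(0.7, 1.3)      # contrast factor
    beta = np.random.uniform(-40, 40)        # brightness offset
    image = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)

    # Additive noise: simulate glare, scratches and low-resolution capture.
    noise = np.random.normal(0, 10, image.shape)
    return np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)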
5. Real-Time Adaptation
Real-time gameplay introduces dynamic challenges that require the dataset to
evolve continuously. Dynamic gameplay data—images captured during actual games—
often vary significantly from controlled training environments. For example, cards may
appear in unexpected positions, lighting setups, or contexts, such as players' hands or
cluttered game tables. Incorporating such images into the dataset ensures the model remains
relevant and effective in practical scenarios.
To maintain long-term system performance, continuous data refinement is
necessary. Regularly updating the dataset with new examples, such as cards under unique
lighting conditions or in new gameplay setups, allows the model to adapt to emerging
patterns and user behavior. This iterative improvement process ensures the system stays
robust and reliable, even as gameplay conditions and user requirements evolve over time.
2. Real-Time Constraints
Delivering real-time performance is critical for an engaging and seamless user
experience, especially in gameplay where delays can disrupt the flow. Achieving real-time
processing of video streams and instantaneous suggestions poses significant computational
challenges. To overcome these, model pruning is employed to remove redundant layers and
parameters from the model without compromising its accuracy, reducing computational
overhead and speeding up inference. Additionally, quantization is used to convert model
weights to lower precision, such as from 32-bit to 8-bit, reducing memory usage and
computation time while maintaining acceptable performance levels. These optimizations
enable the system to process inputs quickly and deliver actionable insights without
noticeable delays, ensuring users receive real-time feedback during gameplay.
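The report does not specify the toolchain used for pruning and quantization. As one example of the quantization idea, the sketch below applies PyTorch dynamic quantization, which stores selected layer weights as 8-bit integers instead of 32-bit floats; the stand-in network is purely illustrative.

import torch
import torch.nn as nn

# Stand-in network; in practice this would be (part of) the trained detector.
# The 53 outputs are illustrative: 52 cards plus a joker class.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 53))

# Convert the weights of Linear layers from float32 to int8,
# reducing memory use and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized)   # the Linear layers are reported as DynamicQuantizedLinear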
effective as it encounters new patterns and scenarios. These measures ensure the system
remains robust and adaptable to dynamic gameplay environments.
CHAPTER 3
DESIGN METHODOLOGY
1. Data Collection
The first step in the process was gathering a diverse and comprehensive dataset to
represent various card orientations and in-game conditions effectively. The objective was
to include all 52 cards from a standard deck, encompassing different suits and ranks, to
ensure the model could accurately detect and classify each one. Images were captured under
a wide range of lighting conditions, including dimly lit environments and bright natural
light, to simulate real-world gameplay scenarios. Additionally, angles and orientations
were varied, with cards appearing upright, tilted, or partially obscured by overlapping
cards. High-resolution images were prioritized to ensure sufficient detail was captured,
which is essential for real-time processing and accurate detection.
2. Annotation Tools
Precise annotation was critical for the success of the model. The positions and
classifications of cards in the collected images were marked using tools like LabelImg,
which facilitated the creation of accurate bounding boxes around each card. Each card was
labeled correctly with its rank and suit, ensuring the dataset was well-organized for training.
Particular attention was given to tilted or partially obscured cards, where precision in
annotation was vital to help the model learn effectively. High-quality annotation minimized
the risk of errors during training and enhanced the overall reliability of the system.
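When annotations are exported in YOLO format, each image receives a plain-text label file with one line per card: a class index followed by the box centre, width and height, all normalized to the image size. The numeric values below are illustrative.

# <class_id> <x_center> <y_center> <width> <height>   (all normalized to [0, 1])
12 0.482 0.515 0.210 0.305
37 0.731 0.468 0.198 0.289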
3. Data Augmentation
To improve the diversity of the dataset and enhance the model's generalization
capabilities, a range of data augmentation techniques was applied. These included rotation,
scaling, flipping, and cropping to simulate various orientations and distances, ensuring the
model could recognize cards from multiple perspectives. Additionally, adjustments to
brightness and contrast were made to mimic different lighting conditions encountered
during gameplay. Noise was also added to the dataset to simulate real-world disturbances
such as scratches, glare, or low-resolution images. These augmentations significantly
increased the robustness of the model, enabling it to perform well in challenging scenarios.
4. Dataset Splitting
The dataset was divided into three subsets to facilitate effective training, validation,
and testing. Eighty percent of the data was allocated for training, enabling the model to
learn card detection and classification. The remaining data was split equally, with 10% used
for validation to monitor the model's performance during training and 10% reserved for
testing to evaluate its accuracy and robustness after training. Cross-validation techniques
were employed to further enhance performance monitoring and reduce the risk of
overfitting, ensuring the model generalizes well to unseen data.
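A minimal sketch of the 80/10/10 split described above, assuming the annotated image paths are available as a Python list; the fixed seed simply makes the split reproducible.

import random

def split_dataset(image_paths, train=0.8, val=0.1, seed=42):
    """Shuffle image paths and split them into train / validation / test subsets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * train)
    n_val = int(len(paths) * val)
    return (paths[:n_train],                      # 80% for training
            paths[n_train:n_train + n_val],       # 10% for validation
            paths[n_train + n_val:])              # remaining ~10% for testing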
3.2 System Architecture
1. Input Module
The Input Module plays a critical role in capturing live video streams of the game
setup, which are essential for real-time processing and analysis. Using a connected camera
or smartphone, this module ensures that the video feed is continuously provided for further
action. One of its key features is the support for various resolutions, allowing it to capture
clear and detailed images of the cards, regardless of lighting conditions. This flexibility
ensures that the system can operate effectively in diverse environments, from dimly lit
rooms to well-lit spaces. The module is also designed to handle diverse angles, allowing it
to detect cards that may be viewed from tilted or partially obscured perspectives. The output
of this module consists of real-time video frames, which are then passed on to the Detection
Module for further processing, ensuring that the system can detect and analyze the cards in
the game without any delays.
2. Detection Module
The Detection Module applies the trained YOLO model to each frame received from the Input Module, identifying every visible card and extracting its rank and suit together with a confidence score. The set of detected cards is then passed to the Recommendation Module for analysis.
3. Recommendation Module
The Recommendation Module is the heart of the system’s strategic analysis
capabilities, providing real-time insights and suggestions to the player based on the
detected cards. This module consists of three key components. The Hand Evaluation
Engine evaluates the player’s hand, identifying potential sets (cards of the same rank) and
runs (consecutive cards of the same suit). This information forms the foundation for the
next component, the Suggestion Engine, which determines the optimal moves for the
player. It recommends which cards to discard, which to retain, and whether it is
advantageous to pick up cards from the discard pile, all tailored to the current state of the
game. Finally, the Game Rule Validator ensures that all suggestions comply with the
specific rules of the game (e.g., Rummy), such as having at least one pure run before
declaring victory. The key features of this module include real-time analysis, where
suggestions adapt based on the evolving state of the player’s hand and the game dynamics.
The output of this module is a set of actionable insights provided to the player through a
user-friendly interface, making it easy for them to make informed decisions and improve
their gameplay experience.
Data Flow
1. Video Input: Captured frames are sent to the YOLO-based detection module.
2. Card Detection: Cards are identified, and their details (rank and suit) are
extracted.
3. Analysis and Suggestions: Detected cards are processed to generate
recommendations based on the current hand and game state.
4. User Feedback: The suggestions are displayed to the user for decision-making.
Metric                     Value
Precision                  94%
Recall                     92%
F1 Score                   93%
Frames Per Second (FPS)    15–20 FPS
This table presents the performance metrics of the card detection system used in the
Rummy suggestion system.
Precision: This metric measures the proportion of correct detections out of all detections
made by the system. A higher precision indicates fewer false positives (detections that do
not correspond to an actual card).
Recall: This metric measures the proportion of correctly detected cards out of all actual
cards present in the image. A higher recall indicates fewer false negatives (cards that were
missed by the system).
F1 Score: This metric combines precision and recall into a single value, providing a
balanced measure of the system's performance. A higher F1 score indicates better overall
performance.
Frames Per Second (FPS): This metric measures the speed of the card detection system,
indicating how many frames from the video input can be processed per second. A higher
FPS allows for faster and more responsive suggestions.
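These metrics follow the standard definitions, reproduced in the small helper below; the example counts are illustrative and chosen only so that the resulting values roughly match the reported 94%, 92% and 93%.

def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 92 correct detections, 6 false positives, 8 missed cards.
print(detection_metrics(tp=92, fp=6, fn=8))   # approximately (0.94, 0.92, 0.93)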
3. Data Augmentations
To ensure that the model could handle real-world variations and improve its
resilience to challenging conditions, we applied a variety of data augmentations. These
augmentations included rotation, flipping, and scaling, which simulated different card
orientations and perspectives. By artificially creating images where cards are rotated,
flipped, or viewed at various angles, the model learned to detect cards from multiple
viewpoints. Additionally, we introduced brightness and contrast adjustments to simulate
different lighting conditions, as well as occlusion effects where cards are partially hidden
by other objects. These augmentations improved the model’s ability to detect cards under
diverse and unpredictable circumstances, which is common in real-world gameplay
scenarios.
4. Post-Processing Improvements
After detection, the system utilizes post-processing techniques to refine the output
and ensure the best possible results. One key process is the use of Non-Maximum
Suppression (NMS), which is a method used to eliminate overlapping bounding boxes. This
is particularly useful when multiple detections are made for the same card or nearby cards.
NMS filters out all but the most confident prediction, ensuring that only the most accurate
bounding boxes are retained. Additionally, the output labels are enhanced to ensure that the
system correctly identifies the ranks and suits of the detected cards, further improving the
model’s accuracy in real-world conditions.
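A minimal sketch of the non-maximum suppression step described above: boxes are sorted by confidence, and any box overlapping an already-kept box beyond an IoU threshold is discarded. The box format and the 0.5 threshold are illustrative.

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the most confident box and drop overlapping, lower-confidence duplicates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep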
5. Results
The improvements made throughout the process led to significant gains in model
performance. We achieved high precision and recall, ensuring that the system can reliably
detect cards in various conditions, including difficult scenarios like poor lighting or
overlapping cards. This resulted in improved accuracy in card detection, which is essential
for real-time gameplay recommendations. Furthermore, the optimizations led to reduced
inference time, ensuring that the system can make quick and accurate suggestions during
gameplay without noticeable delays. Overall, the system's ability to detect cards reliably in
varied conditions significantly enhanced the user experience, allowing for seamless
integration into games like Rummy, where real-time card recognition is critical for strategy
development.
Fig 3.2.1 System Architecture
The figure depicts the architecture of the card detection system used for Rummy.
Efficient Backbone:
This component processes the input image and extracts relevant features.
It likely employs a pre-trained convolutional neural network (CNN) like ResNet or
EfficientNet.
Decoupled Head:
The decoupled head is responsible for predicting the bounding boxes and class
probabilities of the detected cards. It decouples the regression and classification tasks,
potentially improving the accuracy of the predictions.
U-Up-Sample:
The U-shaped architecture with up-sampling layers helps the model capture
detailed information at different scales. This is important for accurately detecting cards in
complex scenes with varying card sizes and orientations.
Additional Observations:
The use of a pre-trained backbone helps the model learn robust features from a
large dataset of images. The decoupled head architecture is a common technique in object
detection models, as it allows for more focused training on specific tasks. The U-shaped
architecture with up-sampling layers is inspired by the U-Net architecture, which is
commonly used for image segmentation tasks.
3.3 Strategy Recommendation Engine
1. Evaluation Module
The Evaluation Module is responsible for analyzing the player's hand, identifying
its strengths and weaknesses, and providing a detailed assessment to improve gameplay.
Its key function is to detect sets (cards of the same rank) and runs (consecutive cards of the
same suit), which are essential components of many card games like Rummy. The module
goes further by categorizing cards into different types: pure runs (sequences without
jokers), impure runs (sequences with jokers), and loose cards (cards that do not contribute
to any set or run). It also highlights any incomplete runs or sets, helping the player identify
areas where they need to complete their hand. The output of this module is a detailed
analysis of the hand, showing both completed groups (sets and runs) and incomplete ones,
allowing the player to understand their current position and make informed decisions
during the game.
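To illustrate how such an evaluation can work, the sketch below groups a hand into candidate sets and runs. It assumes the two-character card encoding used in the appendix (rank followed by suit, with 'T' for ten), ignores jokers, and does not resolve cards that could belong to both a set and a run; the full engine handles those cases.

from collections import defaultdict

RANK_ORDER = "23456789TJQKA"   # rank ordering used for runs, as in the appendix listing

def find_sets_and_runs(cards):
    """Group cards such as '7H' or 'TD' into sets (same rank) and runs (same suit)."""
    by_rank = defaultdict(list)
    by_suit = defaultdict(set)
    for card in cards:
        rank, suit = card[:-1], card[-1]
        by_rank[rank].append(card)
        by_suit[suit].add(rank)

    sets = [group for group in by_rank.values() if len(group) >= 3]

    runs = []
    for suit, ranks in by_suit.items():
        indices = sorted(RANK_ORDER.index(r) for r in ranks)
        streak = [indices[0]]
        for i in indices[1:]:
            if i == streak[-1] + 1:
                streak.append(i)
            else:
                if len(streak) >= 3:
                    runs.append([RANK_ORDER[j] + suit for j in streak])
                streak = [i]
        if len(streak) >= 3:
            runs.append([RANK_ORDER[j] + suit for j in streak])
    return sets, runs

# Example: find_sets_and_runs(["5S", "6S", "7S", "7H", "7D"])
#   -> sets [['7S', '7H', '7D']], runs [['5S', '6S', '7S']]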
2. Recommendation Module
The Recommendation Module provides real-time suggestions to the player, helping
them make optimal moves based on the current state of the game. This module operates
dynamically, offering advice on whether the player should hold, discard, or draw cards,
depending on the best strategy to improve their hand. The module ranks these suggestions
by their potential to enhance the player's hand, prioritizing the actions that are most likely
to lead to a better outcome. It also makes contextual adjustments, ensuring that
recommendations are tailored to the progress of the game, whether the player is early in
the game or nearing the end. For example, the recommendations might change depending
on the number of cards in play or the player's current position. The output is a set of
actionable recommendations provided in real time, giving the player clear guidance on the
next best move and increasing their chances of winning.
3.4 Algorithm Design for Move Suggestions
The algorithm is designed to analyze the player’s hand, predict potential outcomes,
and suggest optimal moves (e.g., discarding, holding, or drawing cards) to improve the
player's chances of winning.
CHAPTER 4
IMPLEMENTATION
4.1 System Setup

Table 4.1.1 System Requirements

Component            Specification
Processor            Intel Core i5 or higher
RAM                  8 GB or more
Graphics Card        NVIDIA GTX 1050 or higher
Operating System     Windows 10 / Ubuntu 20.04
Software             PyCharm, Python, OpenCV, YOLOv4
1. Data Collection:
The process begins with the collection of a large and diverse dataset of playing card
images. This dataset includes various suits, ranks, and orientations of the cards, such as
cards lying flat, angled, or partially obstructed. The images also cover a variety of
lighting conditions, ranging from bright, well-lit environments to dim, shadowed
settings. The inclusion of these variations ensures that the model is trained to detect
cards in diverse real-world environments, accounting for differences in brightness and
visibility.
2. Data Annotation:
Each collected image is annotated using a tool such as LabelImg. A bounding box is
drawn around every visible card, and each box is labeled with the card's exact rank and
suit (for example, "Ace of Spades"). Particular care is taken with tilted or partially
occluded cards, since oversized, undersized, or misplaced boxes and inconsistent labels
would degrade detection and classification accuracy during training. The resulting
annotations are exported in the format required by the YOLO training pipeline.
3. Data Augmentation:
To increase the robustness of the model, data augmentation techniques are applied.
These techniques simulate various real-world conditions that the model might
encounter. For instance, rotation is used to simulate different angles at which the cards
might appear. Scaling adjusts the size of the cards to mimic the effect of distance.
Brightness and contrast adjustments help the model handle different lighting
environments, ensuring it can still detect cards in conditions with varying light
intensity. Additionally, flipping (both horizontal and vertical) is applied to simulate
mirrored card orientations. These augmentations enhance the model’s ability to detect
cards across a wide range of scenarios.
4. Model Configuration:
Once the dataset is prepared, the next step is to configure the model for training.
The YOLO model architecture is selected based on the required balance between
accuracy and speed. Versions such as YOLOv4 or YOLOv5 are commonly chosen for
their good balance of precision and computational efficiency. The configuration files
for YOLO are adjusted to reflect the specific number of card classes (e.g., hearts,
spades, and numbered cards), the input image size, and other hyperparameters to ensure
that the model is optimized for playing card detection.
5. Training Process:
The YOLO model is then trained using a deep learning framework like Darknet,
PyTorch, or TensorFlow. During training, the model learns to adjust its internal weights
through backpropagation, optimizing the loss function—such as mean squared error or
cross-entropy loss—to minimize detection errors. The model is validated during
training using a separate validation set to monitor performance, prevent overfitting, and
ensure the model generalizes well to unseen data.
6. Model Testing:
After training, the model is tested on real-world data, such as actual game images or
videos, to assess how well it performs under practical conditions. Additionally,
post-processing techniques like fine-tuning the Intersection over Union (IoU) threshold
help improve the quality of detection by reducing false positives and ensuring that the
detected cards are accurately labeled.
7. Model Evaluation:
The final step involves evaluating the trained model using standard metrics like
precision, recall, and mean Average Precision (mAP). Precision measures the accuracy
of the detected cards, ensuring that they are correctly identified. Recall ensures that all
the cards present in the image are detected. A high mAP score indicates that the model
performs well overall, accurately detecting and labeling cards in various conditions.
These evaluation metrics help assess whether the model is ready for deployment in real-
time gameplay scenarios.
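As one concrete way to realize the configuration and training steps above (steps 4 to 7), the sketch below uses the Ultralytics Python API, the same library loaded in the appendix; a Darknet-based YOLOv4 setup would instead use .cfg and .data files. The dataset file name, starting weights and hyperparameters are illustrative assumptions.

from ultralytics import YOLO

# cards.yaml (illustrative) lists the dataset splits and the 53 class names, e.g.
#   train: datasets/cards/train/images
#   val:   datasets/cards/val/images
#   names: ['2C', '3C', ..., 'KS', 'AS', 'JK']
model = YOLO("yolov8n.pt")                                   # start from pretrained weights
model.train(data="cards.yaml", epochs=100, imgsz=640, batch=16)
metrics = model.val()      # precision, recall and mAP on the validation split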
Training Data and Labels:
The training process starts with the annotated card dataset: video frames or images
containing the cards to be recognized, each paired with labels giving the location
(bounding box) and identity of every card present.
Configuration File:
The configuration file specifies the architecture of the YOLO model, including the
number of layers, filters, and other parameters.
Additional Observations:
The use of pre-trained weights is a common technique in deep learning to improve
the performance and training speed of models. The YOLO model is known for its speed
and accuracy, making it suitable for real-time object detection applications.
4.3.1 Rule-Based Algorithm for Move Suggestions
The algorithm follows a structured approach to analyze the hand:
1. Card Classification:
Once the YOLO detection system identifies and locates the playing cards in each
video frame, the next step is to classify each card based on its suit (hearts, spades, clubs,
or diamonds) and rank (Ace, 2, 3, etc.). This classification is crucial for understanding
the player's hand and evaluating the potential for forming sets and runs. By accurately
classifying each card, the system gains the information necessary to make strategic
decisions regarding the hand.
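The exact class-name format depends on how the dataset was labeled; assuming the compact two-character names used in the appendix ('T' for ten, 'JK' for the joker), a detected label can be split into rank and suit as follows.

SUITS = {"S": "Spades", "H": "Hearts", "D": "Diamonds", "C": "Clubs"}
RANKS = {"A": "Ace", "T": "10", "J": "Jack", "Q": "Queen", "K": "King"}

def parse_card(label):
    """Split a detected class name such as 'AS' or 'TD' into (rank, suit)."""
    if label == "JK":
        return ("Joker", None)
    rank, suit = label[:-1], label[-1]
    return (RANKS.get(rank, rank), SUITS[suit])

# parse_card("AS") -> ('Ace', 'Spades');  parse_card("7H") -> ('7', 'Hearts')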
4.3.3 Algorithm Complexity and Optimization
The system employs efficient rule evaluation to optimize the process of determining
valid sets and runs. By utilizing techniques such as sorting and hashing, the system
minimizes processing time, allowing for faster evaluations of the player's hand. This
approach ensures that the system can quickly determine which cards belong to valid sets or
runs and assess the strength of the hand with minimal delay.
Additionally, the system caches results from previous rounds to further enhance
performance. By storing the outcomes of previous evaluations, the system can quickly
retrieve these results without recalculating them every time, which significantly speeds up
the decision-making process. This cached data allows the system to make timely
suggestions without redundant computations.
To achieve real-time performance, the system implements parallel processing.
While YOLO handles the real-time detection and classification of the cards from the
webcam feed, the strategy engine simultaneously processes the hand evaluation and
decision-making. This parallel operation ensures that the system remains responsive,
providing immediate feedback and suggestions as the game progresses without any lag or
delay. This multi-threaded approach is crucial for maintaining a seamless, interactive
gaming experience.
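A minimal sketch of the caching and threading described above: previously evaluated hands are memoized with functools.lru_cache, and the strategy engine runs on a background thread so the YOLO detection loop is never blocked. The evaluation body is a placeholder standing in for the analysis of Section 3.3.

import threading
from functools import lru_cache

@lru_cache(maxsize=256)
def evaluate_hand(hand):
    """Memoized hand evaluation; `hand` must be hashable, e.g. a sorted tuple of cards."""
    # Placeholder for the sets/runs analysis and move ranking of Section 3.3.
    return {"sets": [], "runs": [], "discard": None}

def suggest_async(hand, callback):
    """Evaluate the hand on a worker thread and pass the result to `callback`."""
    def worker():
        callback(evaluate_hand(tuple(sorted(hand))))
    threading.Thread(target=worker, daemon=True).start()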
4.4.2 Suggestion System Performance
The suggestion engine was thoroughly tested across several key areas to ensure its
effectiveness during gameplay. Accuracy was a primary focus, with the system being
evaluated on its ability to suggest optimal moves, such as whether to discard or hold cards.
The engine consistently made accurate suggestions based on the current hand and game
state. Real-time feedback was another critical factor, and the engine performed well in
providing suggestions promptly, though there were minor delays in more complex
scenarios that required deeper analysis or when the system encountered challenging hand
configurations. Lastly, the engine was tested in complex scenarios, such as when multiple
viable moves existed, where it generally performed well. However, there were instances
where the system struggled to select the most optimal move among several possibilities,
indicating a need for further refinement in handling such situations. Overall, the suggestion
engine showed strong performance, with room for improvement in addressing more
complicated decision-making cases.
CHAPTER 5
RESULTS AND DISCUSSIONS
5.2 System Performance
The performance of the card detection and suggestion generation system was
evaluated by measuring the response times for two critical components. Card detection was
assessed by the YOLO model’s ability to process video frames at a rate of approximately
15-20 frames per second (FPS). This high processing speed allowed the system to quickly
identify cards during gameplay, ensuring real-time detection even in fast-paced
environments. This performance was crucial for maintaining smooth, uninterrupted
gameplay as the system could detect cards as they appeared in the video feed, adapting
dynamically to the game’s flow.
Suggestion generation was the second key component, evaluated based on how
swiftly the rule-based algorithm could process the player's hand and generate optimal move
recommendations. The system was able to analyze the hand and suggest actions, such as
which cards to discard or hold, in under 2 seconds. This rapid feedback ensured that players
received timely recommendations, keeping the gameplay experience interactive and
responsive, with minimal delays.
The system was also tested in real-time gameplay scenarios to evaluate its ability
to meet the demands of live, fast-paced gaming. During these tests, the system performed
well, maintaining an interactive experience with minimal lag between card detection and
suggestion generation. For complex hands, where multiple viable moves were available,
there was a slight delay (around 2-3 seconds) in generating suggestions. However, this
minor delay did not hinder the system’s ability to provide real-time recommendations,
demonstrating its capacity to handle more complicated situations without compromising
the overall flow of the game. This ensured that players received accurate suggestions, even
in scenarios requiring more intricate decision-making, while still maintaining the seamless
gameplay experience.
5.3 Sample Outputs
User Interface (UI) clarity:
While the system’s suggestions were generally clear, some users felt that the user
interface (UI) could be more intuitive. They suggested the addition of better visual cues to
indicate the recommended actions, such as more prominent indicators or icons to highlight
which card to discard or hold. Enhancing the UI would help users follow the suggestions
more easily and feel more confident in their decision-making.
Performance optimization:
In certain cases, users reported minor delays in suggestion generation, particularly
during faster-paced rounds. While the system performed well in most scenarios, these
delays could detract from the overall gaming experience, especially when quick decisions
are critical. Performance could be further optimized by refining the response time to
generate suggestions more efficiently, reducing any lag in the decision-making process.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 Conclusion
Objectives Achieved
This project successfully developed a real-time Rummy game assistant that leverages
YOLO for playing card detection and a rule-based algorithm for generating strategic
suggestions. The primary objectives of the project were:
Card Detection: Accurately identify playing cards from video streams, even under
varying conditions such as different lighting, orientations, and partial occlusion.
Real-Time Strategy Suggestions: Provide immediate, actionable suggestions
based on the player’s current hand, improving decision-making by recommending
optimal moves like holding or discarding specific cards.
Both objectives were met, with the system achieving accurate card detection at 15-20 FPS
and generating strategy suggestions within 2-3 seconds. The system was tested in real-
world scenarios, proving its feasibility for real-time use.
Real-Time Feedback: The system was able to process and display suggestions
quickly, with minimal lag. Although minor delays occurred in complex hands, they
did not significantly hinder gameplay.
User Feedback: Players found the system to be helpful in enhancing their strategic
decision-making, though some minor improvements could be made in areas such
as interface clarity, advanced strategy handling, and performance optimization.
3. Multi-Player Support
Currently, the system focuses on a single player’s hand and offers suggestions based
on that. However, Rummy is typically played with multiple players, and expanding the
system to support multi-player scenarios would significantly enhance its usability. Future
versions of the system could analyze the game state for all players and provide
comprehensive suggestions on the best strategy based on both the player's and opponents'
hands.
7. Expansion to Other Card Games
Beyond Rummy, the core technology of card detection and strategy suggestions can
be adapted for other card games. Expanding the system to support games like Poker,
Bridge, or Solitaire could increase the system’s versatility and appeal.
CHAPTER 7
APPENDIX
7.1 Detection Module Source Code

# Note: the imports, camera initialisation, classNames list and suggestion-module
# import below were not printed in the report's listing and are reconstructed here
# so the snippet runs. The class names and the module name of RummyHand are
# assumptions; they must match the classes used to train 'best.pt' and the file
# containing the class from Section 7.2.
import math

import cv2
import cvzone
from ultralytics import YOLO

from suggestion import RummyHand   # assumed module name for the Section 7.2 code

# 52 cards as "<rank><suit>" (T = ten) plus "JK" for the joker (assumed label scheme).
classNames = [r + s for s in "CDHS" for r in "23456789TJQKA"] + ["JK"]

cap = cv2.VideoCapture(0)
cap.set(3, 1280)   # frame width
cap.set(4, 720)    # frame height

model = YOLO('best.pt')

while True:
    success, img = cap.read()
    results = model(img, stream=True)
    hand = []
    for r in results:
        boxes = r.boxes
        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            # cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
            w, h = x2 - x1, y2 - y1
            # print(x1, y1, x2, y2)
            cvzone.cornerRect(img, (x1, y1, w, h))
            # confidence
            conf = math.ceil((box.conf[0] * 100)) / 100
            # class name
            cls = int(box.cls[0])
            cvzone.putTextRect(img, f'{classNames[cls]} {conf}', (max(0, x1), max(35, y1)),
                               scale=1, thickness=1)
            if conf >= 0.5:
                hand.append(classNames[cls])
    print(hand)
    hand = list(set(hand))            # drop duplicate detections of the same card
    print(hand)
    if len(hand) >= 4:
        results = RummyHand(hand)     # reuses the name `results`, as in the report
        suggestions = results.suggest_moves()
        print(results)
        print(suggestions)
        cvzone.putTextRect(img, f'Your Hand: {suggestions}', (100, 75), scale=2,
                           thickness=3)
    cv2.imshow("Image", img)
    cv2.waitKey(1)
7.2 Suggestion Module Source Code
class RummyHand:
    def __init__(self, cards):
        self.cards = cards
        self.jokers = [card for card in cards if card == "JK"]
        self.non_jokers = [card for card in cards if card != "JK"]
        self.sorted_hand = self.sort_hand(self.non_jokers)

    def form_sequences(self):
        sequences = []
        used_cards = set()
        order = '23456789TJQKA'
        # ... (the remainder of this method, and the beginning of the set-forming and
        # discard logic, were cut by page breaks in the printed listing; the surviving
        # fragment is reproduced below as it appeared) ...
        used_cards = set()
        # Calculate the usefulness of the card based on its potential to form
        # sequences and sets
        sequence_potential = 0
        set_potential = 0
        # ... for card in remaining_cards]
        card_usefulness.sort(key=lambda x: x[1])

    def suggest_moves(self):
        sequences, remaining_cards = self.form_sequences()
        sets, remaining_cards = self.form_sets(remaining_cards)
        discard_suggestion = self.suggest_discard(remaining_cards)
        return {
            "sequences": sequences,
            "sets": sets,
            "discard": discard_suggestion
        }
REFERENCES
1. Shindo, T., Watanabe, T., Yamada, K., & Watanabe, H. (2023). Accuracy
Improvement of Object Detection in VVC Coded Video Using YOLO-v7
Features. IEEE International Conference on Artificial Intelligence in Engineering
and Technology, IICAIET 2023, Kota Kinabalu, Malaysia, September 12-14,
2023, 247-251. IEEE. DOI: 10.1109/IICAIET57011.2023.10195363
4. Viswanatha V., Chandana R. K., & Ramachandra A.C. (2022). Real Time Object
Detection System with YOLO and CNN Models: A Review. arXiv:2208.00773.
6. Z. Wu, C. Liu, C. Huang, J. Wen, and Y. Xu, "Deep Object Detection with
Example Attribute Based Prediction Modulation," Proceedings of the 2022 IEEE
International Conference on Acoustics, Speech, and Signal Processing (ICASSP),
pp. 2020–2024, 2022, doi: 10.1109/ICASSP43922.2022.9746194.
Intelligence and Communication Technologies (CICT), New Delhi, India, 2024,
pp. 123-130, doi: 10.1109/CICT.2024.10559365.
10. X. Zhang, Y. Li, and H. Wang, "Small Object Detection Using Prediction Head
and Attention," 2024 IEEE International Conference on Image Processing (ICIP),
Singapore, 2024, pp. 567-573, doi: 10.1109/ICIP.2024.10236727
11. D. Liu, M. L. Shyu, Q. Zhu, and S.-C. Chen, "Moving Object Detection under
Object Occlusion Situations in Video Sequences," in Proceedings of the 13th IEEE
International Symposium on Multimedia (ISM), Dana Point, CA, USA, 2011, pp.
271-278, doi: 10.1109/ISM.2011.50.
12. Y. Wang, T. Cai, and O. Zatarain, "Objects Detection and Recognition in Videos
for Sequence Learning," in Proceedings of the IEEE International Conference on
Cognitive Informatics and Cognitive Computing (ICCI-CC), Beijing, China, 2016,
pp. 37-44, doi: 10.1109/ICCI-CC.2016.7785679.
14. X. Wang, Y. Zhao, and L. Li, "MonoDTR: Monocular 3D Object Detection with
Depth-Aware Transformer," in Proceedings of the IEEE/CVF International
Conference on Computer Vision (ICCV), Montreal, Canada, 2023, doi:
10.1109/ICCV.2023.9879433
15. Wong, A., Shafiee, M. J., Li, F., & Chwyl, B. (2018). Tiny SSD: A tiny single-shot
detection deep convolutional neural network for real-time embedded object
detection. In Proceedings of the 2018 15th Conference on Computer and Robot
Vision (CRV) (pp. 95–101). IEEE.
17. Yuan Liu, Roger Lee, Wei Fang, Fanzhang Li, Shitong Wang, Dongrui Wu,
YingHui Wang, Xiajun Wu, "A New GNN-Based Object Detection Method for
Multiple Small Objects in Aerial Images," Proceedings of the 23rd IEEE/ACIS
International Conference on Computer and Information Science (ICIS 2023),
Wuxi, China, June 23-25, 2023, pp. 14-19, IEEE, 2023. DOI:
10.1109/ICIS57081.2023.00010
18. T. Zhang, M. Liao, and X. Wang, "Detection Continuity Loss for Video Object
Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), 2021, pp. 7752–7761, doi:
10.1109/CVPR46437.2021.00763