Nandini Internship Certificate 1
Bachelor of Technology
in
Submitted By
Veldi Nandini
21MG1A0595
Professor
Internship Coordinator
Affiliated to JNTUK, Kakinada
(2023-24)
Sree Vahini Institute of Science & Technology
Tiruvuru, N.T.R Dist, 521235, AP, India
CERTIFICATE
This is to certify that the Internship work entitled “AI & ML DEVELOPER (MACHINE LEARNING)”
done by Veldi Nandini (21MG1A0595) of the Computer Science and Engineering Department is a
bonafide record of work carried out by him. This Internship is done in partial fulfilment of
the requirements for the Bachelor of Technology Degree to be awarded by JNTUK, Kakinada.
The matter embodied in this Internship report has not been submitted to any other university for
the award of any other degree.
ACKNOWLEDGEMENT
I take this opportunity to place on record my heartiest thanks to Dr. R. Nagendra Babu,
Principal of our college, for his guidance and co-operation during our course of study.
Finally, I would like to acknowledge my deep sense of gratitude to the trainer Mr. Repakula
Upendra Rao who helped us directly or indirectly to complete this work.
Veldi Nandini-21MG1A0595
DECLARATION
I hereby declare that the work presented in this Internship titled “AI & ML DEVELOPER (MACHINE
LEARNING)” is submitted towards completion of B. Tech. In-
I have not submitted the matter embodied in this Internship for the award of any other
degree.
Veldi Nandini-21MG1A0595
CONTENTS
1. Introduction
2. Types of Machine Learning
3. Algorithms
4. Project
5. Code
6. References
INTRODUCTION
This machine learning tutorial covers basic and advanced concepts of machine learning and is
designed for students and working professionals. Machine learning is a growing technology
that enables computers to learn automatically from past data. It uses various algorithms to
build mathematical models and make predictions from historical data or information.
Currently, it is used for tasks such as image recognition, speech recognition, email
filtering, Facebook auto-tagging, recommender systems, and many more. This tutorial gives an
introduction to machine learning along with a wide range of machine learning techniques such
as supervised, unsupervised, and reinforcement learning. You will learn about regression and
classification models, clustering methods, hidden Markov models, and various sequential
models.
In the real world, we are surrounded by humans who can learn from their experiences, and we
have computers or machines which work on our instructions. But can a machine also learn from
experience or past data as a human does? This is where machine learning comes in.
Machine learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms which allow a computer to learn from data and past experience on
its own. The term machine learning was first introduced by Arthur Samuel in 1959.
A machine is said to learn if it can improve its performance as it gains more data.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1. Supervised learning
Supervised learning is a type of machine learning in which we provide sample labeled
data to the machine learning system in order to train it, and on that basis it predicts
the output. The system builds a model from the labeled data to understand the dataset.
Once training is done, we test the model on sample data it has not seen to check
whether it predicts the correct output. The goal of supervised learning is to map input
data to output data. Supervised learning is based on supervision, in the same way that
a student learns under the supervision of a teacher. An example of supervised learning
is spam filtering.
Supervised learning can be further grouped into two categories of algorithms:
• Classification
• Regression
2. Unsupervised learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The machine is trained on a set of data that has not been labeled, classified, or
categorized, and the algorithm needs to act on that data without any supervision. The
goal of unsupervised learning is to restructure the input data into new features or
into groups of objects with similar patterns. In unsupervised learning, we don’t have a
predetermined result. The machine tries to find useful insights from a large amount of
data. It can be further classified into two categories of algorithms:
• Clustering
• Association
3. Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent
gets a reward for each right action and a penalty for each wrong action. The agent
learns automatically from this feedback and improves its performance. In reinforcement
learning, the agent interacts with the environment and explores it. The goal of the
agent is to collect the most reward points, and in pursuing that goal it improves its
performance.
Science and technology have significantly helped the human race to overcome most of its
problems. From making people fly in the air to helping them manage traffic on roads, science
is present everywhere, from life-saving machinery to time-saving applications. Today, the
place where science is involved the most is technology. Every application we have on the
phone uses some kind of science. For example, when a map application tells us our travel
speed, it simply uses the concept of distance covered divided by time taken. It is simple
science which, when combined with technology, gives us all kinds of fruitful results. When
it comes to technology and science, we can’t move ahead without talking about the latest
technologies available. One of the latest technologies that has revolutionized the tech
world is “machine learning”. Let us start the discussion with machine learning and then look
more closely at binary classification.
Definition of Classification
For example, given a batsman’s average, strike rate, not-outs, etc., we can classify him as
“in form” or “out of form”. Classification is the process of assigning new input variables
(X) to the class they most likely belong to, based on a classification model constructed
from previously labeled training data. Data with labels is used to train a classifier so
that it can perform well on data without labels (not yet labeled). This process of
repeatedly classifying examples into previously known classes is what trains the machine.
Classification assumes the classes are discrete; if the target is continuous, the task is
regression rather than classification.
1. RECALL
Recall is also known as sensitivity. In binary classification (Yes/No), recall measures
how “sensitive” the classifier is to detecting positive cases. To put it another way,
of all the real positive cases in the sample, how many did we “catch”? Recall alone can
be gamed by classifying every result as positive, so it is usually read together with
precision.
2. F1 SCORE
The F1 score is the harmonic mean of precision and recall, with the best value being 1
and the worst being 0. Precision and recall make an equal contribution to the F1 score.
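As a minimal sketch of these metrics, the Python below computes precision, recall and F1 from lists of true and predicted binary labels. The function name and the sample labels are illustrative assumptions and are not taken from the internship code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return precision, recall, f1


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# Predicting "positive" for every sample pushes recall to 1 but hurts precision.
all_positive = precision_recall_f1(y_true, [1] * len(y_true))
print(precision, recall, f1, all_positive)
```

The second call shows why recall is not reported on its own: labelling everything as positive catches every real positive, but half of the predictions are then wrong.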
• The K-NN algorithm assumes similarity between the new case/data and the available cases,
and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about
the distribution of the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead it stores the dataset and, at the time of classification, performs an
action on the dataset.
• At the training phase the KNN algorithm just stores the dataset, and when it gets new data,
it classifies that data into the category that is most similar to the new data.
• Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will compare the features of
the new image with those of the cat and dog images and, based on the most similar features,
put it in either the cat or the dog category.
How does K-NN work?
The working of K-NN can be explained on the basis of the algorithm below:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to each stored data point.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors
is maximum.
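The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming two-dimensional feature vectors and made-up "cat"/"dog" training points; the function name `knn_classify` is hypothetical and this is not the code used in the internship project.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance step: Euclidean distance from the query to every stored point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Neighbor step: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Counting step: count how many neighbors fall in each category.
    votes = Counter(label for _, label in nearest)
    # Assignment step: return the category with the most neighbors.
    return votes.most_common(1)[0][0]


# Made-up training data: two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

prediction_a = knn_classify(train_points, train_labels, query=(2.0, 2.0), k=3)
prediction_b = knn_classify(train_points, train_labels, query=(6.0, 6.5), k=3)
print(prediction_a, prediction_b)
```

Note that there is no training step beyond storing the points, which is what makes K-NN a lazy learner.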
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is
a statistical method that is used for predictive analysis. Linear regression makes predictions
for continuous/real or numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y)
and one or more independent variables (x), hence the name linear regression. Because the
relationship is linear, the model finds how the value of the dependent variable changes
according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between
the variables.
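For a single independent variable, the sloped straight line can be written as y = a0 + a1·x, where a0 is the intercept and a1 the slope. The sketch below fits that line with the standard least-squares formulas in plain Python. The symbols a0 and a1, the function name and the salary figures are assumptions made for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data that lies exactly on the line y = 25 + 5 * x.
years_experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

intercept, slope = fit_simple_linear_regression(years_experience, salary)
predicted_salary = intercept + slope * 6  # prediction for an unseen x
print(intercept, slope, predicted_salary)
```

With more than one independent variable the same idea applies, but the fit is usually done with matrix methods or a library.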
Linear regression can be further divided into two types of algorithm:
• Simple Linear Regression: If a single independent variable is used to predict the value
of a numerical dependent variable, the algorithm is called Simple Linear Regression.
• Multiple Linear Regression: If more than one independent variable is used to predict the
value of a numerical dependent variable, the algorithm is called Multiple Linear
Regression.
• Logistic regression is one of the most popular Machine Learning algorithms and comes
under the Supervised Learning technique. It is used for predicting a categorical
dependent variable from a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore
the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1,
True or False, etc., but instead of giving the exact value as 0 or 1, it gives
probabilistic values which lie between 0 and 1.
• Logistic Regression is very similar to Linear Regression except in how the two are
used. Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
• Logistic Regression can be used to classify observations using different types of data
and can easily determine the most effective variables for the classification. The
logistic function is described in the points that follow:
• The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
• It maps any real value into another value within the range 0 to 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond this
limit, so it forms an "S"-shaped curve. The S-shaped curve is called the sigmoid
function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which decides whether
a predicted probability is mapped to 0 or 1: values above the threshold are mapped to 1,
and values below the threshold are mapped to 0.
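The sigmoid mapping and the threshold rule can be sketched as follows. This is a minimal illustration in plain Python; the function names and the threshold of 0.5 are assumptions, and a value exactly at the threshold is treated as class 1 here.

```python
import math


def sigmoid(z):
    """Map any real value z into the range 0 to 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)


def predict_class(z, threshold=0.5):
    """Turn the sigmoid probability into a 0/1 label using a threshold."""
    return 1 if sigmoid(z) >= threshold else 0


for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(z, round(sigmoid(z), 4), predict_class(z))
```

In a trained logistic regression model, z would be the weighted sum of the independent variables plus a bias term.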
• Assumptions for Logistic Regression:
Project
In an era where technology continually redefines our interactions with the digital world, "Face
Recognition Using OpenCV in Python" stands as a pivotal exploration of the convergence of
computer vision, artificial intelligence, and hands-on programming. This project embarks on
a journey into the realms of face detection, feature extraction, and machine learning, all
within the Python programming language and the OpenCV (Open Source Computer Vision
Library) framework. The heart of our project lies in the deployment of the Haar Cascade
Classifier, a machine learning-based object detection method, for identifying faces. Theory
meets practice as we explore the intricacies of Haar features, cascades, and the art of
detecting front-facing faces. Hands-on experimentation guides us to optimize parameters,
ensuring the robust detection of faces in varying real-world scenarios.
Introduction
In an age of rapidly advancing technology, the ability to recognize and authenticate individuals
is of paramount importance. Whether for security applications, user identification, or
enhancing user experiences, facial recognition technology has emerged as a powerful tool.
This document explores the realm of "Face Recognition Using OpenCV in Python," shedding
light on the fundamental aspects and implementation details of this fascinating technology.
Background
Facial recognition is the process of identifying or verifying a person’s identity by analyzing and
comparing their facial features with pre-stored data. It has found its way into various real-
world applications, including access control, surveillance, and even social media platforms.
OpenCV, an open-source computer vision library, provides a versatile platform for developing
face recognition systems in Python.
Objectives
Face Detection: Utilize OpenCV’s Haar Cascade Classifier to detect faces within images or video
streams.
Front Face Detection: Focus on identifying the front-facing orientation of faces to ensure accurate
recognition.
Python Implementation: Develop the entire system using Python, making it accessible to a wide
range of developers.
Scope
Haar Cascade Classifier: We will delve into the Haar Cascade Classifier, a machine
learning object detection method used to identify objects or features in images. Specifically,
we will apply it to the task of detecting human faces.
Front Face Detection: A crucial aspect of this project is ensuring that the faces we
detect are oriented in a front-facing manner. This involves filtering out non-frontal views,
which is essential for accurate face recognition.
OpenCV in Python: The entire project will be implemented using the Python programming
language and OpenCV libraries, making it accessible and easy to follow for Python
enthusiasts and developers.
Methodology
In this section, we will walk through the hands-on methodology for implementing "Face
Recognition Using OpenCV in Python" with a specific focus on Haar Cascade Classifier for face
detection and ensuring front face orientation. Follow the steps below to replicate this project:
1. Setting up the Python Environment
1.1. Install Python and OpenCV
Begin by ensuring you have Python installed on your system. You can download Python from the
official website (https://www.python.org/downloads/) and install it.
Next, install the OpenCV library using pip:
pip install opencv-python
1.2. Install Additional Libraries
Depending on your specific use case, you may need to install additional libraries for tasks
such as image manipulation, data handling, or machine learning. Common libraries include
NumPy, Pandas, and scikit-learn. Use pip to install them as needed.
2. Data Gathering and Preprocessing
2.1. Data Collection
Collect a dataset of images containing the faces you want to recognize. Ensure the dataset
includes images of people with different facial expressions, lighting conditions, and
orientations.
2.2. Data Preprocessing
Resize the images to a consistent size for efficient processing. Convert images to grayscale for
better feature extraction. Normalize pixel values to a common scale (e.g., 0 to 255).
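To make the three preprocessing steps concrete, the sketch below performs grayscale conversion, resizing and normalization on a tiny image stored as nested Python lists. It is a library-free illustration under stated assumptions: the 0.299/0.587/0.114 weights are the commonly used luma weights, nearest-neighbor resizing is chosen for simplicity, and all function names are hypothetical. In the actual project these steps would normally be done with OpenCV calls such as `cv2.cvtColor` and `cv2.resize`.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def resize_nearest(gray, new_h, new_w):
    """Resize a grayscale image with nearest-neighbor sampling."""
    old_h, old_w = len(gray), len(gray[0])
    return [
        [gray[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def normalize_0_255(gray):
    """Stretch pixel values linearly so the darkest is 0 and the brightest is 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]


# Made-up 4x4 image whose pixels are grey levels 10, 20, ..., 160.
tiny_rgb = [[(v, v, v) for v in range(start, start + 40, 10)] for start in (10, 50, 90, 130)]

small = resize_nearest(to_grayscale(tiny_rgb), 2, 2)
normalized = normalize_0_255(small)
print(normalized)
```

Applying the same target size and the same normalization to every image in the dataset keeps the inputs comparable across different cameras and lighting conditions.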
3. Face Detection with Haar Cascade Classifier
3.1. Load the Haar Cascade Classifier
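Load the pre-trained frontal-face Haar cascade classifier that ships with OpenCV.
3.2. Detect Faces
Apply the classifier to the images or video frames to detect faces. The detector has no
orientation parameter; non-frontal views are excluded mainly because the frontal-face cascade
only responds to faces seen from the front.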
4. Face Recognition Model
4.1. Data Labeling
Label the detected faces with the corresponding individuals’ names or IDs.
4.2. Feature Extraction
Use techniques like Principal Component Analysis (PCA) or Local Binary Patterns (LBP) to extract
facial features.
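As an illustration of one of these techniques, the sketch below computes the basic Local Binary Pattern code for the center pixel of a 3x3 patch. Each neighbor contributes a 1 bit when it is at least as bright as the center. The clockwise bit order starting at the top-left neighbor is one common convention and is an assumption here, as are the function name and the sample patch.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the center pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbors in clockwise order, starting at the top-left corner.
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:  # neighbor at least as bright as the center
            code |= 1 << bit
    return code


# Made-up 3x3 patch of grey levels.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)
print(code)
```

A face descriptor is then typically built by computing this code for every pixel and collecting histograms of the codes over a grid of image cells.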
4.3. Model Selection
Choose a suitable machine learning model for face recognition. Common choices include
Eigenfaces, Fisherfaces, or deep learning approaches using Convolutional Neural Networks (CNNs).
4.4. Model Training
Train the selected model on the labeled dataset, fine-tuning parameters for optimal performance.
5. Testing and Validation
5.1. Testing
Evaluate the model’s performance using a separate test dataset or cross-validation.
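As a sketch of the cross-validation option, the generator below splits sample indices into k folds so that each sample is used for testing exactly once. It uses contiguous folds for simplicity. In practice the indices would usually be shuffled first, and the function name is a hypothetical one chosen for this example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

The model is trained once per fold on the training indices and scored on the test indices, and the scores are averaged to estimate performance.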
5.2. Validation
Validate the model in real-world scenarios, assessing its accuracy in recognizing faces from images
or video streams.
The Haar Cascade Classifier does not have a parameter that selects face orientation. Frontal
faces are targeted by loading a cascade trained on frontal faces, and non-frontal views are
mostly rejected because they do not match the features that cascade has learned. The scale
factor parameter controls something different: how much the image is shrunk between
successive passes of the detector. A scale factor close to 1 (for example 1.05) searches many
intermediate sizes, which is slower but less likely to miss a face, while a larger scale
factor (for example 1.3) is faster but can skip faces whose size falls between two steps.
Careful tuning of the scale factor is therefore still essential to achieve the desired front
face detection results.
In summary, the implementation of face recognition with Haar Cascade Classifier and frontal
face detection involves selecting an appropriate classifier, loading it into the Python
environment, applying it to images or video frames, and fine-tuning parameters like the scale
factor to balance detection sensitivity against false positives. This combination of
theoretical understanding and practical application forms the foundation of a robust face
recognition system, enabling it to perform reliably in a variety of real-world scenarios.
The Haar Cascade Classifier operates on the principle of feature-based object detection. It
starts with an integral image representation, which significantly speeds up the computation
of rectangular features. These rectangular features are essentially comparisons of pixel sums
within specific regions of the image. The cascade then applies a series of classifiers, each
evaluating a feature, to determine whether a region contains a face or not.
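The integral image idea can be sketched in plain Python as follows. Once the table is built, the sum of any rectangle takes four lookups, and a two-rectangle Haar-like feature is the difference of two such sums. The function names and the sample pixel grid are illustrative assumptions, and OpenCV's internal implementation differs in detail.

```python
def integral_image(gray):
    """Build a table where ii[y][x] is the sum of all pixels above and left of (x, y)."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]  # extra zero row and column
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like edge feature: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Made-up 3x4 grayscale image.
gray = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(gray)
total = rect_sum(ii, 0, 0, 4, 3)
inner = rect_sum(ii, 1, 1, 2, 2)
edge = two_rect_feature(ii, 0, 0, 4, 3)
print(total, inner, edge)
```

Because every rectangle sum costs the same regardless of its size, the detector can evaluate thousands of features per window quickly.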
The cascade structure allows for early rejection of non-face regions, making the detection
process efficient.
For face detection, it’s essential to choose the right Haar Cascade Classifier. OpenCV
provides a collection of pre-trained classifiers. The "frontal face" classifier is commonly
used for detecting faces with a frontal view, and there are also classifiers for profile
(side-view) faces and for more specific targets such as eyes.
When applying the classifier to an image or video frame, it scans the image at different scales
and positions. This multi-scale approach is necessary because faces can appear in various sizes
and positions within the frame. The scale factor parameter controls how much the image is
resized at each step of the detection process. Scale factors close to 1 give finer-grained
detection that is less likely to miss a face of any size, at the cost of more computation,
while larger scale factors run faster but may skip faces whose size falls between two steps.
Fine-tuning the scale factor is crucial for achieving the desired level of detection
sensitivity.
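The effect of the scale factor can be illustrated by listing the scales a multi-scale search would visit. The sketch below assumes a square image and a 24-pixel base detection window, and multiplies the scale by the scale factor until the window no longer fits. The function name and these values are assumptions for illustration, and the loop only approximates what `detectMultiScale` does internally.

```python
def pyramid_scales(image_size, window_size=24, scale_factor=1.1):
    """List the scales at which a square detection window still fits in the image."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    scales = []
    scale = 1.0
    while window_size * scale <= image_size:
        scales.append(scale)
        scale *= scale_factor  # next, coarser level
    return scales


fine = pyramid_scales(240, scale_factor=1.1)
coarse = pyramid_scales(240, scale_factor=1.5)
print(len(fine), "levels with 1.1;", len(coarse), "levels with 1.5")
```

A scale factor near 1 visits many more levels, which explains both the higher detection rate and the longer running time.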
Additionally, the Haar Cascade Classifier employs a sliding window approach. It slides
a rectangular window across the image and evaluates the contents of the window using the
classifiers at each position and scale. If a region is classified as a potential face, it proceeds to
the next stage. If not, it is rejected. This process continues through multiple stages of the
cascade, with each stage being more selective, effectively filtering out non-face regions.
To ensure that only front-facing views of faces are considered, you can further tighten
the detection process. One common technique is to use the "minNeighbors" parameter,
which specifies how many overlapping neighboring detections a candidate region must have to
be kept as a face. Raising it filters out false positives and ensures that only strong
detections are retained, which with a frontal-face cascade typically correspond to
front-facing faces.
Fine-tuning the parameters of the Haar Cascade Classifier, including the scale factor,
minNeighbors, and the choice of classifier, is often an iterative process. It requires
experimentation and testing on our specific dataset to strike the right balance between
sensitivity and precision.
Libraries
1. pip install opencv-python
Conclusion
In the realm of computer vision and image processing, the pursuit of "Face Recognition Using
OpenCV in Python" has been a journey filled with discovery, challenges, and triumphs.
Through this hands-on project, we have delved into the intricacies of face detection,
feature extraction, and model training, all within the versatile Python environment.
Our project began with the establishment of a robust Python environment, featuring the
installation of OpenCV, a powerhouse library for computer vision tasks. OpenCV’s
user-friendly and extensive interface served as the cornerstone of our journey.
We embarked on the exciting phase of data collection and preprocessing, understanding the
importance of diverse and well-prepared datasets in the realm of face recognition. We
harnessed the power of OpenCV to manipulate and transform images, ensuring they were
ready for the subsequent stages of our project.
The utilization of Haar Cascade Classifier for face detection marked a crucial milestone.
We explored the theoretical underpinnings of Haar features, cascades, and front face
detection. Through hands-on experimentation, we fine-tuned parameters, ensuring the accurate
and reliable detection of faces in real-world scenarios.
Building and training our face recognition model allowed us to delve into the world of
machine learning and feature extraction. We considered various techniques and algorithms,
ultimately crafting a model capable of identifying individuals based on their unique facial
features. Testing and validation reaffirmed the model’s performance and prepared it for
deployment.
Our project resulted in not only a practical implementation but also a deeper
understanding of the complexities of face recognition. We overcame challenges, honed our
skills, and witnessed the power of OpenCV and Python in action.
Code
Dataset Creation
Face Detection
Training
Outputs
Creating Dataset
Training
Face Detection
Reference
• Javatpoint
• W3Schools
• Google Colab
• Jupyter notebook