Driver Drowsiness Detection Using Behavioural Measures and Machine Learning Techniques
B. Tech Final Project Report
Submitted by
Akansh Kumar
Meshram Rohit Ajit
Patan Daen Khan
Tripathi Sandilya Ashutosh
2019
CERTIFICATE
Certified that this is a bonafide record of the project work titled Driver
Drowsiness Detection Using Behavioural Measures and Machine Learning
Techniques, done by
Akansh Kumar
Meshram Rohit Ajit
Patan Daen Khan
Tripathi Sandilya Ashutosh
of the eighth semester B. Tech, in partial fulfillment of the requirements for the
award of the degree of Bachelor of Technology in Computer Science and
Engineering of the National Institute of Technology Calicut.
I hereby declare that the project titled, Driver drowsiness detection using
behavioural measures and machine learning techniques, is my own work
and that, to the best of my knowledge and belief, it contains no material
previously published or written by another person, nor material which has
been accepted for the award of any other degree or diploma of the university
or any other institute of higher learning, except where due acknowledgement
and reference have been made in the text.
Place : Signature :
Date : Name :
Reg. No. :
Acknowledgement
We would like to thank our guide, Dr Jay Prakash, for always supporting
and mentoring us. Our thanks and appreciation also go to our colleagues
who helped in developing the project. It would not have been possible
without the kind support and help of these people.
Contents

1 Introduction  2

2 Literature Survey  3
  2.1 Integral Image  5
  2.2 Haar Features  6
  2.3 Classifier  6
  2.4 Cascade  8
  2.5 Facial Landmarks  8
  2.6 Eye Aspect Ratio (EAR)  10
  2.7 Mouth Aspect Ratio (MAR)  11
  2.8 Head Tilt Estimation  11

3 Problem Definition  13

4 Methodology  14
  4.1 Design  14
    4.1.1 Overview of Design  14
    4.1.2 Detection algorithm  16
    4.1.3 Input  16
    4.1.4 Face detection  16
    4.1.5 Eye detection  17
    4.1.6 EAR (Eye Aspect Ratio)  17
    4.1.7 Blink detection  17
    4.1.8 MAR (Mouth Aspect Ratio)  18
    4.1.9 Head Tilt  19
    4.1.10 Detection System  20
  4.2 Work Done  21
    4.2.1 Viola Jones Algorithm Using Haar classifier in OpenCV  22
    4.2.2 Dataset information  24
    4.2.3 Training  24

5 Results  33

6 Conclusion and Future Work  40

References  41
List of Figures

5.1 Confusion matrix for the Haar cascade tested on positive and negative images  33
5.2 Computation time of 'haarcascade_frontalface_default.xml' versus our trained 'cascade.xml'  34
Chapter 1
Introduction
Chapter 2
Literature Survey
The eyes can be considered a relatively stable feature of the face in
comparison with other facial features.
Various states of drowsiness have been detected in the past using Support
Vector Machines (SVM). OpenCV is used to implement computer vision
algorithms that give optimum results. Haar classifiers based on the
Viola-Jones algorithm are one of the most widely accepted ways to detect
various objects. The Multilayer Perceptron algorithm has also been popular
in a number of detection methods for various entities [1].
Nowadays, robust real-time facial landmark detectors capture most of the
characteristic points on a human face image, including eyelids and eye
corners. These are trained on in-the-wild datasets and are thus robust to
varying facial expressions, illumination, and moderate non-frontal head
rotations. In real-time applications such as biometrics and surveillance,
pose variations and camera limitations make the detection of human faces
in feature space more complex than that of frontal faces, which further
complicates the problem of robust face detection. Paul Viola and Michael
Jones presented a fast and robust method for face detection that was 15
times faster than any comparable technique at the time of its release,
processing 17 frames per second with 95 percent accuracy [3].
Viola-Jones achieves high detection rates and processes images rapidly.
Real-time systems use it mostly because it operates only on the present
single grayscale image [4]. The main components of Viola-Jones are
described below.
2.1 Integral Image

Figure 2.1: The integral image at location (x, y) contains the sum of the
pixel values above and to the left of (x, y), inclusive [9].
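For reference, the integral image described in the caption can be written in the standard form of [9]:

    ii(x, y) = sum over all x' <= x, y' <= y of i(x', y')

and it can be computed in a single pass over the image with the pair of recurrences

    s(x, y) = s(x, y - 1) + i(x, y)
    ii(x, y) = ii(x - 1, y) + s(x, y)

where s(x, y) is the cumulative row sum, with s(x, -1) = 0 and ii(-1, y) = 0.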
2.2 Haar Features
Haar-like features used over raw pixel values reduce the in-class variability
and increase the out-of-class variability, which makes classification easier.
Fig. 2.2 shows the commonly used rectangular features.
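A minimal sketch of how a rectangle sum, the building block of every Haar feature, is evaluated with only four lookups into the integral image (NumPy is assumed; function names are illustrative, not the report's actual code):

    import numpy as np

    def integral_image(img):
        # Zero-padded cumulative sums so that ii[y, x] = sum of img[:y, :x].
        ii = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
        return np.pad(ii, ((1, 0), (1, 0)))

    def rect_sum(ii, x, y, w, h):
        # Sum of pixels in the rectangle with top-left corner (x, y),
        # width w and height h, using four array references.
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    def haar_edge_feature(ii, x, y, w, h):
        # A two-rectangle (edge) feature: right half minus left half.
        left = rect_sum(ii, x, y, w // 2, h)
        right = rect_sum(ii, x + w // 2, y, w // 2, h)
        return right - left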
2.3 Classifier
The value of a Haar feature is calculated using the integral over its
rectangles. Many Haar feature classifiers make up a stage. A stage
accumulator sums all the Haar classifier results, and a stage comparator
compares the sum against a stage threshold. Weak classifiers reject the
regions where the probability of finding a face is low. The features and
their thresholds are selected in the training phase, which is done with the
AdaBoost algorithm; each threshold is a constant returned by AdaBoost.
Depending on the parameters of the training data, individual stages can
vary in the number of Haar feature classifiers they contain. [4]
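A sketch of the stage logic just described, with hypothetical weak classifiers (each a callable that returns its weighted vote for a candidate window):

    def evaluate_stage(weak_classifiers, stage_threshold, window):
        # Stage accumulator: sum the votes of all weak classifiers in
        # this stage, then compare against the stage threshold.
        total = sum(clf(window) for clf in weak_classifiers)
        return total >= stage_threshold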
2.4 Cascade
The cascade discards images by making each subsequent stage harder for a
candidate image to pass. A frame or image exits the cascade when it either
passes all stages or fails any stage [4]. If an image passes all the stages,
a face is detected.
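The cascade itself then becomes an early-exit loop over such stages (a sketch under the same assumptions as above):

    def cascade_detect(stages, window):
        # 'stages' is a list of (weak_classifiers, stage_threshold) pairs,
        # ordered from cheapest to most discriminative.
        for weak_classifiers, stage_threshold in stages:
            if not evaluate_stage(weak_classifiers, stage_threshold, window):
                return False  # fails any stage: the window exits immediately
        return True  # passes every stage: face detected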
2.6 Eye Aspect Ratio (EAR)

The EAR is largely invariant to scaling of the image and in-plane rotation
of the face. The EAR of both eyes is averaged, since eye blinking is
performed by both eyes synchronously [10].
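For reference, the EAR as defined in the blink-detection literature [10], with the six eye landmarks p1, ..., p6 (p1 and p4 the horizontal eye corners; p2, p3 and p5, p6 the upper and lower eyelid points):

    EAR = (||p2 - p6|| + ||p3 - p5||) / (2 * ||p1 - p4||)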
Chapter 3

Problem Definition
Human facial landmarks can be detected under vehicular driving scenarios.
Regular, real-time monitoring of drivers is carried out. The goal of the
project is to analyse the driver's concentration, attention, and drowsiness
by monitoring them continuously while driving and, based on these
observations, to sound an alarm.
Chapter 4
Methodology
4.1 Design
4.1.1 Overview of Design
Speedy detection and processing of data is our main aim. A count of frames
with closed eyes is kept; if the frame count goes above a threshold, an
alert message is displayed showing that the driver is drowsy. First, an
image is collected. To detect the faces in each individual frame, our
trained Haar cascade file (classifier/cascade.xml) is used. We move on to
the next frame if no face is found [6]. The region of interest (ROI),
marked after the facial region is detected, contains the eyes and mouth.
The ROI substantially reduces the computational requirements and eases the
detection of facial landmarks.
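A minimal sketch of this frame-count logic (the threshold values are illustrative assumptions, tuned experimentally in practice):

    EAR_THRESHOLD = 0.25   # assumed value for "eyes closed"
    CONSEC_FRAMES = 20     # closed-eye frames tolerated before alerting

    closed_frames = 0

    def update_drowsiness(ear):
        # Count consecutive frames whose EAR indicates closed eyes;
        # reset the count whenever the eyes reopen.
        global closed_frames
        if ear < EAR_THRESHOLD:
            closed_frames += 1
        else:
            closed_frames = 0
        return closed_frames >= CONSEC_FRAMES  # True -> display the alert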
After finding the face and the eyes, our progress depends on facial features
such as the corners of the eyes, the eyebrows, etc. To find the landmarks on
the faces, a shape estimator implemented in the dlib library, based on the
papers [12], was used. After detection, the landmarks are converted into
coordinates. Along with the landmarks, we use head position and tilt to
detect drowsiness. Head position is observed with respect to the calibrated
camera's central axis.
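A sketch of this pipeline, using our trained cascade for face detection and dlib's 68-point shape estimator for the landmarks (the model file path is an assumption; it is the commonly distributed dlib model):

    import cv2
    import dlib

    face_cascade = cv2.CascadeClassifier("classifier/cascade.xml")
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def get_landmarks(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            rect = dlib.rectangle(int(x), int(y), int(x + w), int(y + h))
            shape = predictor(gray, rect)
            # Convert the 68 landmarks into (x, y) coordinate pairs.
            return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
        return None  # no face found: move on to the next frame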
4.1.2 Detection algorithm

Training the dataset using a machine learning algorithm is the first step
in the Viola-Jones algorithm. Positive and negative images are used for
training, which is done to develop a predictive relationship between the
datasets. The Viola-Jones algorithm requires Haar-like features to be
organized as a classifier cascade [8]. The main advantage of a Haar-like
feature is its calculation speed. The AdaBoost algorithm [5] is used for
training the dataset and selecting the required features. AdaBoost finds a
few good features that show significant variety; the dataset is trained
after determining the features.
4.1.3 Input
A real-time video feed from a webcam is currently given as input to the
application containing the detection algorithm (described below) to
identify the facial region.
landmarks, as shown in Fig. 2.3. From the landmarks of the eyes, the eye
landmark distances are calculated. In Fig. 4.3, the eye landmarks are
indicated as A, B, C, D, E, F. The horizontal distances between A, E and
between B, D are calculated, followed by the vertical distance between
F, C. The distances between these landmarks are calculated from the
following.
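The equation itself did not survive extraction; the distance between two landmarks P = (x_P, y_P) and Q = (x_Q, y_Q) is presumably the usual Euclidean distance:

    d(P, Q) = sqrt((x_P - x_Q)^2 + (y_P - y_Q)^2)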
Expected Output
Creating Samples
Negative Images
Negative images are those which do not contain the object we are trying to
detect (here, a face). Negative samples are enumerated in a special
description file that contains the file name of each negative image,
relative to the directory of the description file. Note that negative
samples are also called background samples or background images. One index
file with the list of image filenames is created (one per line), and the
images are placed into directories.
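A minimal example of such an index file (say bg.txt; the filenames are hypothetical), one negative image per line, relative to the file's directory:

    neg/img0001.jpg
    neg/img0002.jpg
    neg/img0003.jpg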
Positive Images
Positive images define what the model should actually look for when trying
to find the objects of interest (faces) through the boosting process.
OpenCV supports two ways of generating a positive sample dataset:
1) We can generate a large number of positive samples from a single
positive image.
2) We can supply the positive images ourselves, using an annotation tool to
crop and resize them and pack them into OpenCV's binary format.
Although the first approach works well for rigid objects (a logo or a
sign), it fails for objects that are less rigid, like faces; the second
approach should be used in that case. As we want a flexible model, we take
samples that cover a wide variety of what can occur in our class. In the
case of faces, this means considering different race groups, face
complexions, and beard styles. The same requirement applies to the second
approach.
Annotation Tool
For generating the info file, the OpenCV community provides an annotation
tool. The tool can be invoked with the command opencv_annotation if the
OpenCV applications were built.
Example of usage: opencv_annotation --annotations=/path/to/annotations/file.txt
--images=/path/to/image/folder/
This command will open a window containing the first image and your mouse
cursor, which are used for annotation. Several keystrokes trigger actions.
The left mouse button is used to select the first corner of your object;
drawing continues until you are satisfied, and stops when a second left
mouse button click is registered. Finally, you end up with a usable
annotation file that can be passed to the -info argument of
opencv_createsamples.
4.2.3 Training
After the dataset is created, we can train the classifier; the functions
and commands for training are given in the OpenCV documentation. When
opencv_traincascade finishes, our newly trained cascade is stored in a
cascade.xml file inside the folder passed as a parameter to the command.
The intermediate stage files are also kept there, in case training is
interrupted.
Parameters used: minHitRate = 0.98, number of positive images = 1800,
number of negative images = 900, sample width = 50 px, sample height = 50 px.
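With these parameters, the training invocation would look roughly as follows (the paths and .vec filename are assumptions; opencv_createsamples packs the annotated positives, opencv_traincascade trains the cascade):

    opencv_createsamples -info annotations.txt -vec faces.vec -num 1800 -w 50 -h 50
    opencv_traincascade -data classifier -vec faces.vec -bg bg.txt -numPos 1800 -numNeg 900 -w 50 -h 50 -minHitRate 0.98

In practice -numPos is often set slightly below the number of samples in the vec file, since later stages consume extra positives.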
Training the cascade classifier took 4 days, 12 hours, 16 minutes, and 50
seconds.
Figure 4.5: One of the images from the Labeled Faces in the Wild (LFW) dataset
using the equation mentioned above, and then the average EAR is calculated.
Here, L represents the eye landmark distance. The eye landmark distance is
calculated for the left and the right eye; for the left eye the distance is
L_l, and for the right eye L_r. From these, the mean eye landmark distance
L is calculated from the following.
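The averaging equation itself was lost in extraction; it is presumably:

    L = (L_l + L_r) / 2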
Similar calculations are done for MAR, as given in the formula provided
earlier, after which the value of MAR is likewise checked against a
threshold to determine whether the person is yawning. When the mouth aspect
ratio is significantly high, it is clear that the mouth is wide open, most
probably for yawning.
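The report's exact MAR formula did not survive extraction; a common formulation (ours may differ), with eight mouth landmarks q1, ..., q8 where q1 and q5 are the mouth corners:

    MAR = (||q2 - q8|| + ||q3 - q7|| + ||q4 - q6||) / (2 * ||q1 - q5||)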
The camera matrix holds the intrinsic parameters: the image centre
(cx, cy), the scale (skew) factor s, and the focal lengths in pixels
(fx, fy):

    [ fx  s   cx ]
    [ 0   fy  cy ]
    [ 0   0   1  ]
distCoeffs is the input vector of distortion coefficients; if the vector is
NULL/empty, zero distortion coefficients are assumed.
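A sketch of the head-tilt estimation with cv2.solvePnP (the 3D model points below are generic values widely used for this purpose and are assumptions, as is approximating the camera matrix from the frame size):

    import cv2
    import numpy as np

    # Generic 3D reference points of a face model (nose tip, chin,
    # eye corners, mouth corners), in an arbitrary metric frame.
    MODEL_POINTS = np.array([
        (0.0, 0.0, 0.0),           # nose tip
        (0.0, -330.0, -65.0),      # chin
        (-225.0, 170.0, -135.0),   # left eye, left corner
        (225.0, 170.0, -135.0),    # right eye, right corner
        (-150.0, -150.0, -125.0),  # left mouth corner
        (150.0, -150.0, -125.0),   # right mouth corner
    ])

    def head_pose(image_points, frame_size):
        # Approximate the intrinsics: focal length ~ frame width,
        # principal point at the image centre, no skew.
        h, w = frame_size
        camera_matrix = np.array([[w, 0, w / 2],
                                  [0, w, h / 2],
                                  [0, 0, 1]], dtype=np.float64)
        dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
        ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points,
                                      camera_matrix, dist_coeffs)
        return rvec, tvec  # head rotation/translation w.r.t. the camera

Here image_points is the corresponding (6, 2) array of 2D landmark coordinates taken from the detected face.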
Figure 4.8: Eyes and face being detected, with contours drawn around the
eyes.
Chapter 5
Results
This project combines all the major features associated with driver
drowsiness to achieve optimum results: EAR, MAR, blink rate, and head tilt.
All four parameters are considered simultaneously to determine whether a
person is drowsy. Our trained Haar cascade is better at handling variations
in facial structure and complexion, as well as at identifying multiple
faces in a single frame. It also performs better in terms of the
computation time required to detect faces in a real-time scenario.
EAR and MAR values were observed over a specific time period in order to
determine suitable thresholds.
Figure 5.1: Confusion matrix for the Haar cascade tested on positive and
negative images
From the graph (Figure 5.4) it can be observed that while a person is
yawning, the MAR increases from about 0.35 to approximately 0.5-0.9. The
threshold is taken as the lowest peak observed while yawning.
Figure 5.5: The image depicts a face with landmarks marked and the head
pose being detected; since the EAR is above its threshold and the MAR is
below its threshold, no alarm is raised (normal conditions)
Figure 5.6: Though the person is smiling, the MAR value increases only
slightly, hence no alarm
Figure 5.7: The EAR remains above its threshold and the MAR below its
threshold while talking
Figure 5.8: When the EAR goes below the threshold, that is, the person's
eyes stay closed for more than a certain number of consecutive frames, an
alert is issued
Figure 5.9: The drowsiness alarm is sounded when the EAR is below its
threshold and the head pose is also off-axis
Figure 5.12: Alarm sounded when the person is drowsy, with eyes closed and
yawning
Chapter 6

Conclusion and Future Work

6.1 Conclusion
The project intends to present a solution that alerts the driver before a
mishap happens. Detecting driver drowsiness, which is one of the major
causes of road accidents, will reduce deaths and injuries to a great
extent. The simulated system uses EAR, eye blinks, yawning, and head pose
estimation to assess the driver's drowsiness and attention, combining the
Viola-Jones/Haar cascade, the facial landmark method, and solvePnP (head
tilt estimation). During monitoring, the system is able to detect when the
eyes are closed and the mouth is open simultaneously for too long, or
repeatedly within a short period, and sounds a buzzer to alert the driver.
The system also alerts the driver if he closes his eyes for a long time,
indicating that he might have fallen asleep, and when his head is tilted or
turned away, using head pose estimation. Eye blinking has been detected
independently: Haar classifiers are used for the eyes, and the active
contour method for yawning.
[2] Ghassan Jasim Al-Anizy, Md. Jan Nordin and Mohammed M. Razooq,
"Automatic Driver Drowsiness Detection Using Haar Algorithm and Support
Vector Machines Techniques".
[14] Vinay K. Diddi, Prof. S. B. Jamge, "Head Pose and Eye State Monitoring
(HEM) for Driver Drowsiness Detection: Overview", Vol. 1, Issue 9,
Nov 2014.