Phase 5
Emotion Recognition in Image Captioning
Emotion recognition adds a new dimension to image captioning by detecting and expressing
the emotions depicted in an image. With the help of advanced algorithms, images can be
analyzed to identify emotions such as joy, sadness, surprise, and more.
Project Update:
We have created a website to upload and display images.
Future Development of the Project:
The website will use an AI model to generate captions describing the emotions in the
uploaded image.
Source Code:
HTML Code
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Image Upload</title>
    <link rel="stylesheet" href="style.css">
</head>
<body>
    <div class="container">
        <h1>Image Upload</h1>
        <form id="upload-form" enctype="multipart/form-data">
            <input type="file" name="file" id="file-input" accept="image/*">
            <button type="submit">Upload</button>
        </form>
        <div id="preview">
            <img id="image-preview" src="" alt="Image Preview">
        </div>
    </div>
    <script src="script.js"></script>
</body>
</html>
CSS Code
body {
    font-family: Arial, sans-serif;
}

.container {
    max-width: 400px;
    margin: 0 auto;
    text-align: center;
}

h1 {
    margin-top: 20px;
}

form {
    margin: 20px 0;
}

input[type="file"] {
    /* left visible so the user can open the file picker;
       hiding it would require a styled <label> as a trigger */
    display: inline-block;
}

button {
    padding: 10px 20px;
    background-color: #007BFF;
    color: #fff;
    border: none;
    cursor: pointer;
}

button:hover {
    background-color: #0056b3;
}

#preview {
    display: none;
    margin-top: 20px;
}

#image-preview {
    max-width: 100%;
}
JavaScript Code
const form = document.getElementById('upload-form');
const fileInput = document.getElementById('file-input');
const imagePreview = document.getElementById('image-preview');

// Show a local preview of the selected image
fileInput.addEventListener('change', function () {
    const file = fileInput.files[0];
    if (file) {
        const reader = new FileReader();
        reader.onload = function (e) {
            imagePreview.src = e.target.result;
            document.getElementById('preview').style.display = 'block';
        };
        reader.readAsDataURL(file);
    }
});

// Handle the upload when the form is submitted
form.addEventListener('submit', function (e) {
    e.preventDefault();
    const file = fileInput.files[0];
    if (!file) {
        return;
    }
    const formData = new FormData();
    formData.append('file', file);
    // Use formData to upload the file to the server or perform any necessary processing
    // Example: send the formData to the server using a fetch request or an XMLHttpRequest
});
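On the server side, a minimal sketch of what an upload endpoint could look like (assuming a
Python/Flask backend, which is not yet part of the project source; the route name and save
directory below are hypothetical):

# Minimal Flask upload endpoint (sketch; framework choice, route, and
# save directory are assumptions, not part of the project source yet)
import os

from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = "./uploads"  # hypothetical save location

@app.route("/upload", methods=["POST"])
def upload():
    file = request.files.get("file")  # matches name="file" in the HTML form
    if file is None or file.filename == "":
        return {"error": "no file provided"}, 400
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    filename = secure_filename(file.filename)
    file.save(os.path.join(UPLOAD_DIR, filename))
    return {"status": "uploaded", "filename": filename}

if __name__ == "__main__":
    app.run(debug=True)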
Step 2: Setting up Data for the AI Model
Before running the scripts for the image recognition project, it is crucial to set up the required
datasets. Collecting the dataset for emotion captioning involves several steps:
3. Data collection
4. Emotion labeling
Emotion labeling defines the set of distinct emotional states, or labels, used to categorize and
describe human emotions based on their characteristics and expressions.
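For reference, the label set used later in this phase (it appears again in the reorganization
script below) can be written as a plain Python list:

# FERPlus emotion labels, matching the vote-column order of fer2013new.csv
EMOTIONS = [
    "neutral", "happy", "surprise", "sad", "angry",
    "disgust", "fear", "contempt", "unknown", "NF",
]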
SOURCE:
https://www.kaggle.com/datasets
DESCRIPTION: Kaggle Datasets serves as a valuable resource for data enthusiasts, offering
detailed descriptions, documentation, and metadata to assist in dataset selection.
https://github.com/microsoft/FERPlus
DESCRIPTION: FERPlus extends the FER-2013 dataset with new labels: each face image was
re-annotated by multiple taggers, so every image carries a distribution of emotion votes
rather than a single label.
COCO Dataset
DESCRIPTION: The COCO dataset is a widely recognized and extensively used dataset in the
field of computer vision and machine learning. It is designed for various computer vision tasks,
including object detection, image segmentation, and captioning.
3. Data collection:
To run the image recognition scripts successfully, we follow these steps:
Download the CK+, FER-2013, and FERPlus datasets from their respective sources.
Organize the dataset files according to our project's directory structure.
The collected data can be facial images, text, audio recordings, or any other relevant data.
REORGANIZATION OF DATA:
Images are transferred from the original FERPlus directory structure to match the FER-2013
structure.
Images are categorized into training and test sets based on the "Usage" attribute in the CSV
file, as sketched below.
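For reference, the target layout after reorganization looks roughly like this (the data/train
and data/test folder names are assumptions taken from the reconstruction of the script below):

data/
    train/
        neutral/
        happy/
        surprise/
        ...
    test/
        neutral/
        happy/
        ...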
EXECUTION:
The following Python script handles data preprocessing: it reorganizes image data from the
FERPlus dataset into the FER-2013 directory structure, both of which are used in the context
of facial expression recognition.
import os
import shutil

import cv2
import numpy as np
import pandas as pd


def get_emotion(emotions, list_of_emotions):
    # Helper (function name assumed) that maps a row of emotion votes to a
    # label: the top vote is zeroed out and the next-highest label is returned
    best_emotion = np.argmax(emotions)
    emotions[best_emotion] = 0
    best_emotion = np.argmax(emotions)
    return list_of_emotions[best_emotion]


def read_and_clean_csv(path):
    # we read the csv and we delete all the rows which contain NaN
    df = pd.read_csv(path)
    df = df.dropna()
    return df


def rewrite_image_from_df(df):
    acc = ""
    emotions = [
        "neutral",
        "happy",
        "surprise",
        "sad",
        "angry",
        "disgust",
        "fear",
        "contempt",
        "unknown",
        "NF",
    ]
    # we rewrite all the image files into the FER-2013-style layout;
    # the input/output paths below are reconstructed assumptions
    for row in range(len(df)):
        item = df.iloc[row]
        # announce when we move to a new "Usage" split (reconstructed control flow)
        if item["Usage"] != acc:
            if acc:
                print(f"{acc} done")
            acc = item["Usage"]
        # pick the label from the per-image emotion votes (the columns after
        # "Usage" and "Image name" in fer2013new.csv)
        votes = np.array(item.iloc[2:2 + len(emotions)], dtype=float)
        emotion = get_emotion(votes, emotions)
        image = cv2.imread(f"./FERPlus/images/{item['Image name']}")  # hypothetical source path
        out_dir = "./data/train" if acc == "Training" else "./data/test"
        os.makedirs(f"{out_dir}/{emotion}", exist_ok=True)
        cv2.imwrite(
            f"{out_dir}/{emotion}/{item['Image name']}",  # hypothetical target path
            image,
        )


if __name__ == "__main__":
    df = read_and_clean_csv("./FERPlus/fer2013new.csv")
    rewrite_image_from_df(df)
4. Emotion labeling:
Before labeling emotions, faces must first be detected in each image. The following excerpt
comes from a pre-trained OpenCV Haar Cascade Classifier for frontal face detection:
<opencv_storage>
<cascade type_id="opencv-cascade-classifier"><stageType>BOOST</stageType>
<featureType>HAAR</featureType>
<height>24</height>
<width>24</width>
<stageParams>
<maxWeakCount>211</maxWeakCount></stageParams>
<featureParams>
<maxCatCount>0</maxCatCount></featureParams>
<stageNum>25</stageNum>
<stages>
<_>
<maxWeakCount>9</maxWeakCount>
<stageThreshold>-5.0425500869750977e+00</stageThreshold>
<weakClassifiers>
<_>
<internalNodes>
0 -1 0 -3.1511999666690826e-02</internalNodes>
<leafValues>
2.0875380039215088e+00 -2.2172100543975830e+00</leafValues></_>
<_>
<internalNodes>
0 -1 1 1.2396000325679779e-02</internalNodes>
<leafValues>
-1.8633940219879150e+00 1.3272049427032471e+00</leafValues></_>
<_>
<internalNodes>
0 -1 2 2.1927999332547188e-02</internalNodes>
<leafValues>
-1.5105249881744385e+00 1.0625729560852051e+00</leafValues></_>
<_>
<internalNodes>
0 -1 3 5.7529998011887074e-03</internalNodes>
<leafValues>
-8.7463897466659546e-01 1.1760339736938477e+00</leafValues></_>
<_>
<internalNodes>
0 -1 4 1.5014000236988068e-02</internalNodes>
<leafValues>
-7.7945697307586670e-01 1.2608419656753540e+00</leafValues></_>
<_>
<internalNodes>
0 -1 5 9.9371001124382019e-02</internalNodes>
<leafValues>
5.5751299858093262e-01 -1.8743000030517578e+00</leafValues></_>
<_>
<internalNodes>
0 -1 6 2.7340000960975885e-03</internalNodes>
<leafValues>
-1.6911929845809937e+00 4.4009700417518616e-01</leafValues></_>
<_>
<internalNodes>
0 -1 7 -1.8859000876545906e-02</internalNodes>
<leafValues>
-1.4769539833068848e+00 4.4350099563598633e-01</leafValues></_>
<_>
<internalNodes>
0 -1 8 5.9739998541772366e-03</internalNodes>
<leafValues>
-8.5909199714660645e-01 8.5255599021911621e-01</leafValues></_></weakClassifiers></_>
<_>
<maxWeakCount>16</maxWeakCount>
<stageThreshold>-4.9842400550842285e+00</stageThreshold>
<weakClassifiers>
<_>
<internalNodes>
0 -1 9 -2.1110000088810921e-02</internalNodes>
<leafValues>
1.2435649633407593e+00 -1.5713009834289551e+00</leafValues></_>
<_>
<internalNodes>
0 -1 10 2.0355999469757080e-02</internalNodes>
<leafValues>
-1.6204780340194702e+00 1.1817760467529297e+00</leafValues></_>
<_>
<internalNodes>
0 -1 11 2.1308999508619308e-02</internalNodes>
<leafValues>
-1.9415930509567261e+00 7.0069098472595215e-01</leafValues></_>
<_>
<internalNodes>
0 -1 12 9.1660000383853912e-02</internalNodes>
<leafValues>
-5.5670100450515747e-01 1.7284419536590576e+00</leafValues></_>
<_>
<internalNodes>
0 -1 13 3.6288000643253326e-02</internalNodes>
<leafValues>
2.6763799786567688e-01 -2.1831810474395752e+00</leafValues></_>
<_>
<internalNodes>
0 -1 14 -1.9109999760985374e-02</internalNodes>
<leafValues>
-2.6730210781097412e+00 4.5670801401138306e-01</leafValues></_>
<_>
<internalNodes>
0 -1 15 8.2539999857544899e-03</internalNodes>
<leafValues>
-1.0852910280227661e+00 5.3564202785491943e-01</leafValues></_>
<_>
<internalNodes>
0 -1 16 1.8355000764131546e-02</internalNodes>
<leafValues>
-3.5200199484825134e-01 9.3339198827743530e-01</leafValues></_>
<_>
<internalNodes>
0 -1 17 -7.0569999516010284e-03</internalNodes>
<leafValues>
9.2782098054885864e-01 -6.6349899768829346e-01</leafValues></_>
<_>
<internalNodes>
0 -1 18 -9.8770000040531158e-03</internalNodes>
<leafValues>
1.1577470302581787e+00 -2.9774799942970276e-01</leafValues></_>
<_>
<internalNodes>
0 -1 19 1.5814000740647316e-02</internalNodes>
<leafValues>
-4.1960600018501282e-01 1.3576040267944336e+00</leafValues></_>
<_>
<internalNodes>
0 -1 20 -2.0700000226497650e-02</internalNodes>
<leafValues>
1.4590020179748535e+00 -1.9739399850368500e-01</leafValues></_>
<_>
<internalNodes>
0 -1 21 -1.3760800659656525e-01</internalNodes>
<leafValues>
1.1186759471893311e+00 -5.2915501594543457e-01</leafValues></_>
<_>
<internalNodes>
0 -1 22 1.4318999834358692e-02</internalNodes>
<leafValues>
-3.5127198696136475e-01 1.1440860033035278e+00</leafValues></_>
<_>
<internalNodes>
0 -1 23 1.0253000073134899e-02</internalNodes>
<leafValues>
-6.0850602388381958e-01 7.7098500728607178e-01</leafValues></_>
<_>
<internalNodes>
0 -1 24 9.1508001089096069e-02</internalNodes>
<leafValues>
3.8817799091339111e-01 -1.5122940540313721e+00</leafValues></_></weakClassifiers></_>
<_>
<maxWeakCount>27</maxWeakCount>
<stageThreshold>-4.6551899909973145e+00</stageThreshold>
<weakClassifiers>
<_>
<internalNodes>
0 -1 25 6.9747000932693481e-02</internalNodes>
<leafValues>
-1.0130879878997803e+00 1.4687349796295166e+00</leafValues></_>
<_>
<internalNodes>
0 -1 26 3.1502999365329742e-02</internalNodes>
<leafValues>
-1.6463639736175537e+00 1.0000629425048828e+00</leafValues></_>
<_>
<internalNodes>
0 -1 27 1.4260999858379364e-02</internalNodes>
<leafValues>
4.6480301022529602e-01 -1.5959889888763428e+00</leafValues></_>
<_>
<internalNodes>
0 -1 28 1.4453000389039516e-02</internalNodes>
<leafValues>
-6.5511900186538696e-01 8.3021801710128784e-01</leafValues></_>
<_>
<internalNodes>
0 -1 29 -3.0509999487549067e-03</internalNodes>
<leafValues>
-1.3982310295104980e+00 4.2550599575042725e-01</leafValues></_>
<_>
<internalNodes>
0 -1 30 3.2722998410463333e-02</internalNodes>
<leafValues>
-5.0702601671218872e-01 1.0526109933853149e+00</leafValues></_>
<_>
<internalNodes>
0 -1 31 -7.2960001416504383e-03</internalNodes>
<leafValues>
3.6356899142265320e-01 -1.3464889526367188e+00</leafValues></_>
<_>
<internalNodes>
0 -1 32 5.0425000488758087e-02</internalNodes>
<leafValues>
-3.0461400747299194e-01 1.4504129886627197e+00</leafValues></_>
<_>
<internalNodes>
0 -1 33 4.6879000961780548e-02</internalNodes>
<leafValues>
-4.0286201238632202e-01 1.2145609855651855e+00</leafValues></_>
<_>
<internalNodes>
0 -1 34 -6.9358997046947479e-02</internalNodes>
<leafValues>
1.0539360046386719e+00 -4.5719701051712036e-01</leafValues></_>
<_>
<internalNodes>
0 -1 35 -4.9033999443054199e-02</internalNodes>
<leafValues>
-1.6253089904785156e+00 1.5378999710083008e-01</leafValues></_>
<_>
<internalNodes>
0 -1 36 8.4827996790409088e-02</internalNodes>
<leafValues>
2.8402999043464661e-01 -1.5662059783935547e+00</leafValues></_>
<!-- ... the remaining weak classifiers and stages of this cascade are
omitted in this excerpt ... -->
</weakClassifiers></_></stages>
<features>
<_>
<rects>
<_>
0 13 18 3 -1.</_>
<_>
0 14 18 1 3.</_></rects></_>
<_>
<rects>
<_>
15 17 9 6 -1.</_>
<_>
15 19 9 2 3.</_></rects></_>
<_>
<rects>
<_>
0 17 9 6 -1.</_>
<_>
0 19 9 2 3.</_></rects></_>
<_>
<rects>
<_>
12 17 9 6 -1.</_>
<_>
12 19 9 2 3.</_></rects></_>
<_>
<rects>
<_>
3 17 9 6 -1.</_>
<_>
3 19 9 2 3.</_></rects></_>
<_>
<rects>
<_>
16 2 3 20 -1.</_>
<_>
17 2 1 20 3.</_></rects></_>
<_>
<rects>
<_>
0 13 24 8 -1.</_>
<_>
0 17 24 4 2.</_></rects></_>
<_>
<rects>
<_>
9 1 6 22 -1.</_>
<_>
12 1 3 11 2.</_>
<_>
9 12 3 11 2.</_></rects></_></features></cascade>
</opencv_storage>
EXECUTION:
The XML above represents a Haar Cascade Classifier, a machine learning model used for
detecting frontal faces in images. Haar Cascade Classifiers are a type of object detection
algorithm commonly used in computer vision for recognizing objects, such as faces, within
images. Each stage combines several weak classifiers, and a detection window must pass
every stage to be reported as a face.
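A minimal sketch of loading and applying such a cascade with OpenCV (the XML filename and
the image path are assumptions):

import cv2

# Load the pre-trained frontal-face cascade (hypothetical local filename)
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

# Read an input image and convert it to grayscale, since Haar cascades
# operate on single-channel images
image = cv2.imread("uploads/example.jpg")  # hypothetical path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces; each result is an (x, y, width, height) bounding box
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Crop each detected face for the downstream emotion model
for (x, y, w, h) in faces:
    face = gray[y:y + h, x:x + w]
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)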
DATA PREPROCESSING:
Data preprocessing plays a central role in training emotion recognition models. We rely on
data augmentation, specifically Keras' ImageDataGenerator, to improve the model's capacity
to identify emotions from diverse facial expressions.
ARCHITECTURE SELECTION:
We use a range of pre-trained architectures, such as VGG16, ResNet50, Xception, and
Inception, as the basis for our emotion recognition models.
Fine-tuning is central to model development: it adapts a pre-trained model to a specific task
by selecting and configuring the layers that require retraining to achieve optimal
performance. Finally, trained models are saved for future use, so they can be deployed in
various applications with consistent performance. The training script is shown below:
from glob import glob

from tensorflow.keras.applications.vgg16 import preprocess_input  # should match the chosen architecture
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model, save_model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def create_generators(parameters):
    # data augmentation for more diverse facial expressions
    image_gen = ImageDataGenerator(
        # rescale=1 / 127.5,
        rotation_range=20,
        zoom_range=0.05,
        shear_range=10,
        horizontal_flip=True,
        fill_mode="nearest",
        validation_split=0.20,
        preprocessing_function=preprocess_input,
    )
    # create generators
    train_generator = image_gen.flow_from_directory(
        parameters["train_path"],
        target_size=parameters["shape"],
        shuffle=True,
        batch_size=parameters["batch_size"],
    )
    test_generator = image_gen.flow_from_directory(
        parameters["test_path"],
        target_size=parameters["shape"],
        shuffle=True,
        batch_size=parameters["batch_size"],
    )
    return (
        glob(f"{parameters['train_path']}/*/*.jp*g"),
        glob(f"{parameters['test_path']}/*/*.jp*g"),
        train_generator,
        test_generator,
    )


def create_model(architecture, parameters):
    # load the pre-trained base (VGG16, ResNet50, Xception, Inception, ...)
    model = architecture(
        input_shape=parameters["shape"] + [3],
        weights="imagenet",
        include_top=False,
        classes=parameters["nbr_classes"],
    )
    # fine tuning: freeze the pre-trained layers and train only the new head
    for layer in model.layers:
        layer.trainable = False
    out = model.output
    x = Flatten()(out)
    x = Dense(parameters["nbr_classes"], activation="softmax")(x)
    # assemble the full model from the base input to the new softmax head
    model = Model(inputs=model.input, outputs=x)
    opti = SGD(
        lr=parameters["learning_rate"],
        momentum=parameters["momentum"],
        nesterov=parameters["nesterov"],
    )
    model.compile(optimizer=opti, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.summary()
    return model


def fit(model, train_generator, test_generator, train_files, test_files, parameters):
    # stop training when the validation loss stops improving
    # (the patience value is an assumption)
    early_stop = EarlyStopping(monitor="val_loss", patience=5)
    return model.fit(
        train_generator,
        validation_data=test_generator,
        epochs=parameters["epochs"],
        steps_per_epoch=len(train_files) // parameters["batch_size"],
        validation_steps=len(test_files) // parameters["batch_size"],
        callbacks=[early_stop],
    )


def evaluate(model, test_generator):
    score = model.evaluate_generator(test_generator)
    return score


def save(model, filename):
    save_model(model=model, filepath=f"./trained_models/{filename}")
    model.save_weights(f"./trained_models/{filename}.h5")
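A hypothetical end-to-end run of these functions (every value in the parameters dictionary
below is an illustrative assumption, not a tuned setting):

# Hypothetical usage of the functions above; all parameter values are assumptions
from tensorflow.keras.applications import VGG16

parameters = {
    "train_path": "./data/train",  # hypothetical dataset paths
    "test_path": "./data/test",
    "shape": [224, 224],           # VGG16's default input size
    "batch_size": 32,
    "nbr_classes": 8,              # e.g. the eight main FERPlus emotions
    "learning_rate": 0.01,
    "momentum": 0.9,
    "nesterov": True,
    "epochs": 20,
}

train_files, test_files, train_gen, test_gen = create_generators(parameters)
model = create_model(VGG16, parameters)
fit(model, train_gen, test_gen, train_files, test_files, parameters)
print(evaluate(model, test_gen))
save(model, "vgg16_emotions")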
Conclusion:
In the fifth phase of our image recognition development, we focused on data preparation, the
Haar Cascade Classifier for face detection, and the creation and fine-tuning of deep learning
models for emotion recognition. We recognized the foundational importance of well-organized
and preprocessed datasets in training accurate image recognition models. The exploration of
the Haar Cascade Classifier highlighted its efficiency in detecting faces in images, making it a
valuable tool for a wide range of computer vision applications. This phase also involved
developing deep learning models that build on pre-trained architectures like VGG16,
ResNet50, Xception, and Inception, adapting them for emotion recognition. These models are
expected to deliver a high level of accuracy once trained on the prepared datasets.