Shwaga Abay Final Submitted
2021-06
SHWAGA, ABAY
http://ir.bdu.edu.et/handle/123456789/12657
Downloaded from DSpace Repository, DSpace Institution's institutional repository
BAHIR DAR UNIVERSITY
BAHIR DAR INSTITUTE OF TECHNOLOGY
SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES
FACULTY OF COMPUTING
BY
SHWAGA ABAY
JUNE, 2021
BAHIR DAR, ETHIOPIA
Mango Disease Detection Using Machine Learning Technique
By
Shwaga Abay Ashebr
A Thesis Submitted to the School of Research and Graduate Studies of Bahir Dar
Institute of Technology, BDU in Partial Fulfilment for the Degree of Master of
Science in Software Engineering in the Faculty of Computing
June, 2021
Bahir Dar, Ethiopia
©2021
SHWAGA ABAY ASHEBR
Mango Disease Detection Using Machine Learning Technique
ALL RIGHT RESERVED
ACKNOWLEDGEMENT
First and foremost, praise and thanks to the Almighty God for His blessing, support, and
protection in successfully completing this study during my research work. I would also like
to thank the Ever-Virgin St. Mary, our Lord's mother. She is my strength whenever I
get tired going through hard times, and she helped me complete this study.
I would like to express my gratitude to my advisor, Mr. Seffi Gebeyehu (Asst. Professor),
for his valuable guidance throughout this research. It was a great honor to work and study
under his supervision. I submit my heartiest gratitude to Mr. Abreham Debasu (Asst.
Professor) for his encouragement and constructive comments. Without his
guidance and diligent help, this thesis would not have been possible.
My sincere thanks also go to the staff of the Weramit fruit and vegetable center, Bahir Dar,
who helped me during data collection and labelling. Special thanks go to Mr.
Sisay and Mr. Adam for their kind support in sharing mango leaf pictures and providing
constructive ideas about mango leaf disease and insect pest identification.
Last but not least, I also want to extend my special thanks to all the people who believed
in me along the way.
ABSTRACT
Mango (Mangifera indica) is a fruit crop of great significance grown in different
agro-ecologies around the world. Mangoes are good sources of vitamins and minerals.
However, its productivity is currently very limited because the crop is attacked by
different diseases and pests. Thus, to increase mango fruit quality and productivity, it is
crucial and feasible to detect diseases and insect pests at an early stage. In this study, we
designed and developed a mango leaf disease identification mechanism using machine
learning (ML) techniques. Healthy and diseased mango leaf images were captured
manually from main production areas in the Amhara Region, namely the Weramit fruit
and vegetable research and training sub-centre and Bahir Dar city. As implementation
tools, Python in the Anaconda Spyder working environment and Google Colaboratory
were used. To enhance the dataset, different pre-processing techniques (image resizing,
histogram adjustment, noise removal, and image augmentation) were applied using the
OpenCV library. To improve classification performance and achieve the objective of this
study, different segmentation techniques, namely k-means, Mask R-CNN, and their
combination, were used. After the pre-processing and segmentation steps, relevant
features of the mango leaf images were extracted using a CNN. The classification model
was then built using CNN, SVM, and CNN-SVM classifiers on the extracted features.
For these classification models, three different activation functions, Tanh, Relu, and
Leaky Relu, were applied to achieve better classification accuracy. From the experiments,
we observed that the classifiers using segmented images and the Leaky Relu activation
function achieved significant classification performance, with accuracies of 97.62%
(CNN), 98.01% (SVM), and 99.78% (CNN-SVM).
Table of Contents
DECLARATION ....................................................................................................................
ACKNOWLEDGEMENT ...................................................................................................................... v
ABSTRACT ........................................................................................................................................ vi
LIST OF FIGURES ............................................................................................................................... x
LIST OF TABLES .................................................................................................................................xi
ABBREVIATIONS ..............................................................................................................................xii
CHAPTER ONE .................................................................................................................................. 1
1. INTRODUCTION ........................................................................................................................ 1
1.1. Background of the Study.................................................................................................. 1
1.2. Statement of the Problem ............................................................................................... 3
1.3. Objectives of the Study .................................................................................................... 5
1.3.1. General Objective .................................................................................................... 5
1.3.2. Specific Objective ..................................................................................................... 5
1.4. Methodologies ................................................................................................................. 5
1.4.1. Literature Review..................................................................................................... 5
1.4.2. Data Collection and Datasets Preparation ................................................................ 5
1.4.3. Data Pre-processing and Segmentation.................................................................... 5
1.4.4. Feature Extraction and Modelling............................................................................ 6
1.4.5. Performance Evaluation Technique ......................................................................... 6
1.4.6. Implementation Tool ................................................................................................ 6
1.5. Scope and limitation of the Study .................................................................................... 7
1.6. Significance of the Study .................................................................................................. 7
1.6.1. Empirical Contribution ............................................................................................ 7
1.6.2. Methodological/Scientific Contribution .................................................................. 7
1.7. Organization of the Study ................................................................................................ 8
CHAPTER TWO ................................................................................................................................. 9
2. LITERATURE REVIEW ................................................................................................................ 9
2.1. Introduction ..................................................................................................................... 9
2.2. Major mango diseases and insect pests ........................................................................ 10
2.2.1. White mango scale insect (Aulacaspis tubercularis Newstead (Hemiptera:
Diaspididae)) .......................................................................................................................... 10
2.2.2. Powdery Mildew (Oidium mangiferae) ................................................................. 11
2.2.3. Anthracnose (Colletotrichum gloeosporioides) ..................................................... 12
2.3. Plant disease and pest detection using image processing and deep learning .............. 13
2.3.1. Convolutional Neural Networks ............................................................................ 14
2.3.2. Support vector machine ......................................................................................... 19
2.4. Image pre-processing..................................................................................................... 22
2.4.1. Image Resizing....................................................................................................... 22
2.4.2. Histogram Equalization.......................................................................................... 24
2.4.3. Image noise types and de-noising techniques ........................................................ 27
2.4.4. Image Augmentation .............................................................................................. 30
2.5. Segmentation ................................................................................................................. 30
2.6. Feature Extraction.......................................................................................................... 36
2.7. Related Work ................................................................................................................. 37
2.8. Summary ........................................................................................................................ 38
CHAPTER THREE ............................................................................................................................. 40
3. METHODOLOGY ..................................................................................................................... 40
3.1. Introduction ................................................................................................................... 40
3.2. Architectural Framework of the proposed model ......................................................... 41
3.3. Image Acquisition........................................................................................................... 42
3.4. Image Pre-processing..................................................................................................... 43
3.4.1. Image Resizing....................................................................................................... 43
3.4.2. Histogram Equalization.......................................................................................... 44
3.4.3. Image de-nosing ..................................................................................................... 46
3.4.4. Data Augmentation ................................................................................................ 47
3.5. Image Segmentation ...................................................................................................... 48
3.5.1. K-means Segmentation .......................................................................................... 48
3.5.2. Mask R-CNN ......................................................................................................... 49
3.6. Classification .................................................................................................................. 50
3.6.1. Building the Model ................................................................................................ 50
3.6.2. Compiling the Model ............................................................................................. 51
3.6.3. Training and Testing the Model ............................................................................. 51
CHAPTER FOUR .............................................................................................................................. 53
4. EXPERIMENT, RESULT AND DISCUSSION ................................................................ 53
4.1. Introduction ................................................................................................................... 53
4.2. Experimental Setup ........................................................................................................ 53
4.2.1. Dataset ........................................................................................................................... 53
4.2.2. Implementation ............................................................................................................. 54
4.3. Result and Discussion..................................................................................................... 54
CHAPTER FIVE ................................................................................................................................ 58
5. CONCLUSION AND RECOMMENDATION ............................................................................... 58
5.1. Conclusion ...................................................................................................................... 58
5.2. Recommendation........................................................................................................... 59
References ..................................................................................................................................... 60
LIST OF FIGURES
Figure 2. 1 Sample mango images ....................................................................................... 9
Figure 2. 2: white mango scale ......................................................................................... 11
Figure 2. 3 Mango powdery mildew ................................................................................. 12
Figure 2. 4 Mango anthracnose ......................................................................................... 13
Figure 2. 5 An overview of convolutional neural network (CNN) architecture and the
training process. ................................................................................................................ 15
Figure 2. 6 Convoluting a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved
feature. .............................................................................................................................. 16
Figure 2. 7 Relu operation ................................................................................................ 17
Figure 2. 8 max-pooling and average-pooling operations ................................................ 18
Figure 2. 9 SVM hyperplanes ........................................................................................... 19
Figure 2. 10 SVM optimal hyperplane ............................................................................. 20
Figure 2. 11 image resizing operation using nearest neighbor interpolation ..................... 23
Figure 2. 12 image resizing operation using bilinear interpolation ................................... 23
Figure 2. 13 Histogram and its Equalized Histogram for H.E. ......................................... 25
Figure 2. 14 BI-histogram Equalization Method .............................................................. 26
Figure 2. 15 Image segmentation types ............................................................................ 31
Figure 2. 16 Mask R-CNN ................................................................................................ 34
Figure 4. 1: Training and validation accuracy of CNN-SVM on raw mango leaf images 55
Figure 4. 2: Training and validation loss of CNN-SVM on raw mango leaf images ........ 55
Figure 4. 3: validation accuracy of CNN-SVM on preprocessed mango leaf images ...... 56
Figure 4. 4 validation loss of CNN-SVM classifier on preprocessed dataset ................... 56
LIST OF TABLES
Table 2. 1: Summary of related works .............................................................................. 39
Table 4. 1: Dataset description .......................................................................................... 53
Table 4. 2: CNN classification analysis with raw mango leaf images.............................. 54
Table 4. 3: Classification analysis of CNN model with different Activation functions on a
preprocessed image dataset ............................................................................................... 57
Table 4. 4: Classification analysis of SVM + RBF model with different Activation
functions on a preprocessed image dataset ....................................................................... 57
Table 4. 5: Classification analysis of combined (CNN + SVM) model with different
Activation functions on a preprocessed image dataset ..................................................... 57
ABBREVIATIONS
ANN Artificial Neural Network
CHAPTER ONE
1. INTRODUCTION
1.1. Background of the Study
According to (Tewodros Bezu, Kebede Woldetsadik , & Tamado Tana, 2014), fruit crops
play an important role in national food security. They are generally delicious and highly
nutritious, supplying mainly vitamins and minerals that can balance cereal-based diets.
Fruits provide raw materials for local industries and can be sources of
foreign currency. Moreover, the development of the fruit industry will create employment
opportunities, particularly for farming communities. In general, Ethiopia has great
potential and encouraging policy to expand fruit production for fresh market and
processing both for domestic and export markets. Besides, fruit crops are friendly to
nature, sustain the environment, provide shade, and can easily be incorporated into any
agroforestry programs.
In the study conducted by (Jurgen, 2003), it is stated that mango, because of its attractive
appearance and the very pleasant taste of selected cultivars, is claimed to be the most
important fruit of the tropics and has been touted as the 'king of all fruits'. This fruit contains
almost all the known vitamins and many essential minerals. The protein content is
generally a little higher than that of other fruits except for the avocado. Mangos are also a
fairly good source of thiamine and niacin and contain some calcium and iron.
(4,936,022.34 quintals). In 2017, fresh mango production was estimated at
1,049.82 tons (CSA, 2017/18). Deke and Wonjeta Kebeles of Bahir Dar Zuria wereda in the
Amhara region produce about 8,443,400 tons of the Alphonso mango variety every year.
However, mango production has been severely affected in recent years by insect pests such
as the white mango scale and by mango diseases (FAO, 2019).
Besides, in Ethiopia, the major reasons for the low production are damage by local
and invasive pests and diseases like powdery mildew and anthracnose, which lead to huge
production losses. The white mango scale is one of the major invasive scale insects and
causes serious damage to mango plants. It is a serious pest that injures mangoes by
feeding on the plant sap through leaves, branches, and fruits, causing defoliation, drying
up of young twigs, and poor blossoming, thereby affecting the commercial value of fruits
and their export potential, especially in late cultivars, where it causes conspicuous pink
blemishes around the feeding sites of the scales (Anjulo, 2019).
Along with the technologies invented in the past few decades, image processing and
machine learning have gained important applications in the areas of agriculture.
Therefore, the implementation of those technologies in such areas will have paramount
importance to identify and control the prevalence of mango diseases and insect pests.
1.2. Statement of the Problem
Agriculture production is something that the economy relies on heavily. This is one of the
reasons why disease identification in plants plays an important role in the field of
agriculture, since diseases in plants are very common. If proper care is not taken in this
area, plants are seriously affected, and the quality, quantity, and productivity of the
product suffer (Dessalegn, Assefa, Derso, & Tefera, 2014).
However, mango production has been decreasing sharply in recent years because of
different diseases and insect pests. According to (Tewodros Bezu, Kebede Woldetsadik, &
Tamado Tana, 2014), the major mango production constraints are water shortage or
erratic rainfall, followed by insect pest problems. Powdery mildew and anthracnose are
also the major diseases that lead to the decline of mango fruit production. Lack of
knowledge and recommended production practices like nutrition, pruning, pest
management, and post-harvest losses are also noted as major problems of the cultivators.
Another study conducted by (Dessalegn, Assefa, Derso, & Tefera, 2014) shows that the
adoption of improved mango production practices by farmers largely depends on the
availability of knowledgeable extension workers in the area.
In the findings of (Ullagaddi & Raju, 2017), modified rotation kernel transformation
(MRKT) based directional feature extraction was applied to identify black spots on
mango fruit and leaf, and an ANN was used for recognition. Although they applied
MRKT and ANN for the identification of black spots, the MRKT feature vector is
suitable for only one class. In the proposed system, by contrast, a CNN was used to
extract relevant features, instead of handcrafted features, for more than one class.
Moreover, a hybrid of SVM and CNN techniques was considered for classification
to reduce the computational cost and enhance the identification performance of the model.
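The hybrid idea, a convolutional front end producing features that a margin-based classifier then separates, can be sketched in miniature. The toy below is an assumption-laden stand-in, not the thesis model: the filters are fixed rather than trained, pooling is a single global average, and the SVM is linear (the study's model uses a trained CNN and an RBF kernel).

```python
import numpy as np

def conv_features(images, filters):
    """Toy convolutional feature extractor: valid 2-D convolution with each
    filter, ReLU, then global average pooling (one feature per filter).
    Stands in for features taken from a trained CNN's last conv layer."""
    n, h, w = images.shape
    k = filters.shape[1]
    feats = np.empty((n, len(filters)))
    for i, img in enumerate(images):
        for j, f in enumerate(filters):
            acc = np.zeros((h - k + 1, w - k + 1))
            for r in range(k):
                for c in range(k):
                    acc += f[r, c] * img[r:r + h - k + 1, c:c + w - k + 1]
            feats[i, j] = np.maximum(acc, 0.0).mean()  # ReLU + average pool
    return feats

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Linear SVM trained by batch sub-gradient descent on the hinge loss.
    Labels y must be in {-1, +1}; this linear version is a simplification
    of the RBF-kernel SVM used in the thesis."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                 # samples violating the margin
        grad_w = lam * w - (y[mask][:, None] * X[mask]).sum(axis=0) / len(X)
        grad_b = -y[mask].sum() / len(X)
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b
```

In the pipeline described above, `conv_features` would be replaced by the trained CNN and `train_linear_svm` by a kernel SVM; the hand-off (features in, labels out) is the part this sketch is meant to show.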
In the study conducted by (Arivazhagan & Ligi, 2018), a convolutional neural network
was applied for the identification of mango leaf diseases. They used a dataset of 1200
images of diseased and healthy mango leaves, of which 600 were for training and 600 for
testing. In another study, conducted by (Srdjan Sladojevic, Marko Arsenovic, Andras
Anderla, Dubravko Culibrk, & Darko Stefanovic, 2016), a CNN was tested on 13
different types of plant diseases. Although in both studies a CNN was used for both
feature extraction and classification of plant diseases and achieved better results, the
main reason for those results is that they used augmented images to maximize the size of
their datasets; however, a CNN takes more training time for classification as the dataset
grows. To overcome this problem, we considered different pre-processing and
segmentation techniques to minimize the time the CNN takes to extract features from raw
images. Besides, in this study, a hybrid of different machine learning techniques with
respective learning and kernel functions is used to enhance the identification performance
of the model. To this end, the following research questions are answered by this thesis
work:
1.3. Objectives of the Study
In the segmentation step, two segmentation techniques, k-means segmentation and Mask
R-CNN segmentation, were applied to the pre-processed mango leaf images to extract the
relevant leaf lesions as regions of interest and to simplify the feature extraction process.
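The k-means step groups pixels by colour so that lesion, healthy tissue, and background fall into separate clusters. In this study the OpenCV pipeline would normally supply this via `cv2.kmeans`; the following is a dependency-free NumPy sketch of the same algorithm, where `k=3` and the deterministic initialisation are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

def kmeans_segment(pixels, k=3, iters=10):
    """Cluster pixel colours (an N x 3 array) into k groups.

    A stand-in for cv2.kmeans; k=3 is an illustrative choice
    (e.g. lesion, healthy tissue, background).
    """
    pixels = np.asarray(pixels, dtype=float)
    # Simple deterministic initialisation: evenly spaced sample pixels
    centers = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign every pixel to its nearest centre (Euclidean distance)
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers
```

An image reshaped to `(h * w, 3)` can be passed in directly; the returned label map, reshaped back to `(h, w)`, marks the candidate lesion regions that the later feature extraction stage consumes.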
1.5. Scope and limitation of the Study
Currently, plant disease detection and identification is a broad research area. However,
the main focus of this study was designing and developing an enhanced imaging and
machine learning technique to identify mango leaf diseases and insect pests. In this study,
only mango leaf images were used as a dataset, and they were collected from mango
production areas of the Amhara region, in particular Bahir Dar city and the Weramit fruit
and vegetable research and training sub-center. Only white mango scale (an insect pest),
anthracnose, and powdery mildew were considered. These diseases and insect pests were
selected purposively based on the severity of the mango production losses they cause and
the availability of their image data.
3. We proposed a combination of K-Means and Mask R-CNN for segmenting
mango leaf lesions. It helps to segment the region of interest effectively, which
increases the performance of our model and minimizes training time.
4. We improved mango leaf disease detection performance by studying the effect of
different activation functions and by adding different CNN components such as
convolution, pooling, batch normalization, and dropout layers to our proposed
model.
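The three activation functions whose effect is studied in point 4 above (Tanh, Relu, and Leaky Relu) are simple element-wise maps; a NumPy sketch makes the difference between them concrete. The leaky slope `alpha=0.01` is an illustrative default, not necessarily the value used in the experiments.

```python
import numpy as np

def tanh(x):
    # Saturating squash to (-1, 1); gradients vanish for large |x|
    return np.tanh(x)

def relu(x):
    # Identity for positive input, zero otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha for negative input, avoiding "dead" units;
    # alpha = 0.01 is an illustrative default (assumption)
    return np.where(x > 0, x, alpha * x)
```

The practical distinction is at negative inputs: Relu zeroes them (and their gradients), while Leaky Relu preserves a small signal, which is one common explanation for its better training behaviour on deep models.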
1.7. Organization of the Study
The rest of the thesis is organized as follows:
Chapter 2 presents the literature review, which evaluates prior findings on mango leaf
diseases and image classification, including image processing, handcrafted feature
extraction, and machine learning. It also gives a rundown of comparable studies that feed
features extracted by a CNN to machine learning classifiers such as SVM, which was
found to be a convincing basis for conducting this research.
Chapter 3 presents the methodology of the work performed in this study. This includes
the approach taken to conduct the research, the software libraries and tools used, the
architecture of the proposed model, and the methodology flow. The training procedure is
discussed here together with the proposed classification model.
Chapter 4 covers the results and discussion. It describes the experimental setup used to
train the classification model and offers a breakdown of the experimental results, the
model evaluation, and the discussion of the results in light of the research questions. A
comparative analysis (including the performance evaluation) of each proposed
classification model is also included in this section.
Chapter 5 concludes the work accomplished in this thesis and includes recommendations
for future work.
CHAPTER TWO
2. LITERATURE REVIEW
2.1. Introduction
Mango (Mangifera indica) is one of the most delicious and important fruit crops
cultivated in Ethiopian agriculture. It is exported to many countries as raw or ripe fruit
and also as processed consumables like ripe mango slices, juice, and raw mango pickle.
Mango is rich in vitamins A and C and has high medicinal value in traditional medicine;
mango leaves are often used during rituals since they have antibacterial activity against
gram-positive bacteria. In recent times the market value of Ethiopian mango has declined
due to the uncontrolled use of pesticides; hence it is the right time for researchers to come
up with ideas for early identification of diseases and insect pests, in order to control the
use of dangerous pesticides that pose a threat to human health. Figure 2.1 below shows
sample images of pure and healthy mango fruits.
results in a severe outbreak of the disease and pest growth which cannot be controlled by
organic means. In this situation, farmers are forced to use poisonous chemicals to
eradicate the disease to retain the crop yield. This problem can be solved by automating
the monitoring process through the use of advanced image processing and deep learning
techniques (Sethupathy & S, 2016).
meanwhile, unless some management strategies are developed, mango farming will
collapse in the region (TF, 2018).
In recent years this pest has become the most important limiting factor for mango
production in northwestern Ethiopia. Most smallholder growers are not aware of this
invasive pest.
Figure 2.2: White mango scale (insect pest) symptoms on mango leaf
2.2.2. Powdery Mildew (Oidium mangiferae)
It is called amedaye in Amharic, hammokshtay in Tigrigna, and dakuya in Oromifa.
Powdery mildew is another fungus that afflicts leaves, flowers, and young fruit. Infected
areas become covered with whitish powdery mildew. Figure 2.3 shows these symptoms
of whitish powdery mildew on the leaf surface of a mango plant. As leaves mature,
lesions along the midribs or underside of the foliage become dark brown and greasy
looking. In severe cases, the infection will destroy flowering panicles resulting in a lack
of fruit set and defoliation of the tree.
It is one of the most serious diseases of mango affecting almost all varieties. The
characteristic symptom of the disease is the white superficial powdery fungal growth on
leaves, stalks of panicles, flowers, and young fruits. The affected flowers and fruits drop
prematurely, reducing the crop load considerably, or might even prevent fruit set. Rains
or mists accompanied by cooler nights during flowering are congenial for the spread of
the disease (Ermias Teshome & Kassahun Sadessa, 2020).
Figure 2.3 Powdery mildew symptoms on mango leaf
2.2.3. Anthracnose (Colletotrichum gloeosporioides)
Mangoes are most seriously affected by anthracnose, a fungal disease. Its symptoms
manifest as black, sunken, irregularly shaped lesions, resulting in blossom blight, leaf
blotch, fruit staining, and eventual rot. Rainy weather and heavy dew promote the disease
(Grant, 2020).
Anthracnose is the major fungal disease limiting fruit production in all mango-growing
countries, especially where humidity is high during the cropping seasons, and it is of
major economic importance, causing damage that can lead to the production of
unmarketable fruits. The disease is of widespread occurrence, causing serious losses to
young shoots (shown in Figure 2.4 (a)), flowers, and fruits (shown in Figure 2.4 (b))
under favorable climatic conditions of high humidity, frequent rains, and temperatures of
24–32 °C. Anthracnose also affects fruits during storage (WA, SS, & MK, 2016).
(a) (b)
Figure 2. 4 Anthracnose symptoms (a) on mango leaf, and (b) on mango fruit
2.3. Plant disease and pest detection using image processing and
deep learning
Throughout the history of agricultural development, plant diseases and pests have always
been one of the main obstacles hindering the development of the agricultural economy.
Plant disease identification through the human eye is based on the visible symptoms on
the leaves. Many studies also confirmed that relying on pure naked-eye observation of
experts to detect and classify such diseases can be prohibitively expensive, especially in
developing countries.
A plant disease identification system based on digital image processing technology is
fast, accurate, and real-time, and can help farmers take effective preventive measures in
time. Thus, providing fast, automatic, cheap, and
accurate image-processing-based solutions for that task can be of great realistic
significance. As an important technical means in the field of image recognition, image
processing, and deep learning have broad application prospects.
Combining image processing with a convolutional neural network model makes it
possible to identify the disease at an early stage, so that the disease and pests can be
prevented from spreading to other parts of the tree (K, Sivakami, & M.Janani, 2019).
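One of the image-processing steps this work applies before any learning is histogram equalization (covered later in Section 2.4.2). The study applies it through OpenCV; as an illustration of what that operation actually computes, here is a NumPy sketch of the same mapping, where intensities are pushed through the normalised cumulative histogram so low-contrast leaf images use the full 0–255 range.

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram-equalise an 8-bit grayscale image (the mapping behind
    OpenCV's equalizeHist): build the intensity histogram, take its
    cumulative sum, and use the normalised CDF as a lookup table."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()   # CDF value of the darkest present level
    span = cdf[-1] - cdf_min       # total pixels minus that minimum
    if span == 0:                  # constant image: nothing to equalise
        return gray.copy()
    lut = np.clip(np.round((cdf - cdf_min) / span * 255.0), 0, 255).astype(np.uint8)
    return lut[gray]
```

The darkest intensity actually present maps to 0 and the brightest to 255, which is why equalization tends to make lesion boundaries on a leaf more visible before segmentation.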
2.3.1. Convolutional Neural Networks
Convolutional neural networks are an unusual combination of biology and math, with a
little computer science sprinkled in, but these networks have been among the most
influential innovations in the field of computer vision. 2012 was the first year that neural
nets grew to prominence, as Alex Krizhevsky used them to win that year's ImageNet
competition (basically, the annual Olympics of computer vision), dropping the
classification error record from 26% to 15%, an astounding improvement at the time.
Ever since then, a host of companies have been using deep learning at the core of their
services: Facebook uses neural nets for its automatic tagging algorithms, Google for
photo search, Amazon for product recommendations, Pinterest for home feed
personalization, and Instagram for search infrastructure. However, the classic, and
arguably most popular, use case of these networks is image processing, and in particular
image classification.
The convolutional neural network, also called ConvNet, is a specialized type of artificial
neural network that roughly mimics the human vision system. It was first introduced in
the 1980s by Yann LeCun, a postdoctoral computer science researcher. LeCun had built
on the work done by Kunihiko Fukushima, a Japanese scientist who, a few years earlier,
had invented the neocognitron, a very basic image recognition neural network (Dickson,
2020).
CNN-based applications became prevalent after the exemplary performance of AlexNet on the ImageNet dataset in 2012. After AlexNet, VGGNet was developed by the Visual Geometry Group at Oxford University; this architecture was the first runner-up of the ILSVRC 2014 classification task, while the winner was GoogLeNet.
In
recent years, CNN has become pivotal to many computer vision and deep learning
applications.
2.3.1.1. Basic Building Blocks of CNN Architecture
The CNN architecture includes several building blocks, such as convolution layers,
pooling layers, and fully connected layers. As described in Figure 2.5, a typical
architecture consists of repetitions of a stack of several convolution layers and a pooling
layer, followed by one or more fully connected layers. The step where input data are
transformed into output through these layers is called forward propagation. A model’s
performance under particular kernels and weights is calculated with a loss function
through forward-propagation on a training dataset, and learnable parameters, i.e., kernels
and weights are updated according to the loss value through backpropagation with
gradient descent optimization algorithm (Yamashita, Nishio, Do, & Togashi, 2018).
[Figure: an input image passes through stacked convolution + ReLU layers and max-pooling layers, then fully connected (FC) layers produce the output; the loss between the output and the label is propagated backwards to update the parameters.]
Figure 2.5 An overview of convolutional neural network (CNN) architecture and the training process (Singh et al., 2019).
Convolution layer: A convolution layer is a fundamental component of the CNN architecture that performs feature extraction, typically through a combination of linear and nonlinear operations (i.e., a convolution operation followed by an activation function). The objective of the convolution operation is to extract features such as edges from the input image.
Convolution is a specialized type of linear operation used for feature extraction that
involves the multiplication of a set of weights with the input, much like a traditional
neural network. Given that the technique was designed for two-dimensional input, the
multiplication is performed between an array of input data called a tensor (i.e. the green
color in Fig 2.6) and a two-dimensional array of weights, called a filter or a kernel (i.e.
the yellow color in Fig 2.6). The kernel is passed over the image, viewing a few elements
or pixels at a time (for example, 3×3 or 5×5).
An element-wise product between each element of the kernel and the input tensor is
calculated at each location of the tensor and summed to obtain the output value in the
corresponding position of the output tensor, called a feature map (i.e. the pink colour in
Fig 2.6). This procedure is repeated by applying multiple kernels to form an arbitrary
number of feature maps, which represent different characteristics of the input tensors;
different kernels can, thus, be considered as different feature extractors.
Two key hyper parameters that define the convolution operation are the size and number
of kernels. The distance between two successive kernel positions is called a stride, which
also defines the convolution operation. The common choice of a stride is 1.
Figure 2.6 Convolving a 5×5×1 image with a 3×3×1 kernel to get a 3×3×1 convolved feature.
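The multiply-and-sum operation described above can be sketched in a few lines of NumPy. This is an illustrative implementation only (strictly, it computes cross-correlation, as CNN frameworks do, since the kernel is not flipped); the 5×5 input and 3×3 kernel values are hypothetical.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid 2D convolution: slide the kernel over the image, computing
    an element-wise product and sum at each position (no padding)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
    return out

image = np.arange(25).reshape(5, 5)    # hypothetical 5x5 input tensor
kernel = np.array([[1, 0, -1]] * 3)    # hypothetical 3x3 edge-like kernel
feature_map = convolve2d(image, kernel)
print(feature_map.shape)               # (3, 3), matching Figure 2.6
```

Applying several different kernels to the same input produces several feature maps, each acting as a different feature extractor.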
Nonlinear activation function: The output of a linear operation (i.e., convolution) is then passed through a nonlinear activation function. Although smooth nonlinear functions,
such as sigmoid or hyperbolic tangent (Tanh) function, were used previously, the most
common nonlinear activation function used presently is the rectified linear unit (ReLU),
which simply computes the function: f(x) = max (0, x) as shown in Figure 2.7. The main
objective of ReLU is to introduce non-linearity in the ConvNet.
Other state-of-the-art activation functions include Leaky ReLU and the Swish activation function, which address the limitations of the ordinary ReLU activation.
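For illustration, ReLU and Leaky ReLU can be written directly from their definitions; the input values below are arbitrary examples.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)                  # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)     # small slope for x < 0

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))          # [0.  0.  0.  1.5]
print(leaky_relu(x))    # negative inputs are scaled by alpha, not zeroed
```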
Pooling layer: Similar to the Convolutional Layer, the Pooling layer is responsible for
reducing the spatial size of the Convolved Feature. This is to decrease the computational
power required to process the data through dimensionality reduction. Furthermore, it is
useful for extracting dominant features that are rotational and positional invariant, thus
maintaining the process of effectively training the model with a reduced number of
parameters and computation in the network. The pooling layer operates on each feature
map independently.
As shown in Figure 2.8 below, there are two types of Pooling: Max Pooling and Average
Pooling. Max Pooling returns the maximum value from the portion of the image covered
by the Kernel. On the other hand, Average Pooling returns the average of all the
values from the portion of the image covered by the Kernel.
Max Pooling also performs as a Noise Suppressant. It discards the noisy activations
altogether and also performs de-noising along with dimensionality reduction. On the
other hand, Average Pooling simply performs dimensionality reduction as a noise
suppressing mechanism. Hence, we can say that Max Pooling performs a lot better than
Average Pooling (Saha, 2018).
Figure 2.8 Max-pooling and Average-pooling operations
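A minimal NumPy sketch of both pooling operations on a hypothetical 4×4 feature map:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Slide a size x size window over the feature map and keep either
    the maximum or the average of each window."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.array([[1, 3, 2, 9],
                 [5, 6, 1, 7],
                 [4, 2, 8, 0],
                 [3, 1, 2, 6]], dtype=float)   # hypothetical feature map
print(pool2d(fmap, mode="max"))   # [[6. 9.] [4. 8.]]
print(pool2d(fmap, mode="avg"))   # [[3.75 4.75] [2.5  4.  ]]
```

Note how max pooling keeps only the strongest activation in each window, while average pooling blends all values, which matches the noise-suppression argument above.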
The convolutional layer and the pooling layer together form the i-th layer of a convolutional neural network. After going through the above process, the model can successfully understand the features. The next phase is to flatten the final output and feed it to a fully connected layer.
Fully connected layer: A fully connected layer is the last layer of CNN that flattens the
features identified in the previous layers into a vector and predicts probabilities that the
image belongs to each one of several possible labels. The output feature maps of the final
convolution or pooling layer are typically flattened, i.e., transformed into a one-
dimensional (1D) array of numbers (or vector), and connected to one or more fully
connected layers, also known as dense layers, in which every input is connected to every
output by a learnable weight. Once the features extracted by the convolution layers and
down sampled by the pooling layers are created, they are mapped by a subset of fully
connected layers to the final outputs of the network, such as the probabilities for each
class in classification tasks. The final fully connected layer typically has the same number
of output nodes as the number of classes. Each fully connected layer is typically followed by a nonlinear activation function such as ReLU.
Last layer activation function: The activation function applied to the last fully connected
layer is usually different from the others. An appropriate activation function needs to be
selected according to each task. An activation function applied to the multiclass
classification task is a Softmax function that normalizes output real values from the last
fully connected layer to target class probabilities, where each value ranges between 0 and 1.
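A small sketch of the Softmax computation on hypothetical logits; subtracting the maximum before exponentiation is a standard trick for numerical stability.

```python
import numpy as np

def softmax(z):
    """Normalize raw scores (logits) into class probabilities in (0, 1)
    that sum to 1."""
    e = np.exp(z - z.max())   # shift by max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical outputs of the last dense layer
probs = softmax(logits)
print(probs.round(3))                # [0.659 0.242 0.099]
# the probabilities sum to 1 (up to floating point)
```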
2.3.2.1. Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find out the best decision boundary that helps to
classify the data points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset, which
means if there are 2 features (as shown in Figure 2.10 below), then the hyperplane will be
a straight line, and if there are 3 features, then the hyperplane will be a two-dimensional plane. SVM always creates a hyperplane that has a maximum margin, i.e., the maximum distance to the nearest data points, known as support vectors.
Support Vectors: The data points or vectors that lie closest to the hyperplane and affect its position are termed support vectors, since they "support" the hyperplane. The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
The kernel function transforms non-linearly separable data into separable data. That is, it transforms the data into another dimension so that the data can be linearly divided or classified by a plane.
Linear Kernel: The Linear kernel is the simplest kernel function that can be used as a
normal dot product for any two given observations. The product between two vectors is
the sum of the multiplication of each pair of input values.
k(xᵢ, xⱼ) = xᵢ ⋅ xⱼ
Polynomial kernel: A polynomial kernel is a more generalized form of the linear kernel.
The polynomial kernel can distinguish curved or nonlinear input space. It is well suited
for problems where all the training data is normalized. Equation is:
k(xᵢ, xⱼ) = (xᵢ ⋅ xⱼ + 1)ᵈ, where d is the degree of the polynomial; d = 1 is similar to the linear transformation. The degree needs to be manually specified in the learning algorithm.
Gaussian radial basis function (RBF): The Radial basis function kernel is a popular
kernel function commonly used in support vector machine classification. RBF can map
an input space in infinite-dimensional space. It is a general-purpose kernel; used when
there is no prior knowledge about the data. The most popular RBF kernel Equation is:
k(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²)
Here gamma (γ) is a parameter that ranges from 0 to 1. A higher value of gamma fits the training dataset more closely, which can cause over-fitting; gamma = 0.1 is considered a good default value. The value of gamma needs to be manually specified in the learning algorithm.
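The three kernel equations above can be expressed directly in code; the vectors and parameter values below are hypothetical examples.

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)                            # k = xi . xj

def polynomial_kernel(xi, xj, d=2):
    return (np.dot(xi, xj) + 1) ** d                 # k = (xi . xj + 1)^d

def rbf_kernel(xi, xj, gamma=0.1):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))   # k = exp(-gamma * ||xi - xj||^2)

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.0])
print(linear_kernel(a, b))        # 2.0
print(polynomial_kernel(a, b))    # 9.0
print(rbf_kernel(a, b))           # exp(-0.5), roughly 0.6065
```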
2.4. Image pre-processing
Image pre-processing is a fundamental step in image processing and computer vision. It
is a method to perform some operations on an image, to get an enhanced image, or to
extract some useful information from it. It includes primitive operations such as resizing, contrast enhancement, noise reduction, and image smoothing and sharpening, as well as advanced operations such as image segmentation (Adatrao & Mittat, 2016).
Image processing involves the manipulation of images to extract information to
emphasize or de-emphasize certain aspects of the information, contained in the image or
perform image analysis to extract hidden information. The processing of an image
comprises an improvement in its appearance and effective representation of the input
image suitable for the required application. Pre-processing improves the quality of the
data by reducing artifacts. There are various types of techniques for image pre-
processing, and they are described below.
Figure 2. 11 Image resizing operation using nearest-neighbor interpolation
Bilinear Interpolation: Unlike other interpolation techniques such as nearest-neighbor
interpolation, bilinear interpolation uses only the 4 nearest pixel values which are located
in diagonal directions from a given pixel to find the appropriate color intensity values of
that pixel. It reflects the closest 2x2 neighborhood of known pixel values surrounding the
unknown pixel. It then takes a weighted average of these four pixels to arrive at its final
interpolated value. Bilinear interpolation (consider Figure 2.12 below) is usually used for
enlarging or zooming images and it is a default image resizing parameter. This algorithm
reduces some of the visual distortion caused by resizing an image. However, bilinear
interpolation appears to produce a greater number of interpolation artifacts such as
blurring and edge halos. This algorithm takes more time and is also more complex than the nearest-neighbor technique.
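A minimal sketch of bilinear sampling from a 2×2 neighborhood, following the weighted-average description above; the tiny 2×2 image is hypothetical.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Estimate the intensity at fractional coordinate (y, x) from the
    surrounding 2x2 neighborhood, weighted by distance."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)
    x1 = min(x0 + 1, img.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx   # interpolate along x (top row)
    bot = img[y1, x0] * (1 - dx) + img[y1, x1] * dx   # interpolate along x (bottom row)
    return top * (1 - dy) + bot * dy                  # then along y

img = np.array([[10.0, 20.0],
                [30.0, 40.0]])
print(bilinear_sample(img, 0.5, 0.5))   # 25.0, the average of all four pixels
```

Resizing an image with bilinear interpolation amounts to evaluating this sampler at every target pixel's fractional source coordinate.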
Bicubic interpolation: Bicubic interpolation considers the closest 4×4 neighborhood of 16 known pixels surrounding the unknown pixel P in the amplified image, and the color value of P is then determined by these 16 pixels according to their distance from P. Since these neighboring pixels are at different distances from P,
closer pixels are given a greater weighting in the calculation. Bicubic produces notably
sharper images which are better results than the previous two techniques and is, therefore,
the ideal combination of processing time and output quality. Because of this, it is a
standard in many images in-camera interpolation.
A color histogram of an image represents the number of pixels in each type of color
component. Histogram equalization cannot be applied separately to the Red, Green, and
Blue components of the image as it leads to dramatic changes in the image’s color
balance. However, if the image is first converted to another color space, like HSL/HSV
color space, then the algorithm can be applied to the luminance or value channel without
resulting in changes to the hue and saturation of the image (Sudhakar, 2017). In general,
Histogram Equalization can be divided into several types:
Classical Histogram Equalization (CHE): CHE is the principal image processing
technique, particularly when images at the grey level are considered. It is a global
operation, in which Equalization is applied to the whole image. The purpose of this
technique is to uniformly spread the given number of grey levels over a range, thereby
improving its contrast. CHE attempts to generate an output image with a flattened histogram, which implies a uniform distribution over the image's dynamic range of grey-level values, effectively denoted as 0 to L−1. The
downside to this technique is that it does not take into account an image's average
brightness. Due to the expansion of the grey levels over the full grey level range, the
CHE technique may result in over enhancement and saturation artifacts.
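A minimal NumPy sketch of classical histogram equalization via the standard CDF remapping; the tiny low-contrast image is hypothetical.

```python
import numpy as np

def equalize_hist(gray):
    """Classical histogram equalization for an 8-bit greyscale image:
    spread the grey levels over the full 0..255 range via the CDF."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first non-zero CDF value
    lut = np.clip(np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)    # standard CDF remapping
    return lut[gray]

gray = np.array([[50, 50, 60],
                 [60, 70, 70],
                 [80, 80, 80]], dtype=np.uint8)   # hypothetical low-contrast image
eq = equalize_hist(gray)
print(eq.min(), eq.max())   # 0 255 -- the output spans the full grey range
```

For a color image, this would be applied to the luminance/value channel of an HSL/HSV representation, as described above, rather than to R, G, and B separately.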
Adaptive Histogram Equalization (AHE): AHE computes several histograms for distinct regions of the image to improve local contrast, but it is not suitable for homogeneous images because it tends to over-intensify noise in homogeneous regions of an image. It also fails to preserve the brightness of the input image. Contrast limited adaptive histogram equalization (CLAHE) is an advanced form of adaptive histogram equalization that limits the contrast amplification to reduce this noise.
2.4.3. Image noise types and de-noising techniques
Noise is any unwanted information produced in the image during the image acquisition
process that results in pixel values that do not reflect the true intensities of the real scene.
The types of noise that contaminate the image can be classified into two groups including
linear noise and nonlinear noise. Gaussian noise is the well-known type of linear noise
whereas speckle noise, salt-and-pepper noise, and uniform impulse noise are nonlinear
noise. Gaussian noise can be characterized by its distribution with mean and variance
values. In the nonlinear group, salt-and-pepper noise corrupts the image's pixels randomly and sparsely, changing some pixels to bright or dark values. Uniform impulse noise replaces a portion of the image's pixel values with random values.
Speckle noise is characterized by signal-dependent noise where the noise corrupts the
image in the form of multiplicative noise (Langampol, Srisomboon, Patanavijit, & Lee,
2019).
F(g) = (1/√(2πσ²)) · exp(−(g − μ)²/(2σ²))
where F(g) is the Gaussian noise distribution in an image, g represents the grey level (a Gaussian random variable), and μ and σ are the mean and standard deviation, respectively.
Salt and Pepper Noise: Salt-and-pepper noise is a fixed-valued impulse noise added to an (8-bit) image through the insertion of both random bright pixels (value 255, salt noise) and random dark pixels (value 0, pepper noise) all over the image. This model is also known as data-drop noise because, statistically, it drops the original data values. It can be caused by dead pixels, analog-to-digital converter errors, and transmitted bit errors (B & K, 2016).
P(x) = P₁ for x = A, P₂ for x = B, and 0 otherwise,
where P₁ and P₂ are probability density function (PDF) values, P(x) is the distribution of salt-and-pepper noise in the image, and A and B are the dark (pepper) and bright (salt) pixel values.
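A sketch of how salt-and-pepper noise can be simulated, following the model above; the probabilities, seed, and test image are hypothetical.

```python
import numpy as np

def add_salt_pepper(img, p_salt=0.02, p_pepper=0.02, seed=0):
    """Randomly replace pixels with 255 (salt) with probability p_salt
    and with 0 (pepper) with probability p_pepper."""
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    r = rng.random(img.shape)
    noisy[r < p_pepper] = 0            # random dark pixels
    noisy[r > 1 - p_salt] = 255        # random bright pixels
    return noisy

img = np.full((100, 100), 128, dtype=np.uint8)   # hypothetical uniform grey image
noisy = add_salt_pepper(img, 0.05, 0.05)
# roughly 5% of pixels become 0 and roughly 5% become 255
print(round(float(np.mean(noisy == 0)), 2), round(float(np.mean(noisy == 255)), 2))
```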
Poisson Noise (quantum/photon noise or shot noise): This noise arises from the statistical nature of electromagnetic waves such as X-rays, visible light, and gamma rays. X-ray and gamma-ray sources emit a number of photons per unit of time; in medical X-ray and gamma-ray imaging systems, these rays are injected into the patient's body from the source. Because these sources exhibit random fluctuations in the number of photons, the gathered image has spatial and temporal randomness. This type of noise has the probability density function of a Poisson distribution.
Speckle Noise: A fundamental problem in optical and digital holography is the presence
of speckle noise in the image reconstruction process. Speckle noise is a crude noise that
corrupts the element of medical images in general. Speckle is a granular noise that
inherently exists in an image and degrades its quality. Speckle noise can be generated by
multiplying random pixel values with different pixels of an image. The presence of
Speckle noise in an image hinders the image interpretation (Dhruv, Mittal, & Modi, 2017)
and (S, V.P, & Rangaswamy, 2016). Speckle noise can be modeled with the pattern:
g(x, y) = f(x, y) · n(x, y) + n₁(x, y)
where g(x, y) is the observed image, f(x, y) is the noise-free image, n(x, y) and n₁(x, y) are the multiplicative and additive components of speckle noise, and x, y denote the axial and lateral indices of the image sample. Considering only the multiplicative noise component and ignoring the additive component, the above equation becomes:
g(x, y) = f(x, y) · n(x, y)
Gaussian filter: Gaussian filter is a linear image de-noising technique that is usually used
to blur the image or to reduce noise. Gaussian filter smoothes the whole image
irrespective of its edges or details.
Median filter: Median filtering is a nonlinear method used to remove noise from images
while preserving edges. It is particularly effective at removing ‘salt and pepper’ type
noise. The median filter works by moving through the image pixel by pixel, replacing
each value with the median value of neighboring pixels. The pattern of neighbors is
called the "window", which slides, pixel by pixel over the entire image. The median is
calculated by first sorting all the pixel values from the window into numerical order, and
then replacing the pixel being considered with the middle (median) pixel value. Median
filter proves to preserve the edges and lines of an image in the best possible way thereby
removing the outliers. It can be stated as:
y[m, n] = median{ x[i, j] : (i, j) ∈ w }
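A minimal sketch of the sliding-window median filter described above, using reflection padding at the borders; the test image and window size are hypothetical.

```python
import numpy as np

def median_filter(img, size=3):
    """Slide a size x size window over the image; replace each pixel by
    the median of its window (borders handled by reflection padding)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i+size, j:j+size])
    return out

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                      # an isolated salt-noise pixel
print(median_filter(img)[2, 2])      # 100: the outlier is removed
```

Because the outlier never ends up in the middle of the sorted window, it is discarded, which is exactly why the median filter handles salt-and-pepper noise well while preserving edges.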
2.5. Segmentation
Image segmentation is a process of dividing the image into homogenous, self-consistent
regions corresponding to different objects in the image. It separates the image into
meaningful regions. An image can be segmented using the basic properties of features of
the image like intensity, edge, or texture.
Segmentation is a common procedure for feature extraction in images and volumes.
Segmenting an image means grouping its pixels according to their value similarity. The
simplified-color image can then be used to render important features independently from
one another.
The main aim of segmentation is to extract the ROI (Region of Interest) for image
analysis. Segmentation plays an important role in image processing since the separation
of a large image into several parts makes further processing simpler. The result of image
segmentation is a set of regions that collectively cover the entire image, where the pixels in each region are similar with respect to some characteristic or computed property, such as color, intensity, or texture.
Plant leaf image segmentation plays an important role in plant disease detection through leaf symptoms, as it partitions the image into the essential regions at the appropriate locations (Kaur A., 2014).
There are different image segmentation techniques like threshold-based, edge-based,
region-based, cluster-based, and neural network-based. One of the most used clustering
algorithms is k-means clustering (Sethupathy & S, 2016).
Image segmentation can be classified into the following types:-
Edge Detection
Thresholding
Clustering
Fuzzy Logic
Neural Network
Threshold-based segmentation: Thresholding partitions an image into foreground and background by comparing each pixel's grey value against one or more threshold values. The advantage of the threshold method is that it does not require prior information about the image, and it is fast and simple to implement. In particular, when the target and the background have high contrast, a good segmentation result can be obtained. The disadvantage of this technique is that it is difficult to obtain accurate results when there is no significant greyscale difference or when the greyscale values in the image overlap substantially. Since it considers only the grey information of the image and ignores spatial information, it is sensitive to noise and greyscale unevenness, so it is often combined with other methods (Janwale, 2017).
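A minimal sketch of global thresholding; the grey values and threshold below are hypothetical.

```python
import numpy as np

def threshold_segment(gray, t):
    """Global thresholding: pixels above t become foreground (1),
    the rest background (0)."""
    return (gray > t).astype(np.uint8)

# Hypothetical image: a bright region on a dark background
gray = np.array([[ 20,  30, 200],
                 [ 25, 210, 220],
                 [ 15,  22, 205]], dtype=np.uint8)
mask = threshold_segment(gray, 128)
print(mask)
# [[0 0 1]
#  [0 1 1]
#  [0 0 1]]
```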
Region-based / similarity-based segmentation: The regional growth method is a typical serial region segmentation algorithm; its basic idea is to group pixels with similar properties together to form a region. Pixels of the same type are identified and grouped into the same regions. The method first selects a seed pixel and then merges the similar pixels around it into the region where the seed pixel is located. The main types of region-based segmentation are region growing, region splitting, and region merging.
The advantage of regional growth segmentation is that it usually separates the connected
regions with the same characteristics and provides good boundary information and
segmentation results. It is more immune to noise, useful when it is easy to define
similarity criteria, and works well for images having good contrast between regions. It is
also simple and requires only a few seed points to complete. The growth criteria in the
growing process can be freely specified and it can pick multiple criteria at the same time.
The disadvantage of this method is its high computational cost; noise and greyscale unevenness can lead to voids and over-segmentation, and it often handles shadows in the image poorly.
Edge detection segmentation: This technique is intended for locating boundaries of leaf
within the image based on the rapid change of intensity value in an image because a
single intensity value does not provide good information about edges. It segments the
image by identifying the difference between intensities at the border. In edge-based
segmentation methods, first of all, the edges are detected and then are connected to form
the object boundaries to segment the required regions (Kaur & Kaur, 2014). The basic
two edge-based segmentation methods are grey-histogram and gradient-based methods. To detect the edges, basic edge detection operators such as the Sobel, Canny, and Roberts operators can be used; the result of these methods is a binary image. Sobel, Canny, Laplacian, and fuzzy-logic operators are some of the techniques used for edge detection. Edge detection significantly reduces the amount of data and filters out useless information while preserving the important structural properties of an image, and it is effective for images with good contrast between objects. Conversely, it is not suitable when edges are wrongly detected or too numerous (Manjula.KA, 2015).
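The Sobel operator mentioned above can be sketched as follows; the step-edge test image is hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal gradient kernel
SOBEL_Y = SOBEL_X.T                                       # vertical gradient kernel

def sobel_magnitude(gray):
    """Approximate the intensity gradient with the Sobel operator and
    return its magnitude; large values mark rapid intensity changes."""
    h, w = gray.shape
    mag = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i+3, j:j+3]
            gx = np.sum(patch * SOBEL_X)
            gy = np.sum(patch * SOBEL_Y)
            mag[i, j] = np.hypot(gx, gy)
    return mag

# A vertical step edge: dark left half, bright right half
gray = np.hstack([np.zeros((5, 3)), np.full((5, 3), 255.0)])
mag = sobel_magnitude(gray)
print(mag.max() > 0, mag[:, 0].max() == 0)   # True True
```

Thresholding this magnitude map yields the binary edge image the text refers to; Canny adds smoothing, non-maximum suppression, and hysteresis on top of this gradient step.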
Clustering-based segmentation: Clustering methods attempt to group patterns that are
similar in some sense by dividing the population (data points) into several groups, such
that data points in the same groups are more similar to other data points in that same
group than those in other groups. These groups are known as clusters. The image is first
converted into a histogram and then clustering is performed on it.
One of the popular clustering algorithms is K-means clustering, which is an unsupervised
algorithm and is used to segment the interest area from the background. The algorithm is
used when there is unlabelled data (i.e., data without defined categories or groups). The
goal is to partition the given image into K-clusters or find certain groups based on some
kind of similarity in the image pixels with the number of groups represented by K by
minimizing the sum of squared distances between all points and the cluster center. It
clusters similar pixels to segment the image, improving performance and efficiency (Jayapriya & Hemalatha, 2019). The algorithm only needs to know how many clusters an image should have; with this information, it can automatically find the best clusters. K-means clustering is computationally fast for small values of k, eliminates noisy spots, and tends to produce more homogeneous regions (Manjula.KA, 2015).
f = Σⱼ₌₁ᵏ Σᵢ₌₁ⁿ ‖xᵢ⁽ʲ⁾ − Cⱼ‖²
where xᵢ⁽ʲ⁾ denotes a data point assigned to cluster j and Cⱼ is the center of cluster j.
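A minimal sketch of the k-means iteration on one-dimensional pixel intensities (assign each pixel to the nearest center, then recompute centers as cluster means); the intensity values and seed are hypothetical.

```python
import numpy as np

def kmeans(pixels, k=2, iters=20, seed=0):
    """Minimal k-means: alternate between assigning each pixel to its
    nearest center and moving each center to the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels, centers

# Two intensity populations: dark background and a bright lesion-like region
pixels = np.array([10., 12., 14., 11., 200., 205., 198., 210.])
labels, centers = kmeans(pixels, k=2)
print(np.sort(centers).round(1))   # two centers, near the dark and bright means
```

Each iteration decreases the sum-of-squared-distances objective given above; for image segmentation the same procedure is applied to pixel colors or intensities, and the labels form the segmented regions.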
Watershed: The watershed-based methods use the concept of topological interpretation.
The watershed methods consider the gradient of an image as a topographic surface. The
pixels with larger gradients are represented as continuous boundaries. The main advantage of watershed segmentation is that it gives more stable results, and the detected boundaries are continuous. However, computing the gradients is expensive.
Neural network-based segmentation: The artificial neural network-based segmentation
techniques work by simulating the learning strategies of the human brain for decision
making. It is used to separate the required image from the background. A neural network
is made of a large number of connected nodes and each connection has a particular
weight. This method has two basic steps: extracting features and segmentation by a
neural network. The advantage of this segmentation approach is that it is simple and there is no need to write complex programs. However, it takes much time for training (Kaur & Kaur, 2014).
One of the most popular NN-based image segmentation techniques is Mask R-CNN
based segmentation.
Mask R-CNN segmentation: Mask R-CNN is an instance segmentation technique that
locates each pixel of every object in the image instead of the bounding boxes. It has two
stages: region proposals and then classifying the proposals and generating bounding
boxes and masks. It does so by using an additional fully convolutional network on top of
a CNN-based feature map with input as a feature map and gives a matrix with 1 on all
locations where the pixel belongs to the object and 0 elsewhere as the output. The
following is the framework of Mask R-CNN for instance segmentation.
It consists of a backbone network which is a standard CNN. The early layer of the
network detects low-level features, and later layers detect higher-level features. The
image is converted from 1024x1024px x 3 (RGB) to a feature map of shape 32x32x2048.
The Feature Pyramid Network (FPN) was an extension of the backbone network which
can better represent objects at multiple scales. It consists of two pyramids where the
second pyramid receives the high-level features from the first pyramid and passes them to
the lower layers. This allows every level to have access to both lower and higher-level
features.
It also uses a Region Proposal Network (RPN), which scans the FPN top-to-bottom and proposes regions that may contain objects. The RPN uses anchors, which are a set of boxes with predefined locations and scales relative to the input image. Individual anchors are assigned to ground-truth classes and bounding boxes. The RPN generates two outputs
for each anchor — anchor class and bounding box specifications. The anchor class is
either a foreground class or a background class.
Another module that is different in Mask R-CNN is the ROI Pooling. The authors of
Mask R-CNN concluded that the regions of the feature map selected by RoIPool were
slightly misaligned from the regions of the original image. Since image segmentation
requires specificity at the pixel level of the image, this leads to inaccuracies. This
problem was solved by using RoIAlign in which the feature map is sampled at different
points and then a bilinear interpolation is applied to get a precise idea of what would be at
pixel 2.93 (which was earlier considered as pixel 2 by the RoIPool).
Then a convolutional network is used which takes the regions selected by the ROI
classifier and generates masks for them. The generated masks are of low resolution (28×28 pixels). During training, the masks are scaled down to 28×28 to compute the loss,
and during inferencing, the predicted masks are scaled up to the size of the ROI bounding
box. This gives us the final masks for every object.
2.6. Feature Extraction
Feature Extraction is one of the significant techniques in image processing. An image
feature is a distinguishing primitive characteristic or attribute of an image. One of the key
factors of image analysis is the extraction of sufficient information that leads to a
compact description of an examined image. Thus, feature extraction techniques are
applied to get the feature that will be useful in classifying and recognizing the images.
The aim is to reduce the data set of features which is valuable information present in an
image. Data present in an image are very complex and very high dimensional, it is a
necessary step to extract the informative feature from an image for object recognition and
segmentation. Besides lowering the computational cost, feature extraction is also a means
for controlling the so-called curse of dimensionality (Kunaver & Tasic, 2005).
Feature extraction methods are classified as low-level and high-level. Low-level feature extraction is based on finding points, lines, edges, etc., while high-level methods use the low-level features to provide more significant information for further image analysis. Most high-level feature extraction methods use artificial neural networks (ANNs) to extract features in multiple layers (Asogwa et al., 2007). CNNs learn data characteristics from convolution operations, which are better suited to extracting useful information from the image.
2.7. Related Work
Many studies have recently been conducted to identify different plant diseases and pests using various algorithms and techniques, particularly for the identification and classification of mango diseases. The following are some of the studies we reviewed.
In (Arivazhagan & Ligi, 2018), a convolutional neural network was trained to identify five common mango leaf diseases. A dataset of 1200 images of diseased and healthy mango leaves was used, of which 600 were for training and 600 for testing. Data augmentation was applied to enlarge the dataset by generating artificial data. Even though they achieved a good result, CNN takes much more training time, especially when using augmented images. Thus, to reduce the computational cost of CNN, applying segmentation techniques and feeding the segmented image to the CNN is preferable, and combining it with machine learning techniques increases the performance of the identification model.
In the study of (Sladojevic, Arsenovic, Anderla, Culibrk, & Stefanovic, 2016), a CNN
was tested on 13 different types of plant disease. Although they used the CNN for both
feature extraction and classification of plant diseases and achieved good results, the main
gap is that they relied on augmented images, so the CNN takes more training time for
identification. To overcome this problem, we consider different segmentation techniques
to select the region of interest on the mango leaf image and input it to the CNN.
2.8. Summary
In this chapter, we reviewed studies related to ours on the automated identification of
mango leaf diseases using image processing, machine learning, and deep learning
techniques. We showed the performance and limitations of each study and how we intend
to fill these gaps. To the best of our knowledge, most existing works identify mango leaf
diseases using CNNs without pre-processing the images or combining them with other
machine learning algorithms. Thus, in our study, we designed a CNN model combined
with a multiclass SVM to detect and classify mango leaf diseases and insect pests, filling
the gaps in previous work. We also applied different pre-processing and segmentation
techniques and activation functions to boost the classification performance of the
proposed model.
Table 2. 1: Summary of related works
CHAPTER THREE
3. METHODOLOGY
3.1. Introduction
This chapter describes, step by step, the methods and processes used in our proposed
model. We propose an architecture that processes mango leaf images using the CNN and
SVM models. We trained the model on real mango leaf images collected for this study.
The steps involved in disease and insect pest detection are digital image acquisition,
image pre-processing (histogram equalization, noise removal, image resizing, and image
transformation), image segmentation, feature extraction, and classification.
The first phase is image acquisition, in which images of the leaves to be classified were
taken using a digital camera. In the second phase, image pre-processing was performed.
In the third phase, segmentation using K-means clustering was performed to discover the
actual segments of the leaf in the image. Feature extraction for the infected part of the
leaf was then performed using the CNN. Finally, classification was done with three
different algorithms (CNN + Softmax, SVM + kernel, and CNN-SVM) to determine the
best-performing classifier.
3.2. Architectural Framework of the proposed model
Figure 3. 1: Architectural framework of the proposed model. In both the training and
testing phases, images pass through image acquisition, image resizing, image
augmentation, and image segmentation; the extracted features are then classified
(SVM + kernel and CNN + SVM) to produce the output.
3.3. Image Acquisition
Because many state-of-the-art classification models need a massive amount of labelled
data to attain good results, sufficient image acquisition is the first and essential step; it
requires capturing images with a digital camera. However, there is no open-source
repository of mango leaf images (healthy and diseased). The healthy and unhealthy
mango leaf pictures were therefore captured manually at the Weramit fruit and vegetable
research and training sub-center and in Bahir Dar city. Based on disease severity, we
selected the top three mango leaf diseases and insect pests: white mango scale, mango
anthracnose, and mango powdery mildew. A total of 1,200 leaf images were captured:
300 of mango anthracnose, 300 of mango powdery mildew, 300 of mango white scale,
and 300 of healthy mango leaves. Image augmentation was then used to increase the size
of the training dataset, and the resulting dataset of 4,000 mango leaf images was used to
train and test the proposed deep CNN model. The images were captured in early January
and February 2020 with a mobile phone and a digital camera and stored in JPEG format
at a resolution of 5312 x 2988.
However, the datasets used for this research were not clean: the raw mango leaf images
were affected by interference and background noise, which makes the mango leaf
features complex. The intended classification model, a CNN, cannot easily extract useful
features from such images directly. We therefore performed data pre-processing to
prepare the data for training and to rectify these issues. Image resizing, histogram
equalization, noise removal, augmentation, and segmentation were applied to the raw
mango leaf images in that order. Feature extraction using the CNN and classification
using CNN, SVM, and CNN-SVM were then performed on both raw and pre-processed
data. The dataset was split into 80% for training and 20% for validation/testing, the ratio
most commonly used in neural network applications.
3.4. Image Pre-processing
Figure 3. 2: Results of resized mango leaf images.
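The resizing step can be sketched with Pillow (one of the libraries used in this study). The 224 x 224 target size is an assumption for illustration, a common CNN input size; the thesis does not state the exact resize dimensions.

```python
from PIL import Image

# Blank stand-in image at the capture resolution reported for the camera.
img = Image.new("RGB", (5312, 2988))

# Resize to a square CNN input. 224 x 224 is an assumed, typical choice.
small = img.resize((224, 224))
print(small.size)
```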
3.4.2. Histogram Equalization
Image enhancement is employed to accentuate or sharpen image properties such as
boundaries, edges, or contrast so that the display becomes more recognizable. In this
study, we used brightness-preserving adaptive histogram equalization (BPAHE) instead
of basic histogram equalization to obtain better results. Brightness-preserving adaptive
histogram equalization works on small regions, unlike basic histogram equalization,
which works on the entire image. For disease detection in mango leaves we require only
the affected part, so an adaptive histogram is more suitable in this case. For a color
image, the histogram gives the number of times each particular color occurs in the image.
Histogram equalization is performed by first converting the filtered image to the LAB
color space. Equalization is then applied only to the luminance component; the A and B
components are left unaltered. Finally, the equalized luminance component and the
unaltered A and B components are converted back to RGB format, and the diseased part
is enhanced for further analysis.
The following Figure 3.3 shows an adaptive histogram equalized image (the image on the
right side).
Figure 3. 3: Typical results of histogram enhanced mango leaf image
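The core equalization step on the luminance channel can be sketched in NumPy as below. This is plain (global) equalization; the adaptive variant described above applies the same idea per small tile, and in practice OpenCV's CLAHE implementation would be used. The synthetic channel values are illustrative assumptions.

```python
import numpy as np

def equalize_channel(channel):
    """Global histogram equalization of one 8-bit channel (e.g. the L
    component of a LAB image). BPAHE applies this tile by tile."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first non-zero CDF value
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[channel]                       # remap every pixel

# Low-contrast synthetic channel: values crowded into a narrow band.
chan = np.array([[100, 100, 101],
                 [101, 102, 103]], dtype=np.uint8)
eq = equalize_channel(chan)   # stretched to the full 0..255 range
```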
3.4.3. Image de-noising
Due to reflections from the mango leaves under sunlight, camera instability during
shooting, and the influence of the natural environment, some noise appears in the images.
The images may also be corrupted by random signals during transmission. It was thus
necessary to enhance and denoise the mango leaf images to recover the original
information. Among the several de-noising techniques for removing unwanted
fluctuations in an image, we used the median filter. The median filter is a non-linear
filter that is most commonly used as a simple way to reduce noise in an image; its
advantage over other noise-reduction techniques is that it removes noise while keeping
edges relatively sharp.
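A minimal NumPy sketch of median filtering, showing how a single impulse ("salt") pixel is removed while the flat neighbourhood is preserved. In practice a library routine such as `cv2.medianBlur` or `scipy.ndimage.median_filter` would be used; this naive version is for illustration only.

```python
import numpy as np

def median_filter(img, k=3):
    """Naive k x k median filter; border pixels are left unchanged."""
    pad = k // 2
    out = img.copy()
    for i in range(pad, img.shape[0] - pad):
        for j in range(pad, img.shape[1] - pad):
            out[i, j] = np.median(img[i - pad:i + pad + 1,
                                      j - pad:j + pad + 1])
    return out

# A flat patch corrupted by one "salt" pixel of impulse noise.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy)   # the outlier is replaced by the local median
```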
3.4.4. Data Augmentation
Training deep neural networks on limited data is known to cause over-fitting. Data
augmentation is a way to reduce over-fitting and increase the amount of training data: it
creates new images by transforming those in the training dataset. Various augmentation
settings were applied, including random image flipping, padding, translation, zooming,
cropping, noise injection, rotation, and scaling. In this way a training dataset of 4,000
images was created. Both the original and the generated images were then used to train
the model, and we observed that data augmentation increases the performance of the
model while addressing over-fitting.
Figure 3. 6: Typical results of augmented mango leaf images
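A few of the label-preserving transforms listed above can be sketched directly in NumPy; each call below turns one training image into an extra variant. This is a small illustrative subset, not the full augmentation pipeline used in the study.

```python
import numpy as np

def augment(image):
    """Simple label-preserving transforms used to enlarge a training set."""
    return [np.fliplr(image),        # horizontal flip
            np.flipud(image),        # vertical flip
            np.rot90(image),         # 90-degree rotation
            np.rot90(image, 2)]      # 180-degree rotation

leaf = np.arange(16).reshape(4, 4)   # stand-in for a leaf image
extra = augment(leaf)                # four new images from one original
```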
3. Image ← image_to_array(image)  # convert the image into a 2-dimensional array
4. kmeans ← KMeans(n_clusters, random_state).fit(Image)  # fit the k-means algorithm on this reshaped array and obtain the clusters
5. clustered_img ← Image.reshape(Image.shape[0], Image.shape[1], Image.shape[2])  # bring the clusters back to their original shape
6. save the segmented image
7. end
Figure 3. 7: Result of k-means segmentation on a mango leaf image ((a) and (b))
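The pseudocode above can be made runnable with scikit-learn (which this study uses). The sketch clusters pixel colours and repaints each pixel with its cluster centre; the synthetic two-colour image is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image, n_clusters=2, random_state=0):
    """Cluster pixel colours and repaint each pixel with its centre."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)        # one row per pixel
    km = KMeans(n_clusters=n_clusters, random_state=random_state,
                n_init=10).fit(pixels)
    segmented = km.cluster_centers_[km.labels_]        # centre colour per pixel
    return segmented.reshape(h, w, c)                  # back to image shape

# Synthetic image: dark left half, bright right half.
img = np.zeros((4, 6, 3))
img[:, 3:, :] = 200.0
seg = kmeans_segment(img, n_clusters=2)
```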
3.5.2. Mask R-CNN
The second segmentation technique used in this study was Mask R-CNN-based
segmentation, an extension of the Faster R-CNN algorithm obtained by adding a
convolutional network that performs pixel-level semantic segmentation on the regions
identified by Faster R-CNN. Because mango leaf lesions are physically smaller than
those of many other crops, applying traditional segmentation techniques leads to errors
in feature extraction. Segmentation is therefore mandatory, both to minimize the CNN's
computational time when extracting representative features and to support identification.
Finally, we applied a combined k-means and Mask R-CNN segmentation approach to
further enhance the identification accuracy.
3.6. Classification
The common structure of a CNN for image classification has two main parts: first, a long
chain of convolutional layers for feature extraction, and second, a few (or even one) fully
connected layers for classification.
For the feature extraction step, a sequential CNN model with multiple layers is used. It
consists of seven layers of Conv2D, ReLU, and MaxPooling2D, and a fully connected
layer. To classify the mango leaf diseases, the extracted features are fed to three
classification models: CNN + Softmax, SVM + kernel, and CNN-SVM. Each of the three
models is applied to raw data, pre-processed data, and pre-processed segmented data.
The steps in the classification process include:
Pooling layers reduce the number of parameters when the images are too large. Here,
MaxPooling2D with a stride of 2 is applied to the feature map to down-sample the
convolved image. We used max pooling because, in practice, it has been found to
perform better than average pooling for image classification; it yields a down-sampled
(pooled) feature map that highlights the most relevant feature in each patch. To obtain
probabilistic values, a flattening layer converts the three-dimensional feature maps into a
single dimension, followed by two fully connected dense layers with a Softmax
activation function for the highest-likelihood classification.
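Max pooling with stride 2 can be sketched in NumPy: each 2 x 2 patch is replaced by its maximum, halving both spatial dimensions. The feature map here is a toy example, not an actual CNN activation.

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Down-sample a feature map by keeping the maximum of each patch."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i * stride:i * stride + size,
                             j * stride:j * stride + size].max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(fmap)   # a 4x4 map becomes a 2x2 map
```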
The parameters used when compiling the CNN model are the optimizer, the loss, and the
metrics. 'adam' was used as the optimizer; it adjusts the learning rate throughout training.
'Categorical cross-entropy' was used as the loss function, and the 'accuracy' metric was
monitored on the validation set during training to make the results easier to interpret.
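The relationship between the Softmax output layer and the categorical cross-entropy loss can be shown in a few lines of NumPy: Softmax turns class scores into probabilities, and the loss is the negative log-probability assigned to the true class. The four-class scores below are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: class scores -> probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

def categorical_crossentropy(y_onehot, probs):
    """Negative log-probability of the true class."""
    return -np.sum(y_onehot * np.log(probs))

scores = np.array([3.0, 1.0, 0.2, 0.1])       # hypothetical 4-class scores
probs = softmax(scores)
loss = categorical_crossentropy(np.array([1.0, 0, 0, 0]), probs)
```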
The support vector machine (SVM) is a linear binary classifier. Its goal is to find a
hyperplane that separates the training data correctly into two half-spaces while
maximizing the margin between the two classes. Although the SVM is a linear classifier
that can only deal with linearly separable data, a kernel trick can be applied to make it
work in the non-linearly separable case. Thus, besides the linear kernel, we applied the
commonly used RBF kernel.
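The effect of the kernel trick can be demonstrated with scikit-learn's `SVC` on toy data that is not linearly separable (one class inside a disc): the RBF kernel fits it well, while a linear SVM cannot. The data here is synthetic and illustrative only.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data that is NOT linearly separable: class 1 lies inside a disc.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (np.linalg.norm(X, axis=1) < 0.5).astype(int)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)
```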
To read the CNN-extracted features from a CSV file, we used the read_csv() method of
the pandas library. We then divided the data into attributes and labels using the drop()
method. The training and testing sets were split using an 80:20 ratio via the
train_test_split method of scikit-learn.
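The read/split pipeline just described can be sketched as below. The CSV content, feature names, and label values are stand-in assumptions (a `StringIO` buffer replaces the real feature file to keep the sketch self-contained).

```python
from io import StringIO
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the real CSV of CNN-extracted features (hypothetical values).
csv_file = StringIO(
    "f1,f2,label\n"
    "0.1,0.9,healthy\n"
    "0.8,0.2,anthracnose\n"
    "0.2,0.7,healthy\n"
    "0.9,0.1,anthracnose\n"
    "0.3,0.8,healthy\n")

df = pd.read_csv(csv_file)
X = df.drop("label", axis=1)      # attributes
y = df["label"]                   # labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # the 80:20 split used here
```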
The fit() method was used to train the CNN model, and the time() function was used to
measure the training time. We set the batch size and the number of epochs in the fit
function, which loops the model through the data. After training was completed, testing
was carried out to check the effectiveness of the trained CNN model.
CHAPTER FOUR
4. EXPERIMENT, RESULT AND DISCUSSION
4.1. Introduction
In this chapter, the experimental evaluation of the proposed model for mango disease and
insect pest detection using enhanced imaging and machine learning techniques is
described in detail. The experimental evaluation confirms the realization of the proposed
model and architecture. The dataset used and the implementation of the proposed model
are described thoroughly, and test results in the training and validation phases are
compared. The effect of the different segmentation techniques and activation functions is
evaluated against models that do not use them. In addition, the CNN test results are
presented and compared with the SVM and combined CNN-SVM classification models.
4.2.2. Implementation
In this section, the performance of the proposed models was evaluated using the data
described in the previous section. The models were developed with Keras (using the
TensorFlow backend under Python 3.7) in Anaconda Spyder; the experiments were run
on Google Colab and on a laptop with an Intel(R) Core(TM) i7-7500U CPU @
2.70GHz-2.90GHz and 8.00 GB RAM, using the scikit-learn, TensorFlow, Keras, Pillow,
and OpenCV Python libraries. The model was trained for 40 epochs with a batch size of
32 and an initial learning rate of 0.001 (1e-3). The data were partitioned so that 80
percent was allotted for training the model and 20 percent for testing; allocating 80% of
the dataset for training is close to optimal for reasonably sized datasets (greater than 100
images).
4.3. Result and Discussion
Two tests, one on the raw (unprocessed) images and one on the pre-processed augmented
mango leaf images, were carried out to verify the performance of the three learned
models. Test results in the training and validation phases were then evaluated. We
recorded the accuracy and loss of each model per epoch in the tables and plots below; the
gap between training and validation accuracy indicates the amount of over-fitting. In the
first phase, various healthy and infected mango leaf images selected from the dataset
folder were input for classification, and the expected classification outcome was
obtained.
Case I: Validation accuracy of classification models with raw mango leaf images

Activation function    CNN (%)    SVM + RBF (%)    Combined CNN + SVM (%)
Tanh                   92.75      95.2             96.12
ReLU                   92         94.91            95.75
Leaky ReLU             95.23      95.53            96.32
Figure 4.1 shows the training and validation accuracy of CNN-SVM for mango leaf
disease and insect pest identification on the raw image dataset, while Figure 4.2 shows
the validation loss. Over 40 steps, the models finished training in 1 hour and 58 minutes.
The CNN-Softmax model had an average validation accuracy of 95.23%; the SVM-RBF
model had an average validation accuracy of 95.53% and an average validation loss of
0.236794931; and the CNN-SVM model had an average validation accuracy of 96.32%
and an average validation loss of 0.268976859.
Figure 4. 1: Training and validation accuracy of CNN-SVM on raw mango leaf images
Figure 4. 2: Training and validation loss of CNN-SVM on raw mango leaf images
Case II: Validation accuracy of classification models with preprocessed mango leaf
images
Figure 4. 3: Validation accuracy of CNN-SVM on preprocessed mango leaf images
Table 4. 3: Classification analysis of CNN model with different Activation functions on a
preprocessed image dataset
After 40 training steps, the models were tested on the test cases of each dataset. The
tables above show the validation accuracies of CNN-Softmax, SVM-RBF, and CNN-
SVM for image classification on the raw and pre-processed segmented datasets. With
pre-processed images, the Leaky ReLU activation function, and the combined
segmentation technique, CNN-SVM achieves an accuracy of 99.78%. On the raw dataset,
too, CNN-SVM obtained better classification accuracy than CNN-Softmax and SVM-
RBF. The results show that the CNN-SVM classifier with Leaky ReLU activation works
successfully and detects all diseased and healthy images with the highest validation
accuracy.
CHAPTER FIVE
5. CONCLUSION AND RECOMMENDATION
5.1. Conclusion
Mango (Mangifera indica) is one of Ethiopia's most delicious and valuable cultivated
fruit crops. It is exported to many countries as raw or mature fruit and as processed
consumables such as mango slices, juice, and raw mango pickle. Mango is high in
vitamins A and C and has recognized qualities in herbal medicine; mango leaves are also
used during rituals, as they show antibiotic activity against bacteria. Recently, however,
the market value of Ethiopian mango has declined due to unregulated pesticide use.
Image processing and deep learning can play a significant role in the early identification
of diseases and insect pests, helping to control the use of hazardous pesticides. In this
study, we proposed an enhanced mango disease and insect pest detection system using a
hybrid of image processing and deep learning. We collected our leaf image dataset with
a digital camera from the Amhara Region's main mango production areas, namely the
Weramit fruit and vegetable research and training sub-center and Bahir Dar city. We
then pre-processed the images to obtain an enhanced image dataset and applied K-means
and Mask R-CNN segmentation techniques to the pre-processed images to extract the
region of interest. To increase model performance, we also compared several activation
functions: Tanh, ReLU, Leaky ReLU, and Swish. A CNN was used for feature extraction
and classification, and two additional models, SVM and the combined CNN-SVM, were
compared with it. Based on the accuracies achieved, we conclude that the hybrid CNN-
SVM model with the Leaky ReLU activation function performs best for the identification
of mango leaf diseases and insect pests, with a validation accuracy of 99.78%,
outperforming the CNN and SVM models.
5.2. Recommendation
The mango leaf disease and insect pest identification models proposed in this work can
also be applied to the detection of other plant diseases. We conclude that the proposed
model meets the current requirements, although some problems need extra work, and the
work can be further improved or extended to identify related plant diseases. Two key
points remain a challenge and limit this research. First, three segmentation techniques
were applied to improve classification efficiency in this study; future work could
consider other deep learning methods for segmentation to achieve better results in
detecting plant diseases. Second, the best model could be deployed on mobile devices to
monitor mango leaf diseases and insect pests from real-time mango leaf photos, which
would help increase high-quality mango production.
References
Adatrao, S., & Mittat, M. (2016). An Analysis of Different Image Preprocessing Techniques for
Determining the Centroids of Circular Marks Using Hough Transform. 2nd International
Conference on Frontiers of Signal Processing (pp. 110-115). IEEE.
Anjulo, M. T. (2019). Perception of Ethiopian Mango Farmers on The Pest Status and Current
Management Practices for The Control of The White Mango Scale, Aulacaspis
Tubercularis (Homoptera: Diaspididae). JOURNAL OF ADVANCES IN AGRICULTURE.
Arivazhagan, S., & Ligi, S. (2018). Mango Leaf Diseases Identification Using Convolutional Neural
Network. International Journal of Pure and Applied Mathematics.
Ayalew, G., Fekadu, A., & Sisay, B. (2015). Appearance and Chemical Control of White Mango
Scale (Aulacaspis tubercularis) in Central Rift Valley. Science, Technology and Arts
Research Journal.
B, A. P., & K, G. S. (2016). Image Denoising Techniques-An Overview. IOSR Journal of Electronics
and Communication Engineering (IOSR -JECE) , 78-84.
CSA. (2017/18). REPORT ON AREA AND PRODUCTION OF MAJOR CROPS. THE FEDERAL
DEMOCRATIC REPUBLIC OF ETHIOPIA; CENTRAL STATISTICAL AGENCY.
Dessalegn, Y., Assefa, H., Derso, T., & Tefera, M. (2014). Mango Production Knowledge and
Technological Gaps of Smallholder Farmers in Amhara Region, Ethiopia. American
Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS).
Dhruv, B., Mittal, N., & Modi, M. (2017). Analysis of Different Filters for Noise Reduction in
Images. Recent Developments in Control, Automation and Power Engineering(RDCAPE),
410-415.
Dickson, B. (2020, January 06). What are convolutional neural networks (CNN)? Retrieved May
20, 2020, from TechTalks: https://bdtechtalks.com/2020/01/06/convolutional-neural-
networks-cnn-
convnets/#:~:text=A%20brief%20history%20of%20convolutional,a%20postdoctoral%20
computer%20science%20researcher.&text=The%20early%20version%20of%20CNNs,)%
2C%20could%20recognize%20handwritt
FAO. (2019). January - March 2019; Postharvest Extension Bulletin. FAO Office in Ethiopia.
Janwale, A. P. (2017). Plant Leaves Image Segmentation Techniques: A Review. International
Journal of Computer Sciences and Engineering, 147-150.
K, P. M., Sivakami, R., & M.Janani. (2019). Sooty Mould Mango Disease Identification Using Deep
Learning. International Journal of Innovative Technology and Exploring Engineering
(IJITEE), 402-405.
Kaur, A. (2014). A Review Paper on Image Segmentation and its Various Techniques in Image
Processing. International Journal of Science and Research (IJSR), 12-14.
Kaur, D., & Kaur, Y. (2014). Various Image Segmentation Techniques: A Review . International
Journal of Computer Science and Mobile Computing, 809-814.
Kaur, G., Kumar, R., & Kainth, K. (2016). A Review Paper on Different Noise Types and Digital
Image Processing. International Journal of Advanced Research in Computer Science and
Software Engineering, 6(6), 562-565.
Kunaver, M., & Tasic, J. F. (2005). Image feature extraction - An overview. IEEE Xplore.
Langampol, K., Srisomboon, K., Patanavijit, V., & Lee, W. (2019). Smart Switching Bilateral Filter
with Estimated Noise Characterization for Mixed Noise Removal. Mathematical
Problems in Engineering.
Manjula.KA. (2015). Role of Image Segmentation in Digital Image Processing For Information
Processing. International Journal of Computer Science Trends and Technology (IJCST),
312-318.
Sladojevic, S., Arsenovic, M., Anderla, A., Culibrk, D., & Stefanovic, D. (2016). Deep Neural
Networks Based Recognition of Plant Diseases by Leaf Image Classification.
Computational Intelligence and Neuroscience.
S, M., V.P, L., & Rangaswamy, D. S. (2016). Survey On Image Denoising Techniques. International
Journal of Science, Engineering and Technology Research (IJSETR), 2824-2827.
S.Veling, P. S., Kalelkar, M. S., Ajgaonkar, M. L., Mestry, M. N., & N.Gawade, M. N. (2019).
Mango Disease Detection by using Image Processing. International Journal for Research
in Applied Science & Engineering Technology (IJRASET), 3717-3726.
Saha, S. (2018, December 15). A Comprehensive Guide to Convolutional Neural Networks — the
ELI5 way. Retrieved May 7, 2020, from towardsdatascience:
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-
networks-the-eli5-way-3bd2b1164a53
Sethupathy, J., & S, V. (2016). OpenCV Based Disease Identification of Mango Leaves.
International Journal of Engineering and Technology (IJET).
Singh, R., Singh, P., & Parveen, F. (2015). BRIEF REVIEW ON IMAGE DENOISING TECHNIQUES.
International Journal of Science, Technology & Management , 336-344.
Sonam, & Dahiya, R. (2015). Histogram Equalization Based Image Enhancement Techniques For
Brightness Preservation And Contrast Enhancement. International Journal of Advanced
Research in Education Technology (IJARET), 83-89.
Sudhakar, S. (2017, July 10). Histogram Equalization. Retrieved April 23, 2020, from Towards
Data Science: https://towardsdatascience.com/histogram-equalization-5d1013626e64
Tegegne, T., & Birhanie, W. (2019). Knowledge Based System for Diagnosis and Treatment of
Mango Diseases. Information and Communication Technology for Development for
Africa (pp. 11-23). Bahir Dar: Springer Nature Switzerland.
Tewodros Bezu, Kebede Woldetsadik , & Tamado Tana. (2014). Production Scenarios of Mango
(Mangifera indicaL.) in Harari Regional State, Eastern Ethiopia. Wollega University:
Science, Technology and Arts Research Journal Sci. Technol. Arts Res.J (STAR).
TF, D. (2018). Newly Emerging Insect Pests and Diseases as a Challenge for Growth and
Development of Ethiopia: The Case of Western Oromiya. Journal of Agricultural Science
and Food Research.
Ullagaddi, S. B., & Raju, S. (2017). Disease Recognition in Mango Crop Using Modified Rotational
Kernel Transform Features. International Conference on AdvancedComputing and
Communication Systems.
WA, D., SS, W., & MK, N. (2016). Survey of anthracnose (Colletotrichum gloeosporioides) on
mango (Mangifera indica) in North West Ethiopia. Plant Pathology & Quarantine .
Yamashita, R., Nishio, M., Do, R. K., & Togashi, K. (2018). Convolutional neural networks: an
overview and application in radiology. Insights into Imaging , 611-629.