Project Report ML Team 3-1

Uploaded by

nancygenaigdg24
Copyright
© All Rights Reserved

Plant Disease Detection

A PROJECT REPORT

21CSC305P – MACHINE LEARNING

(2021 Regulation)
III Year / V Semester
Academic Year: 2024-2025

Submitted by
ISHANVI SINGH [RA2211003011203]
TAANISHA SHARMA [RA2211003011206]
GREESHMA REDDY [RA2211003011210]
NANCY DHIMAN [RA2211003011238]

Under the Guidance of


Dr. ASWATHY K CHERIAN
Designation
Department of Computational Intelligence

in partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE ENGINEERING

SCHOOL OF COMPUTING
COLLEGE OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603 203
NOVEMBER 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603 203

BONAFIDE CERTIFICATE

Certified that 21CSC305P - MACHINE LEARNING project report titled “PLANT DISEASE DETECTION” is the
bonafide work of “ISHANVI SINGH [RA2211003011203], TAANISHA SHARMA [RA2211003011206],
GREESHMA REDDY [RA2211003011210], NANCY DHIMAN [RA2211003011238]” who carried out the task of
completing the project within the allotted time.

SIGNATURE
Dr. ASWATHY K CHERIAN
Course Faculty
Professor
Department of Computational Intelligence
SRM Institute of Science and Technology
Kattankulathur

SIGNATURE
Dr. R. Annie Uthra
Head of the Department
Department of Computational Intelligence
SRM Institute of Science and Technology
Kattankulathur
ABSTRACT

Plant diseases are a serious agricultural issue, leading to significant crop damage and economic losses. Early and
accurate detection of plant diseases is essential for farmers to take timely actions and prevent further spread. This
project aims to develop an automated system for detecting plant diseases using Convolutional Neural Networks
(CNN), a powerful deep learning technique commonly used for image classification. By analyzing images of
plant leaves, the system can identify whether the plant is healthy or affected by a disease, helping farmers
monitor crop health more effectively.

The model is built using the VGG16 CNN architecture, a model that has been proven effective in recognizing
patterns in images. Transfer learning is applied to adapt this pre-trained model to our specific task, allowing the
model to learn faster and perform better with less data. We used a dataset from Kaggle’s "New Plant Diseases
Dataset," which contains images of healthy and diseased plant leaves across 38 different categories. This variety
ensures that the model is trained to recognize a wide range of plant diseases.

After training, the model achieved an accuracy of 98.42%, meaning it correctly classifies the large majority of
leaf images. This performance reflects the ability of CNNs to learn complex visual patterns, such as the shape,
color, and texture of leaves, which are often key indicators of disease. To further improve the model, we used
data augmentation (to increase the diversity of the training data) and regularization (to reduce overfitting),
which help the model generalize to new, unseen data.

The system has the potential to be highly beneficial for farmers and agricultural experts, offering a fast, reliable,
and scalable solution for detecting plant diseases in real time. It reduces the need for manual inspection, which
can be time-consuming and error-prone, and provides accurate disease diagnosis that can lead to better crop
management and higher yields. Integrating AI-based solutions like this into agriculture can significantly improve
how plant health is monitored, especially in regions where access to expert knowledge is limited.

In the future, this system can be enhanced by expanding the dataset to include more plant species and diseases or
by integrating it into mobile applications for field use. Overall, this project demonstrates how machine learning,
particularly CNNs, can be used to address real-world problems in agriculture and promote more sustainable
farming practices.
TABLE OF CONTENTS
ABSTRACT

LIST OF FIGURES

LIST OF TABLES

ABBREVIATIONS

1 INTRODUCTION
1.1 Background of the Project
1.2 Problem Statement
1.3 Objective of the Study
1.4 Scope of the Project
2 LITERATURE SURVEY
2.1 Traditional Plant Disease Detection Methods
2.2 Evolution of Machine Learning in Plant Disease Detection
2.3 CNN Architectures and Their Impact on Performance
2.4 Challenges in CNN-Based Plant Disease Detection
2.5 Real-world Applications of CNN-Based Plant Disease Detection
3 METHODOLOGY OF CNN-BASED PLANT DISEASE DETECTION
3.1 Data Collection
3.2 Data Preprocessing
3.3 Model Definition
3.4 Data Training
3.4.1 Train-Validation-Test Split
3.4.2 Model Compilation
3.4.3 Training Process
3.4.4 Validation Monitoring
3.4.5 Evaluation and Testing
4 RESULTS AND DISCUSSIONS
4.1 Model Evaluation
4.1.1 Evaluation metrics
4.1.2 Evaluation results
4.2 Project Output
4.3 Comparison with existing models
4.3.1 Model Comparison
4.3.2 Performance Analysis
5 CONCLUSION AND FUTURE ENHANCEMENT
REFERENCES
LIST OF FIGURES

Figure No    Title of the Figure
1.1          Block diagram
LIST OF TABLES

Table No    Title of the Table
1           Comparative Analysis
ABBREVIATIONS
AI – Artificial Intelligence
CNN – Convolutional Neural Network
VGG – Visual Geometry Group
ReLU – Rectified Linear Unit
FLOPs – Floating-Point Operations
MB – Megabyte
KB – Kilobyte
TPU – Tensor Processing Unit
GPU – Graphics Processing Unit
CPU – Central Processing Unit
IoT – Internet of Things
F1-Score – F1 Measure (Harmonic mean of Precision and Recall)
Adam – Adaptive Moment Estimation (Optimizer)
SGD – Stochastic Gradient Descent
mAP – Mean Average Precision
LR – Learning Rate
TL – Transfer Learning
ML – Machine Learning
IoU – Intersection over Union
SOTA – State-of-the-Art
FP – False Positive
FN – False Negative
TP – True Positive
TN – True Negative
ROC – Receiver Operating Characteristic (Curve)
AUC – Area Under Curve
FC – Fully Connected (Layer)
BN – Batch Normalization
CHAPTER 1

INTRODUCTION

1.1 Background of the Project


Agriculture plays a vital role in feeding the world’s population, and plant health is crucial for
ensuring optimal crop yields. However, plant diseases are a major threat to agriculture,
causing significant damage to crops and leading to substantial economic losses. Detecting
these diseases in their early stages is essential for preventing their spread and reducing the
impact on crop production.

Traditionally, disease detection has relied on manual inspection by farmers or agricultural
experts. This method, while effective to some extent, is often time-consuming, labor-intensive,
and subject to human error. Furthermore, in many rural or developing regions, access to
experts is limited, delaying diagnosis and intervention.

Recent advancements in machine learning and computer vision, specifically in the field of
deep learning, have introduced new possibilities for automating plant disease detection.
Convolutional Neural Networks (CNNs), a type of deep learning model, have shown
remarkable performance in image classification tasks. This project aims to leverage CNNs to
build an automated system for plant disease detection through images of plant leaves.

1.2 Problem Statement

The major challenge faced by farmers today is the inability to detect plant diseases in their
early stages. Manual detection methods can be slow and prone to inaccuracies, which results in
delayed treatment and a higher risk of the disease spreading to healthy plants. In rural areas, the
lack of readily available experts further exacerbates the issue. The consequences of delayed or
incorrect disease detection include reduced crop yields, increased costs for pesticide treatments,
and lower overall productivity.

To address this problem, there is a pressing need for an automated system that can quickly and
accurately diagnose plant diseases from leaf images. Such a system would provide farmers with
a tool for early detection, allowing them to take timely measures to protect their crops and
improve yields.
Fig 1.1 - Common Plant Diseases and Their Symptoms from the Dataset

1.3 Objective of the Study


The primary goal of this project is to develop a deep learning-based model using
Convolutional Neural Networks (CNNs) that can automatically detect plant diseases from
images of plant leaves. The system aims to:

● Build an efficient and accurate CNN-based model for plant disease detection.
● Use the pre-trained VGG16 model with transfer learning to reduce training time and
enhance model performance.
● Achieve high accuracy, targeting a test accuracy of 94.72%, ensuring the model’s
effectiveness in real-world applications.
● Develop a scalable solution that can be applied to large datasets and diverse farming
environments, providing farmers with an accessible and reliable tool for monitoring plant
health.
● Implement a user-friendly interface or mobile application that allows farmers to easily
upload leaf images for real-time disease detection and diagnosis.
● Explore the integration of additional data sources, such as environmental factors (e.g.,
humidity, temperature), to improve the model's predictions by considering factors that
contribute to disease spread.

By focusing on these objectives, this project aims to contribute to sustainable farming
practices, reduce crop losses, and enhance agricultural productivity.
Fig 1.2 Image Classification Process Using CNNs

1.4 Scope of the Project



Fig 1.3 - Disease Classification in Plants Using CNN


CHAPTER 2

LITERATURE SURVEY

With the advent of deep learning, especially Convolutional Neural Networks (CNNs), plant disease
detection has witnessed a significant transformation. Traditional manual inspection methods have given
way to more accurate, faster, and scalable solutions that can be deployed using mobile phones, drones,
and remote sensing technologies. This literature survey presents an in-depth review of existing research
on plant disease detection, particularly emphasizing CNN-based approaches, their evolution, challenges,
and real-world applications.

2.1 Traditional Plant Disease Detection Methods


Plant diseases have long been a critical threat to global agricultural productivity. Manual disease detection
has been the norm for centuries, relying on human expertise to visually examine plants and identify
symptoms. These traditional methods, while effective to some extent, are labor-intensive, prone to human
error, and highly dependent on the skills of the observer. Furthermore, environmental factors such as
lighting conditions, leaf orientation, and natural plant variation can obscure disease symptoms, leading to
inaccurate or delayed diagnoses.

Manual approaches often fail to meet the growing needs of modern agriculture, which requires real-time,
scalable, and accurate disease detection methods. Early computer-based image processing systems
improved accuracy by automating the process of disease detection using techniques like color analysis,
edge detection, and morphological features. These methods laid the groundwork for automated disease
detection but suffered from limitations such as sensitivity to environmental conditions and inability to
generalize across different plant species and diseases.

Zhang et al. (2016) designed one of the early image-processing-based systems for plant disease
detection. Their method focused on segmenting diseased areas and extracting key features such as color,
texture, and shape, but it struggled with complex diseases that produce only subtle visual changes in plants.

2.2 Evolution of Machine Learning in Plant Disease Detection

With the rise of machine learning in the early 2000s, hand-tuned, rule-based image analysis gave way to
classifiers that learn decision boundaries from labeled examples. Machine learning models like Support
Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Random Forest (RF) became popular in plant
disease classification. These models were capable of handling moderately sized datasets and were faster in
deployment than manual methods. However, they still operated on handcrafted features, so their
effectiveness depended heavily on extracting the right features from the image.

The performance of these machine learning algorithms was limited by their sensitivity to environmental
conditions and their inability to automatically learn and extract deep hierarchical features from the input
images. As a result, while these models outperformed manual inspection methods, they often required
significant preprocessing and did not generalize well across different datasets or plant species.
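To make the contrast with CNNs concrete, a classical pipeline first reduces each image to a few handcrafted numbers and only then applies a classifier. The sketch below is illustrative only: it assumes images are nested lists of (R, G, B) tuples, uses two hypothetical features (mean "greenness" and the fraction of pixels where red exceeds green), and classifies with k-NN. None of these choices are taken from the cited studies.

```python
from collections import Counter
import math

def handcrafted_features(image):
    """Map an image (rows of (r, g, b) tuples, 0-255) to two hand-picked features.

    Hypothetical features chosen for illustration; not from any cited study.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    # Feature 1: average share of green in each pixel (healthy leaves tend to be greener).
    greenness = sum(g / (r + g + b + 1e-9) for r, g, b in pixels) / n
    # Feature 2: fraction of pixels where red dominates green (a crude "lesion" cue).
    brown_fraction = sum(1 for r, g, b in pixels if r > g) / n
    return (greenness, brown_fraction)

def knn_predict(train_feats, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training feature vectors."""
    dists = sorted(
        (math.dist(f, query), label) for f, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def solid(color, size=4):
    """Tiny synthetic single-color 'leaf' image for the demo."""
    return [[color] * size for _ in range(size)]

train_images = [solid((40, 180, 40)), solid((50, 170, 60)), solid((60, 160, 50)),
                solid((150, 90, 40)), solid((160, 100, 50)), solid((140, 80, 30))]
train_labels = ["healthy"] * 3 + ["diseased"] * 3
feats = [handcrafted_features(img) for img in train_images]
print(knn_predict(feats, train_labels, handcrafted_features(solid((45, 175, 50)))))
```

The weakness described above is visible here: the classifier can only be as good as the two features someone chose by hand, whereas a CNN learns its features from the pixels.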

Deep learning, particularly CNNs, revolutionized image classification tasks across various fields,
including medical imaging, facial recognition, and object detection. CNNs are particularly well-suited for
image-based plant disease detection due to their ability to automatically learn spatial hierarchies of
features from raw input data. CNNs can capture subtle differences in disease symptoms, enabling them to
outperform traditional machine-learning models that rely on handcrafted features.

CNNs consist of multiple layers of convolutions, pooling, and fully connected layers that progressively
learn different levels of abstraction from the input image. This ability to learn from raw pixel data allows
CNNs to handle the complexity and variability of plant disease images, making them ideal for large-scale
agricultural applications.
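The convolution, activation, and pooling operations described above can be illustrated on a toy grayscale image. The following minimal sketch (standard library only) applies a single hand-picked 3x3 vertical-edge kernel, ReLU, and 2x2 max pooling. In a real CNN the kernel values are learned during training rather than chosen by hand, and, like most deep learning libraries, the sketch computes cross-correlation (no kernel flip).

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a 2D list with a small kernel (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU: keep positive responses, zero out negative ones."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

vertical_edge = [[-1, 0, 1]] * 3                           # responds to dark-to-bright edges
dark_to_bright = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
bright_to_dark = [[1, 1, 1, 0, 0, 0] for _ in range(6)]

print(max_pool(relu(conv2d(dark_to_bright, vertical_edge))))  # [[3, 3], [3, 3]]
print(max_pool(relu(conv2d(bright_to_dark, vertical_edge))))  # [[0, 0], [0, 0]]
```

The 6x6 input shrinks to a 4x4 feature map after the 3x3 convolution and to 2x2 after pooling, while the strong edge response survives; stacking many such layers with learned kernels is what lets a CNN build up from edges to lesion-like patterns.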

Mohanty et al. (2016) demonstrated the power of CNNs for plant disease detection using the PlantVillage
dataset, a comprehensive collection of plant images covering 14 crop species and 26 diseases. Their
CNN-based model achieved an accuracy of 99.35%, significantly outperforming traditional machine
learning methods. The study highlighted the potential of CNNs for real-time, high-accuracy plant disease
diagnosis.

2.3 CNN Architectures and Their Impact on Performance


Different CNN architectures have been employed for plant disease detection, each offering unique
strengths. Early architectures like LeNet and AlexNet have been used for basic plant disease detection
tasks. More advanced architectures like VGGNet, ResNet, and DenseNet have shown improved accuracy,
and ResNet and DenseNet in particular achieve this with fewer parameters than VGGNet. The depth of
these architectures allows them to learn more complex patterns and features from plant disease images.

VGGNet: VGGNet, known for its simplicity and depth, has been used in several plant disease
classification studies. Its use of small (3x3) convolutional filters allows it to capture fine details in plant
images, making it suitable for identifying subtle disease symptoms.
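The benefit of small filters can be checked with simple arithmetic: a stack of stride-1 3x3 layers covers the same receptive field as one larger filter while using fewer weights (and adding a non-linearity after every layer). The sketch below ignores biases and assumes equal input and output channel counts; C = 256 is an arbitrary example value.

```python
def stacked_receptive_field(num_layers, kernel=3):
    # Each extra stride-1 k x k layer widens the receptive field by (k - 1) pixels.
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel, channels):
    # Weights of one k x k conv layer with `channels` input and output channels (biases ignored).
    return kernel * kernel * channels * channels

C = 256  # example channel count
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = n * conv_weights(3, C)
    single = conv_weights(rf, C)
    print(f"{n} x (3x3): receptive field {rf}x{rf}, {stacked:,} weights "
          f"vs {single:,} for one {rf}x{rf} layer")
```

Three stacked 3x3 layers, for example, see a 7x7 region with 27C² weights instead of the 49C² a single 7x7 layer would need.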

Too et al. (2019) used a modified VGG16 model for tomato disease detection. They reported an accuracy
of 98.61%, emphasizing the role of deep architectures in enhancing disease classification performance.

ResNet: Residual Networks (ResNet) introduced the concept of skip connections to address the vanishing
gradient problem in deep networks. This architecture has proven particularly effective in plant disease
detection due to its ability to train much deeper networks while avoiding the accuracy degradation seen in very deep plain networks.
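In the standard residual formulation, a block learns a residual function F and adds its input back through the skip connection. Applying the chain rule (with L denoting the training loss) shows why this eases training: the identity term I passes the upstream gradient through unchanged, even when the gradient of F is small. This is the generic textbook form, not a description of any specific model cited here.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
    \left( \frac{\partial \mathcal{F}}{\partial \mathbf{x}} + I \right)
```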

Chen et al. (2020) applied ResNet50 for the detection of multiple plant diseases, achieving 98.76%
accuracy. The use of skip connections helped the model generalize well across different plant species and
disease types.

DenseNet: DenseNet connects each layer to every subsequent layer within a dense block by concatenating
their feature maps. This strengthens feature propagation, encourages feature reuse, and reduces the risk of
vanishing gradients. This architecture has been applied to complex plant disease datasets with promising results.
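Because each layer in a dense block receives the concatenated outputs of all earlier layers, the number of input channels grows linearly with depth: layer l sees k0 + k(l - 1) feature maps, where k0 is the block's input channel count and k is the growth rate. The sketch below uses values in the style of DenseNet-121's first block (k0 = 64, k = 32, six layers) purely as an example.

```python
def dense_block_input_channels(k0, growth_rate, num_layers):
    # Layer l receives the block input (k0 maps) plus growth_rate new maps
    # from each of the l - 1 earlier layers, concatenated along the channel axis.
    return [k0 + growth_rate * (l - 1) for l in range(1, num_layers + 1)]

def dense_block_output_channels(k0, growth_rate, num_layers):
    # The block output concatenates the input with the maps produced by every layer.
    return k0 + growth_rate * num_layers

print(dense_block_input_channels(64, 32, 6))   # [64, 96, 128, 160, 192, 224]
print(dense_block_output_channels(64, 32, 6))  # 256
```

Each layer only adds k new maps, so individual layers can stay narrow while still having access to all earlier features.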

Zhang et al. (2021) used DenseNet to classify diseases in apple and peach trees, achieving over 97%
accuracy across different datasets.

Fig 2.1 - Comparison of accuracy in data prediction of different CNN Models

2.4 Challenges in CNN-Based Plant Disease Detection

While CNNs have significantly advanced the field of plant disease detection, several challenges remain:

Intra-class variability: Within the same disease class, there can be significant variations in how the
disease manifests, depending on the stage of infection, lighting conditions, and environmental factors.
This can make it difficult for CNNs to learn consistent patterns across all samples of a particular disease.

Inter-class similarity: Some plant diseases exhibit similar symptoms, leading to confusion between
classes. For example, early-stage symptoms of different fungal diseases may look alike, making it
challenging for CNNs to differentiate between them.

Data scarcity: High-quality, labeled datasets are essential for training CNNs, but they are often scarce,
especially for rare or newly emerging diseases. Furthermore, collecting such datasets is expensive and
time-consuming.

Ferentinos (2018) highlighted these challenges in a study on deep learning-based plant disease
detection. To overcome the limitations of small datasets, techniques like data augmentation and
transfer learning were employed, achieving state-of-the-art results on multiple crop species.
2.5 Real-world Applications of CNN-Based Plant Disease Detection
CNN-based plant disease detection models have found real-world applications in precision agriculture,
offering farmers and agronomists real-time diagnostic tools. Mobile applications equipped with CNN
models allow farmers to capture images of diseased plants and receive instant feedback on the potential
disease and recommended treatment. Additionally, drones and satellite-based systems equipped with
CNN-powered cameras are being used for large-scale crop monitoring.

CNNs are particularly suited for image analysis tasks, thanks to their hierarchical structure, which allows
them to automatically learn spatial hierarchies of features from input images. This capability is critical in
agriculture, where diseases manifest visually through symptoms such as discoloration, lesions, and
deformities on plant leaves. Traditional methods of plant disease diagnosis often require extensive expert
knowledge and can be time-consuming, leading to delayed interventions. In contrast, CNNs facilitate
rapid and accurate disease detection, enabling timely management decisions that can significantly reduce
crop losses.

Applications in Agriculture

1. Field Diagnosis and Monitoring

CNN-based systems are increasingly being employed in field conditions for real-time monitoring
of crops. Mobile applications integrated with CNN models can allow farmers to capture images of
their crops and receive immediate feedback regarding potential diseases. For instance, systems
like PlantVillage and Plantix use CNNs to identify diseases such as late blight in potatoes or leaf
rust in wheat, enabling farmers to take proactive measures before the disease spreads.

2. Automated Disease Classification

CNNs have been successfully implemented in various studies to classify different types of plant
diseases accurately. For example, a study by Ferentinos (2018) utilized CNNs to classify diseases
in tomato plants, achieving an impressive accuracy of over 96%. Such automated classification
systems reduce the dependency on human expertise and can be scaled for widespread agricultural
use, allowing for efficient management of large plantations.

3. Precision Agriculture

In the context of precision agriculture, CNNs are employed to enhance decision-making processes
through data-driven insights. By integrating CNN-based disease detection with other technologies,
such as drone imagery and IoT sensors, farmers can achieve more precise monitoring of plant
health. This integration allows for targeted interventions, optimizing resource usage (e.g.,
pesticides, water) and minimizing environmental impact.
4. Research and Development

CNNs also play a vital role in research settings, where they can assist in identifying new
pathogens and studying disease progression. By analyzing vast amounts of image data from
controlled experiments, researchers can gain insights into the visual characteristics of emerging
plant diseases. This knowledge is crucial for developing resistant crop varieties and effective
management strategies.

5. Education and Training

The deployment of CNN-based applications in agricultural education has the potential to enhance
learning outcomes for students and farmers alike. Educational platforms that incorporate
CNN-based disease detection tools can provide users with interactive experiences, helping them
learn to identify plant diseases more effectively. This knowledge transfer is essential for building a
resilient agricultural workforce capable of responding to plant health challenges.

Picon et al. (2019) developed a mobile application for real-time plant disease detection using CNN
models. The app, powered by a cloud-based platform, achieved high accuracy in classifying diseases
from various crops, providing farmers with timely intervention suggestions.

Convolutional Neural Networks have dramatically improved the accuracy and scalability of plant disease
detection. From basic CNN architectures to advanced models like ResNet and DenseNet, deep learning
has proven its ability to outperform traditional methods in both accuracy and real-time applicability.
However, challenges such as intra-class variability, inter-class similarity, and data scarcity persist,
pushing researchers to explore transfer learning, synthetic data generation, and data augmentation.
CHAPTER 3

METHODOLOGY OF CNN-BASED PLANT DISEASE DETECTION

3.1 Data Collection


The foundational step of any machine learning project is gathering the necessary data. In this case, the
dataset used is the New Plant Diseases Dataset sourced from Kaggle. This dataset provides a diverse set
of images essential for training and validating the CNN model.

The dataset contains over 54,000 images divided into 38 categories, including healthy plants and those
affected by various diseases. Each class represents different diseases associated with specific plant
species, including common crops such as tomatoes, potatoes, and bell peppers.

This comprehensive coverage ensures that the model learns to recognize various diseases, increasing its
applicability in real-world scenarios.

The dataset was downloaded from Kaggle and extracted into a local directory. The organization of the
dataset is crucial; images are structured in folders representing each class (e.g., Healthy, Tomato Leaf
Spot, Potato Blight), facilitating easy loading and manipulation during preprocessing.
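With a folder-per-class layout, the label of every image is simply the name of its parent folder. Keras' `ImageDataGenerator.flow_from_directory` and `tf.keras.utils.image_dataset_from_directory` infer labels this way, assigning class indices in alphanumeric order of the subfolder names. The standard-library sketch below mirrors that convention; the accepted file extensions and the folder names used in the demo are assumptions for illustration.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions

def index_dataset(root):
    """Return (class_names, samples) for a directory laid out as root/<class_name>/<image files>.

    Class indices follow the sorted subfolder names; samples are (path, class_index) pairs.
    """
    root = Path(root)
    class_names = sorted(d.name for d in root.iterdir() if d.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}
    samples = [
        (path, class_to_index[class_dir.name])
        for class_dir in sorted(root.iterdir()) if class_dir.is_dir()
        for path in sorted(class_dir.iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return class_names, samples
```

For example, `index_dataset("data/train")` (a hypothetical path) would return the 38 class names and one `(path, label)` pair per image, which can then be split and fed to the preprocessing steps.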

Fig 3.1 - New Plant Diseases Dataset from Kaggle


Fig 3.2 - The number of images of specific plants in the Kaggle Dataset

3.2 Data Preprocessing


Data preprocessing is vital for enhancing the quality of the dataset, ensuring that the images are in a
suitable format for the CNN model. This stage prepares the data for training, ultimately affecting the
model's accuracy and performance.

Steps:

● Image Loading:

Images were loaded into the program using libraries like TensorFlow and Keras. The
ImageDataGenerator class was particularly useful for loading images and applying real-time data
augmentation.

● Image Resizing:

Each image was resized to 224x224 pixels. This uniformity is required because the fully connected
layers that follow the convolutional blocks expect an input of fixed length. Resizing also reduces the
computational load without significantly affecting the model's ability to learn relevant features.
● Normalization:

Pixel values were normalized by dividing them by 255, converting the pixel values from a range
of [0, 255] to [0, 1]. Keeping the inputs in a small, consistent range makes gradient-based
optimization more stable and typically speeds up convergence during training.

● Data Augmentation:

To prevent overfitting and to increase the robustness of the model, various augmentation
techniques were applied:

● Rotation: Randomly rotating images by an angle within a specified range to simulate
different orientations.
● Zooming: Randomly zooming in on portions of images, allowing the model to learn
details at varying scales.
● Flipping: Flipping images horizontally or vertically, which provides additional training
samples from which the model can learn.
● Brightness Adjustments: Randomly altering the brightness of images to simulate
different lighting conditions, further enhancing model robustness.
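In Keras these steps are usually configured through `ImageDataGenerator` arguments such as `rescale=1./255`, `rotation_range`, `zoom_range`, `horizontal_flip`, `vertical_flip`, and `brightness_range`; the report does not state the values used. To show what the operations do to the pixels, the following standard-library sketch works on a single-channel image stored as a 2D list. It is a simplification: rotation is limited to 90° steps (arbitrary angles need interpolation), zoom is omitted, and the probabilities and brightness range in `augment` are hypothetical.

```python
import random

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of a 2D list (a stand-in for resizing to 224x224)."""
    h, w = len(image), len(image[0])
    return [[image[i * h // new_h][j * w // new_w] for j in range(new_w)] for i in range(new_h)]

def normalize(image):
    """Scale 0-255 pixel values to [0, 1], the same effect as rescale=1./255."""
    return [[v / 255.0 for v in row] for row in image]

def flip_horizontal(image):
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]

def adjust_brightness(image, factor):
    """Multiply normalized pixels by `factor` and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, v * factor)) for v in row] for row in image]

def augment(image, rng):
    """Randomly flip, rotate and re-light a normalized image (hypothetical settings)."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    if rng.random() < 0.5:
        image = rotate_90(image)
    return adjust_brightness(image, rng.uniform(0.8, 1.2))

rng = random.Random(42)
leaf = [[0, 255], [51, 102]]
print(augment(normalize(resize_nearest(leaf, 4, 4)), rng))
```

Because augmentation is applied on the fly, each epoch sees slightly different versions of the same leaves, which is what discourages the model from memorizing individual images.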

Fig 3.3 - Data Preprocessing (program)

Fig 3.4 - Data Preprocessing


3.3 Model Definition
Model definition specifies the architecture of the CNN: the sequence of layers that transforms an input
leaf image into a probability for each of the 38 classes.

Steps:

● Model Initialization:

A Sequential model was initialized. This structure allows layers to be stacked, each with a specific
function, making it easier to build and modify the architecture.

● Convolutional Blocks:

Multiple convolutional blocks were added to the model. Each block consists of:

● Convolutional Layers: These layers apply filters to the input images to extract features. The
first layer uses 32 filters, followed by increasing the number of filters in subsequent layers
(64, 128, and finally 256), allowing the model to learn progressively more complex features.
● Activation Function (ReLU): The Rectified Linear Unit (ReLU) activation function
introduces non-linearity, allowing the network to learn complex patterns in the data. ReLU is
chosen for its computational efficiency and effectiveness in training deep networks.
● Batch Normalization: This technique normalizes the outputs of each layer, helping stabilize
and accelerate training by reducing internal covariate shift.
● Max Pooling: Max pooling layers reduce the spatial dimensions of the feature maps,
effectively down-sampling the data while retaining the most critical information. This helps
to mitigate overfitting and reduces computation.
● Dropout Layers: These layers randomly deactivate a fraction of neurons during training
(30% for convolutional blocks, 50% for the fully connected layer) to prevent the model from
relying too heavily on any one feature, thereby enhancing generalization.
● Flattening:

After the convolutional blocks, a flattening layer was added to convert the 2D feature maps into a
1D vector, making it suitable for input into fully connected layers.

● Fully Connected Layers:

A dense layer with 256 neurons follows the flattening layer. This layer helps in combining
features learned in the previous layers to make final predictions. A ReLU activation function is
used here as well.
Finally, the output layer contains 38 neurons, each corresponding to one of the classes in the
dataset, using a softmax activation function to produce a probability distribution across the
classes.
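The layer sizes given above (convolutional blocks with 32, 64, 128, and 256 filters, a 256-unit dense layer, and 38 outputs) can be checked with a quick shape and parameter calculation. The report does not list every hyperparameter, so this sketch assumes one 3x3 convolution with 'same' padding per block followed by 2x2 max pooling, and it leaves out the Batch Normalization and Dropout layers. Keras' `model.summary()` gives the authoritative breakdown for the real model. A numerically stable softmax, as used by the output layer, is included at the end.

```python
import math

INPUT_SHAPE = (224, 224, 3)
BLOCK_FILTERS = [32, 64, 128, 256]  # from the text
KERNEL = 3                          # assumption: 3x3 kernels with 'same' padding, one conv per block
DENSE_UNITS = 256
NUM_CLASSES = 38

def summarize_architecture():
    """Return (layer, output_shape, trainable_params) rows under the assumptions above."""
    h, w, c = INPUT_SHAPE
    rows = []
    for filters in BLOCK_FILTERS:
        params = (KERNEL * KERNEL * c + 1) * filters  # kernel weights + one bias per filter
        c = filters
        h, w = h // 2, w // 2                         # 2x2 max pooling halves height and width
        rows.append((f"conv{filters}+pool", (h, w, c), params))
    flat = h * w * c
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense", (DENSE_UNITS,), (flat + 1) * DENSE_UNITS))
    rows.append(("output", (NUM_CLASSES,), (DENSE_UNITS + 1) * NUM_CLASSES))
    return rows

def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)                                   # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

for name, shape, params in summarize_architecture():
    print(f"{name:<16}{str(shape):<18}{params:>12,}")
```

Under these assumptions the 256-unit dense layer holds about 12.8 million of the roughly 13.2 million weights, far more than all convolutional blocks combined, which is consistent with applying the stronger 50% dropout at that layer.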

Fig 3.5 CNN Layers of Model


3.4 Data Training

The process of training a Convolutional Neural Network (CNN) for plant disease detection involves
optimizing the model parameters to minimize the difference between predicted and actual labels. This
section outlines the steps involved in model training, including the dataset split, compilation, training, and
evaluation phases.

3.4.1 Train-Validation-Test Split

The dataset used for the plant disease detection model was sourced from Kaggle (New Plant Diseases
Dataset). To ensure robust model evaluation, the data was divided into three sets:

● 70% of the data was used for training the model. This subset enables the model to learn patterns
from a diverse set of plant disease images.
● 15% was used for validation. This set was utilized to tune the model's hyperparameters and
monitor its performance during training, helping to detect overfitting.
● 15% was reserved for testing. This subset evaluates the model's performance on unseen data,
providing an unbiased estimate of its generalization capability.

This split ensures that the model has access to enough data for training while retaining separate sets for
fine-tuning and performance evaluation.
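The report does not say how the 70/15/15 split was drawn. One common choice, shown here as an assumption, is a stratified split: each class is shuffled and divided separately so that all three subsets keep roughly the same class proportions. The fixed seed makes the split reproducible. The sketch works on `(item, label)` pairs such as the `(path, class_index)` pairs produced when indexing the dataset folders.

```python
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Split (item, label) pairs into train/val/test, splitting each class separately."""
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))
    rng = random.Random(seed)               # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        n_val = round(len(group) * val_frac)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]     # the remainder (about 15%) goes to the test set
    return train, val, test
```

Splitting per class matters for a 38-class dataset: a purely random split could leave a small class under-represented in the validation or test set, making its metrics unreliable.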

3.4.2 Model Compilation

The model was compiled using the following settings:

● Optimizer: Adam optimizer, with an initial learning rate of 0.0005. Adam was chosen for its
adaptive learning rate capabilities, allowing the model to converge efficiently.
● Loss Function: Categorical Crossentropy, appropriate for multi-class classification tasks like
plant disease detection (38 distinct disease classes in this case).
● Metrics: Accuracy was selected as the primary metric to evaluate performance during training, as
it provides a straightforward measure of the proportion of correct predictions.
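For readers unfamiliar with these settings, the loss and the optimizer step can be written out for a single sample and a single scalar weight. This is a textbook-style sketch, not the Keras implementation: the learning rate 0.0005 comes from the text, while β1 = 0.9, β2 = 0.999, and ε = 1e-7 are Keras' documented defaults and are assumed here. Library implementations may arrange the bias-correction and ε terms slightly differently.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y * log(p)), with p clipped away from 0 and 1 to avoid log(0)."""
    return -sum(t * math.log(min(max(p, eps), 1 - eps)) for t, p in zip(y_true, y_pred))

class AdamScalar:
    """Adam update for one scalar parameter (textbook form with bias correction)."""

    def __init__(self, lr=0.0005, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (first moment)
        self.v = 0.0  # running mean of squared gradients (second moment)
        self.t = 0    # step counter, used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)   # undo the bias toward 0 in early steps
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# One-hot label for class 0 and a predicted distribution over 3 classes.
print(categorical_crossentropy([1, 0, 0], [0.7, 0.2, 0.1]))  # equals -log(0.7)
opt = AdamScalar()
print(opt.step(1.0, 2.0))  # first step moves the weight by about lr, regardless of gradient size
```

With a one-hot target, the loss reduces to the negative log-probability of the correct class, so it penalizes confident wrong predictions heavily. Adam's per-parameter scaling by the squared-gradient average is what the text calls its adaptive learning rate.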

Fig 3.6 - Model Compilation


3.4.3 Training Process

The training process involved iteratively passing batches of training data through the model and adjusting
its weights to minimize the loss:

● The model was trained for 20 epochs, with a batch size of 32 or 64 images, depending on
computational resources. Multiple passes over the training data gave the model the repeated
exposure it needs to learn meaningful patterns.
● After each epoch, the validation set was used to evaluate the model's performance. This enabled
early detection of overfitting or underfitting trends.
● Batch Normalization was used after each convolutional layer to stabilize and speed up the
training process, while Dropout layers were employed to prevent overfitting.
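An epoch is one full pass over the training set, split into mini-batches. Each mini-batch produces one weight update, so an epoch with N training images and batch size B performs ceil(N / B) updates. The sketch below shows roughly what a training loop does internally: it shuffles the sample order each epoch and yields batches, with the last batch possibly smaller. The sample counts in the test are illustrative, not the report's.

```python
import math
import random

def iterate_minibatches(samples, batch_size, rng):
    """Yield shuffled mini-batches; the last batch may be smaller than batch_size."""
    order = list(range(len(samples)))
    rng.shuffle(order)                      # a new random order for every epoch
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the training data."""
    return math.ceil(num_samples / batch_size)
```

Doubling the batch size from 32 to 64 halves the number of updates per epoch but makes each gradient estimate less noisy, which is the trade-off behind choosing the batch size according to available memory.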

3.4.4 Validation Monitoring

The model's performance on the validation set was monitored after each epoch:

● Early Stopping was employed to prevent overfitting. If the validation loss did not improve for
7 consecutive epochs, training was automatically halted, and the best-performing model (based on
validation loss) was restored.
● Model Checkpointing saved the best model during training, based on validation performance.
This ensures that the best model is preserved even if overfitting occurs later in training.
● ReduceLROnPlateau was implemented to reduce the learning rate when the model's
performance plateaued, allowing the model to make finer updates to its parameters in later stages
of training.
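In Keras these three mechanisms correspond to the callbacks `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=7, restore_best_weights=True)`, `ModelCheckpoint(..., monitor="val_loss", save_best_only=True)`, and `ReduceLROnPlateau(monitor="val_loss", factor=..., patience=...)`. The report does not give the factor or patience used for the learning-rate schedule. To show how the rules interact, the simplified sketch below replays a list of per-epoch validation losses; its `lr_factor=0.5` and `lr_patience=3` are hypothetical values, and the real callbacks have extra options (for example `min_delta`, `cooldown`, `min_lr`) that are ignored here.

```python
def monitor_training(val_losses, patience=7, lr=0.0005, lr_factor=0.5, lr_patience=3):
    """Replay per-epoch validation losses through simplified early-stopping,
    checkpointing and reduce-on-plateau rules (lr_factor / lr_patience are hypothetical)."""
    best_loss, best_epoch = float("inf"), None
    wait = 0         # epochs without improvement (early-stopping counter)
    lr_wait = 0      # epochs without improvement since the last learning-rate change
    lr_history = []  # learning rate used in each epoch
    stopped_at = len(val_losses)
    for epoch, loss in enumerate(val_losses, start=1):
        lr_history.append(lr)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # checkpoint: this epoch's weights would be saved
            wait = lr_wait = 0
            continue
        wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:   # plateau: shrink the learning rate
            lr *= lr_factor
            lr_wait = 0
        if wait >= patience:         # no improvement for `patience` epochs: stop training
            stopped_at = epoch
            break
    return {"stopped_at": stopped_at, "best_epoch": best_epoch,
            "best_loss": best_loss, "lr_history": lr_history}

losses = [1.0, 0.8, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.5]
print(monitor_training(losses))
```

In this made-up run, the best validation loss occurs at epoch 3, the learning rate is halved twice during the plateau, and training stops at epoch 10. The improvement that would have come at epoch 11 is never seen, which is the usual trade-off when choosing the patience value.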

Fig 3.7 - Code for training process of the model


3.4.5 Evaluation and Testing

After training, the model was evaluated on the reserved test set, providing an unbiased estimate of its
real-world performance. Because the test set consists of data the model never saw during training, its
results indicate how well the model generalizes to new, unseen plant disease images.
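Scoring the test set comes down to comparing predicted and true class indices. The sketch below builds a confusion matrix and derives accuracy plus one-vs-rest precision, recall, and F1 for each class. It is a plain-Python illustration with made-up labels, not the evaluation code used in the project. Libraries such as scikit-learn provide equivalent functions.

```python
def evaluate(y_true, y_pred, num_classes):
    """Accuracy, confusion matrix and per-class (precision, recall, F1) from integer labels."""
    cm = [[0] * num_classes for _ in range(num_classes)]  # rows: true class, cols: predicted class
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    accuracy = sum(cm[k][k] for k in range(num_classes)) / len(y_true)
    per_class = []
    for k in range(num_classes):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(num_classes)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                                  # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, cm, per_class

acc, cm, per_class = evaluate([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(acc)        # 4 of 6 correct
print(cm)         # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class)
```

Per-class scores matter because overall accuracy can hide a weak class: in the toy example, class 2 has perfect precision but only 50% recall.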

Fig 3.8 - Model training and validation accuracy


CHAPTER 4
RESULTS AND DISCUSSIONS

4.1 MODEL EVALUATION

This section presents an objective evaluation of the trained models' performance, explains some of the
factors that influence the evaluation results, and outlines the testing procedure.

4.1.1 Evaluation metrics

Evaluation metrics measure the quality of a trained machine learning model, and evaluating models or
algorithms is essential for any machine learning project. Many different evaluation metrics exist; the
following are the metrics used to evaluate the trained models in this project:

- Confusion matrix: provides a detailed overview of the classification results. A well-performing
model has high True Positive (TP) and True Negative (TN) counts, and False Negative (FN) and False
Positive (FP) counts that are as low as possible.

o True Positive (TP): Number of correctly predicted positive samples.

o False Positive (FP): Number of negative samples incorrectly predicted as positive.

o True Negative (TN): Number of correctly predicted negative samples.

o False Negative (FN): Number of positive samples incorrectly labelled as negative.

- Precision: the ratio of true positives to all predicted positives, TP/(TP + FP).

- Recall: the proportion of actual positives that the model correctly classifies, TP/(TP + FN). It is
also called sensitivity.

- F1 score: the harmonic mean of precision and recall, 2 × (Precision × Recall)/(Precision + Recall).
The higher the precision and recall, the higher the F1 score.

- F1 score curve: a graph showing how the F1 score changes as the classification threshold varies.
Tracking these changes helps find the optimal threshold for a model.
- ROC curve (receiver operating characteristic curve): a graph showing the performance
of a classification model at all classification thresholds. This curve plots two parameters:
True Positive Rate (TPR) and False Positive Rate (FPR).

o TPR = TP/(TP + FN)

o FPR = FP/(FP + TN)

- AUC (area under the ROC curve): measures the two-dimensional area underneath the entire ROC
curve. AUC provides an aggregate measure of performance across all possible classification
thresholds.
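The formulas above translate directly into code. The sketch below is a minimal, dependency-free illustration for a single class treated as "positive"; for a multi-class model, each class would be scored one-vs-rest in turn. It follows the common convention of reporting 0 when a denominator is 0. The F1 curve predicts "positive" whenever a score reaches the threshold. The labels and scores are invented.

```python
from typing import Dict, List, Sequence, Tuple


def counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """TP, FP, TN, FN for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(t == 1 and p == 1 for t, p in pairs),
        "FP": sum(t == 0 and p == 1 for t, p in pairs),
        "TN": sum(t == 0 and p == 0 for t, p in pairs),
        "FN": sum(t == 1 and p == 0 for t, p in pairs),
    }


def ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # convention: 0/0 is reported as 0


def scores_from_counts(c: Dict[str, int]) -> Dict[str, float]:
    precision = ratio(c["TP"], c["TP"] + c["FP"])
    recall = ratio(c["TP"], c["TP"] + c["FN"])  # also the TPR / sensitivity
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(c["FP"], c["FP"] + c["TN"]),
    }


def f1_curve(y_true: Sequence[int], scores: Sequence[float],
             thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(threshold, F1) pairs; a sample is predicted positive when its score >= threshold."""
    curve = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        curve.append((th, scores_from_counts(counts(y_true, y_pred))["f1"]))
    return curve


if __name__ == "__main__":
    # Invented labels and scores for one class.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(counts(y_true, y_pred))                      # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
    print(scores_from_counts(counts(y_true, y_pred)))  # precision, recall and F1 all 0.75; FPR 0.25
    curve = f1_curve(y_true, scores, [0.35, 0.5, 0.95])
    print(max(curve, key=lambda p: p[1]))              # best threshold among the three: 0.35
```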

4.1.2 Evaluation results

1. Confusion Matrix:

The confusion matrix gives a clear picture of how well the model classifies the plant diseases across
different categories. It shows the true labels (actual class) on the Y-axis and the predicted labels on the
X-axis.

● Diagonal elements: These represent the number of times the model correctly predicted a certain
class. Higher values here indicate good performance.
● Off-diagonal elements: These represent misclassifications, where the model predicted the wrong
class.
Fig 4.1 - Confusion Matrix
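The layout described above (true classes as rows, predicted classes as columns) can be built and read in a few lines. In the sketch below, the diagonal holds the correct predictions. The one-vs-rest TP, FP, FN and TN counts for a class are read off its row and column. The three class names are PlantVillage labels mentioned in this report, but the predictions are invented.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """m[i][j] = number of samples of true class i predicted as class j."""
    pos = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m


def one_vs_rest_counts(m: List[List[int]], k: int) -> Dict[str, int]:
    """TP/FP/FN/TN for class index k, read off row k and column k."""
    total = sum(sum(row) for row in m)
    tp = m[k][k]                          # diagonal cell: correct predictions of class k
    fn = sum(m[k]) - tp                   # rest of row k: class-k samples predicted as something else
    fp = sum(row[k] for row in m) - tp    # rest of column k: other samples predicted as class k
    return {"TP": tp, "FP": fp, "FN": fn, "TN": total - tp - fn - fp}


if __name__ == "__main__":
    classes = ["Apple___healthy", "Apple___Apple_scab", "Grape___Black_rot"]
    # Invented ground truth and predictions.
    y_true = ["Apple___Apple_scab", "Apple___Apple_scab", "Grape___Black_rot",
              "Apple___healthy", "Apple___healthy", "Grape___Black_rot"]
    y_pred = ["Apple___Apple_scab", "Grape___Black_rot", "Grape___Black_rot",
              "Apple___healthy", "Apple___Apple_scab", "Grape___Black_rot"]
    m = confusion_matrix(y_true, y_pred, classes)
    for name, row in zip(classes, m):
        print(f"{name:20s} {row}")
    print("correct:", sum(m[i][i] for i in range(len(classes))), "of", len(y_true))
    print(one_vs_rest_counts(m, 1))  # counts for Apple___Apple_scab
```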

2. ROC (Receiver Operating Characteristic) curves:

Each class (such as different plant diseases or healthy conditions) has its own ROC curve, which is used
to evaluate the performance of the model in predicting whether an instance belongs to that class or not.

● True Positive Rate (TPR) (y-axis): Also called recall or sensitivity, this measures how well the
model correctly identifies positive instances (e.g., a diseased leaf being correctly classified as
diseased).
● False Positive Rate (FPR) (x-axis): This measures the fraction of actual negative instances that the
model incorrectly classifies as positive (e.g., healthy leaves wrongly classified as diseased).
Results from ROC Curve 1:

● AUC of 1.00 (Perfect Performance): For many classes, such as Apple___Apple_scab,
Grape___Black_rot and Tomato___Bacterial_spot, the model has an AUC of 1.00. This means that, for these
classes, the model's scores separate positive from negative instances (almost) perfectly: at a suitable
threshold there are virtually no false positives or false negatives.
● Flat Line for AUC = 1.00: For classes such as Apple___Apple_scab and
Corn_(maize)___Northern_Leaf_Blight, the curve rises straight to the top-left corner and then runs flat
along the top of the plot, showing that the classifier separates these classes from all others extremely well.
● AUC = nan: Some classes, such as Apple___healthy, Peach___healthy and Tomato___healthy, have
an AUC value of nan. This does not indicate perfect classification. The ROC curve is undefined when the
evaluated samples contain no positive instances of a class (TPR = TP/(TP + FN) becomes 0/0) or no
negative instances (FPR becomes 0/0).

Fig 4.2 - ROC Curves
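Both the one-vs-rest ROC curves and the nan values above can be reproduced with a short, framework-free sketch. For one class, the label is 1 when a sample belongs to that class, and the score is the model's predicted probability for that class. Lowering the threshold through every distinct score traces out the (FPR, TPR) points, and the trapezoidal rule gives the AUC. When the evaluated samples contain no positives for the class, TPR = TP/(TP + FN) is 0/0, and the sketch returns nan, mirroring the AUC = nan entries. The data is invented.

```python
import math
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, sweeping the threshold from the highest score downwards."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= th and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= th and t == 0)
        tpr = tp / n_pos if n_pos else math.nan  # undefined without positive samples
        fpr = fp / n_neg if n_neg else math.nan  # undefined without negative samples
        points.append((fpr, tpr))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points; any nan makes the result nan."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Invented labels and scores.
    print(auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])))  # 1.0: positives always ranked first
    print(auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])))  # 0.75
    print(auc(roc_points([0, 0, 0], [0.2, 0.5, 0.1])))          # nan: class absent from the data
```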


3. Classification Report:

The classification report provides precision, recall, and F1-score for each class, as well as macro- and
weighted averages across all classes.

● Precision: The ratio of true positive predictions to the total predicted positives. It tells you what
fraction of the predictions for a class were correct. The precision for Apple___Apple_scab is 0.07, meaning
only 7% of the predicted "Apple scab" cases were correct.
● Recall: The ratio of true positives to the total actual positives (sensitivity). It tells you how well
the model is identifying true positives.
● F1-score: The harmonic mean of precision and recall. It balances the trade-off between precision
and recall.

Table 4.1 - Classification Report of Data
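The difference between the macro and weighted averages in a classification report is easiest to see numerically. A macro average gives every class equal weight. A weighted average weights each class by its support, that is, its number of true samples. The class names and values below are invented for illustration and are not taken from Table 4.1.

```python
from typing import Dict

METRICS = ("precision", "recall", "f1")


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def report_averages(per_class: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """per_class maps a class name to its precision, recall, f1 and support (true sample count)."""
    total_support = sum(c["support"] for c in per_class.values())
    macro = {m: sum(c[m] for c in per_class.values()) / len(per_class) for m in METRICS}
    weighted = {m: sum(c[m] * c["support"] for c in per_class.values()) / total_support
                for m in METRICS}
    return {"macro avg": macro, "weighted avg": weighted}


if __name__ == "__main__":
    per_class = {  # invented numbers, not Table 4.1
        "class_A": {"precision": 0.9, "recall": 0.8, "f1": f1(0.9, 0.8), "support": 90},
        "class_B": {"precision": 0.5, "recall": 0.6, "f1": f1(0.5, 0.6), "support": 10},
    }
    for name, values in report_averages(per_class).items():
        print(name, {m: round(v, 3) for m, v in values.items()})
```

Because class_A has nine times the support of class_B, the weighted precision (0.86) lies much closer to class_A's score than the macro precision (0.70) does. When the weighted row of a report is clearly higher than the macro row, this suggests that the better-performing classes are also the larger ones.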


4.2 Project Output
4.3 Comparison with existing models

In this section, we use the same dataset to compare the performance of a custom Convolutional Neural
Network (CNN) and the VGG16 model for plant disease detection. The comparison focuses on accuracy,
model complexity, and computational efficiency.

4.3.1 Model Comparison

● Accuracy: The custom CNN achieves 98.31% accuracy, surpassing VGG16’s 91.2% and other
models like InceptionV3 and MobileNet.
● Precision, Recall, and F1-Score: The proposed CNN has higher precision, recall, and F1-scores than
VGG16, demonstrating more consistent performance in detecting plant diseases.
● Parameters and Model Size: The CNN is highly efficient with only 208,802 trainable
parameters, compared to VGG16’s 9.4 million. This drastically reduces the memory footprint
(1.7 MB vs 94 MB for VGG16), making it more practical for mobile or edge deployment.
● Inference Time: The proposed CNN processes an image in approximately 0.5 seconds, faster than
VGG16, which takes approximately 2 seconds per image.

Fig - Table comparing our model with existing models
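The link between parameter count and memory footprint is simple arithmetic: a 32-bit float weight occupies 4 bytes. The helper below estimates the storage needed for the weights alone. It counts only the parameters passed in, here the trainable counts quoted above, so non-trainable values such as batch-normalization statistics are excluded. A saved model file typically also stores the architecture and, depending on how it is saved, optimizer state. On-disk sizes such as the 1.7 MB and 94 MB figures can therefore be larger than this estimate. The 8-bit line illustrates quantization, which the report does not discuss.

```python
def weight_megabytes(n_params: int, bytes_per_param: int = 4) -> float:
    """Storage for the raw weights only, in MB (10**6 bytes); float32 uses 4 bytes per value."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    # Trainable-parameter counts as quoted in the comparison above.
    for name, n_params in [("custom CNN", 208_802), ("VGG16 (as reported)", 9_400_000)]:
        print(f"{name}: {weight_megabytes(n_params):.2f} MB as float32, "
              f"{weight_megabytes(n_params, 1):.2f} MB as 8-bit integers")
```

For the custom CNN, the raw float32 weights come to roughly 0.84 MB. For the reported 9.4 million VGG16 parameters, they come to roughly 37.6 MB.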

The drawbacks of the existing VGG16 model that our proposed model overcomes are:

● Complexity: with over 9.4 million parameters, VGG16 has slower inference and requires more memory.
● Accuracy: despite transfer learning, VGG16 achieves only 91.2% accuracy, significantly lower than the
custom CNN, which we attribute to its over-parameterization for this task.

4.3.2 Performance Analysis

The custom CNN outperforms VGG16 not only in accuracy (98.31% vs 91.2%) but also in
computational efficiency. Its smaller size and faster inference time make it ideal for deployment on
devices with limited computational resources, such as smartphones used by farmers or automated
agricultural systems.
Given the fast inference time (0.5 seconds) and high accuracy, the custom CNN is well-suited for
real-time plant disease detection applications in agriculture. VGG16, while accurate, is slower and more
resource-intensive, making it less suitable for real-time applications where speed is crucial.
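Per-image times such as 0.5 s and 2 s depend on several factors: the hardware, whether images are processed one at a time or in batches, and whether the first call (which often includes one-off setup work) is counted. Two models are therefore only comparable when they are timed the same way. The framework-agnostic sketch below shows one such protocol. Here, `predict` stands for any single-image prediction function, and the dummy used in the example is hypothetical. The median is reported so that occasional slow calls do not dominate.

```python
import statistics
import time
from typing import Callable, Sequence


def per_image_latency(predict: Callable[[object], object], images: Sequence[object],
                      warmup: int = 3) -> float:
    """Median wall-clock seconds for one single-image prediction call."""
    for img in images[:warmup]:
        predict(img)  # untimed warm-up calls absorb one-off costs such as lazy initialization
    timings = []
    for img in images:
        start = time.perf_counter()
        predict(img)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    def dummy_predict(img):  # hypothetical stand-in for a real model's predict function
        return sum(img) / len(img)

    fake_images = [[0.5] * 10_000 for _ in range(20)]
    print(f"median latency: {per_image_latency(dummy_predict, fake_images) * 1000:.3f} ms")
```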
The comparison demonstrates the clear advantage of the custom CNN over VGG16 for the task of plant
disease detection. The proposed model is more accurate, efficient, and lightweight, making it an ideal
solution for practical, real-time use in the agricultural sector.
CHAPTER 5

CONCLUSION AND FUTURE ENHANCEMENT

In this project, we developed a highly effective Convolutional Neural Network (CNN) for plant disease
detection using the PlantVillage dataset, achieving an impressive accuracy of 98.31%. This outcome
represents a significant leap in applying machine learning models in the agricultural domain, where
timely and accurate disease detection is critical to preventing crop loss and optimizing productivity.

The custom CNN model not only surpasses traditional architectures like VGG16, which achieved an
accuracy of 91.2%, but also demonstrates exceptional performance in terms of precision, recall, and
F1-score. With only 208,802 trainable parameters, our model stands out for its efficiency, both in terms
of computational resources and inference time. This makes it an ideal candidate for deployment in
real-time, mobile, and IoT-based agricultural solutions.

A key advantage of our CNN model is its lightweight architecture. By using fewer parameters, it
significantly reduces memory requirements, resulting in a smaller model size of just 1.7 MB compared to
VGG16's 94 MB. This reduction in model complexity translates into faster inference times, with the
custom CNN processing images in approximately 0.5 seconds, making it suitable for practical
applications in resource-constrained environments.

Our model addresses key challenges faced by farmers and agricultural stakeholders, such as identifying
diseases at early stages, enabling faster intervention, and providing a cost-effective solution that can be
deployed using existing mobile devices. This real-time plant disease detection tool can potentially
revolutionize precision agriculture by empowering farmers with AI-powered tools for effective crop
management, reducing the need for manual inspections, and ensuring that corrective actions are taken on
time.

Furthermore, this work emphasizes the importance of using domain-specific datasets, such as the
PlantVillage dataset, when building machine learning models for specialized applications. Our CNN
model's ability to outperform VGG16 and other transfer learning models underscores the effectiveness of
training a model specifically designed for the nuances of plant disease identification.

Future Enhancement:
Although CNNs have proven successful in plant disease detection, there are several areas for future
enhancement. First, developing more generalized models that can recognize diseases across different
crops, regions, and environmental conditions will be a significant advancement. This would require a
larger and more diverse dataset and further fine-tuning of model parameters. Additionally, integrating
real-time detection systems using mobile applications and edge computing could make this technology
more accessible to farmers worldwide. Future work can also focus on improving the interpretability of
deep learning models through techniques like saliency mapping and attention mechanisms, helping users
understand how the model makes its decisions. Finally, merging CNN models with other AI techniques
such as reinforcement learning and transfer learning can lead to the creation of more adaptive and
intelligent systems that continually improve their performance over time as new data is introduced.
● Integration with Drones and IoT Devices: Real-time plant disease detection could be further
enhanced through integration with drones and IoT devices. By equipping drones with cameras and
our CNN model, large-scale farms can be monitored autonomously. Drones can scan fields,
process images on the fly, and detect diseases at an early stage without human intervention. IoT
sensors placed in fields could collect environmental data such as temperature and humidity, which
could be fed into the model to further improve disease prediction accuracy through environmental
correlations.
● Mobile Application for Real-time Detection: The small model size and efficient inference time
make it feasible to deploy this CNN model as part of a mobile application for real-time plant
disease detection. The app could allow users to take photos of plants directly from their
smartphones, with the model running locally on the device or through cloud-based processing.
The app could provide instant feedback on the detected disease, along with recommended actions
for treatment. Additionally, incorporating features like disease tracking and early warning systems
could enhance the app’s utility, alerting users when outbreaks of specific diseases are detected in
nearby regions.
● Environmental and Geospatial Data Integration: By integrating geospatial and environmental
data, the model’s disease detection capabilities can be improved. Satellite images and weather data
could be incorporated to predict disease outbreaks based on environmental factors. For instance,
high humidity and warm temperatures are conducive to fungal diseases, and incorporating such
data into the model could enhance prediction accuracy. This would provide a more holistic system
where both the symptoms of the disease and its likelihood based on environmental conditions are
considered.
● Expansion to More Crop Types and Diseases: While the current model covers only the crops and
diseases in its training data, future iterations could incorporate a broader variety of both. This would
require the expansion of the dataset to include diverse plant species and disease conditions. By
increasing the scope of the dataset, the model would become a versatile tool applicable across
different crops, improving its utility for farmers with varied agricultural needs.
REFERENCES

1. Brahimi, M., Boukhalfa, K., & Moussaoui, A. (2017). Deep learning for tomato diseases:
Classification and symptoms visualization. Applied Artificial Intelligence, 31(4), 299-315.

2. Mohanty, S. P., Hughes, D. P., & Salathé, M. (2016). Using deep learning for image-based plant disease
detection. Frontiers in Plant Science, 7, 1419.

3. Sladojevic, S., Arsenovic, M., Anderla, A., Culibrk, D., & Stefanovic, D. (2016). Deep neural
networks-based recognition of plant diseases by leaf image classification. Computational Intelligence and
Neuroscience, 2016, 3289801.

4. Ferentinos, K. P. (2018). Deep learning models for plant disease detection and diagnosis. Computers
and Electronics in Agriculture, 145, 311-318.

5. Amara, J., Bouaziz, B., & Algergawy, A. (2017). A deep learning-based approach for banana leaf
disease classification. In Proceedings of the 2017 International Conference on Advances in Image
Processing (pp. 392-397). ACM.

6. Lu, Y., Yi, S., Zeng, N., Liu, Y., & Zhang, Y. (2017). Identification of rice diseases using deep
convolutional neural networks. Neurocomputing, 267, 378-384.

7. Too, E. C., Yujian, L., Njuki, S., & Yingchun, L. (2019). A comparative study of fine-tuning deep
learning models for plant disease identification. Computers and Electronics in Agriculture, 161, 272-279.

8. Wang, G., Sun, Y., & Wang, J. (2017). Automatic image-based plant disease severity estimation using
deep learning. Computational Intelligence and Neuroscience, 2017, 2917536.

9. Zhang, S., Huang, W., & Zhang, C. (2018). Three-channel convolutional neural networks for vegetable
leaf disease recognition. Cognitive Systems Research, 53, 31-41.

10. Fuentes, A., Yoon, S., Kim, S. C., & Park, D. S. (2017). A robust deep-learning-based detector for
real-time tomato plant diseases and pests recognition. Sensors, 17(9), 2022.

11. Picon, A., Alvarez-Gila, A., Seitz, M., Ortiz-Barredo, A., Echazarra, J., & Johannes, A. (2019). Deep
convolutional neural networks for mobile capture device-based crop disease classification in the wild.
Computers and Electronics in Agriculture, 161, 280-290.

12. Arsenovic, M., Karanovic, M., Sladojevic, S., Anderla, A., & Stefanovic, D. (2019). Solving current
limitations of deep learning-based approaches for plant disease detection. Symmetry, 11(7), 939.

13. Zhang, K., Zhang, L., & Wang, L. (2019). Crop disease recognition based on convolutional neural
network. International Journal of Distributed Sensor Networks, 15(4), 1-9.

14. Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey. Computers
and Electronics in Agriculture, 147, 70-90.

15. Jadhav, A., Kale, A., & Mohite, P. (2021). Plant disease detection and classification using deep
learning: A review. Artificial Intelligence in Agriculture, 5, 1-10.

16. Xie, X., Ma, Y., Liu, B., He, J., Wang, H., & Zhang, Y. (2020). A deep-learning-based real-time
detector for grape leaf diseases using improved convolutional neural networks. Frontiers in Plant Science,
11, 751.

17. Zhang, S., Wu, X., & You, Z. (2019). Leaf image-based cucumber disease recognition using sparse
representation classification. Computers and Electronics in Agriculture, 162, 135-141.
