rapport

The document outlines a machine learning project focused on developing automated approaches for plant disease detection and classification using Convolutional Neural Networks (CNNs) and other methods. It details the project's objectives, state of the art in plant disease detection, dataset collection, preprocessing, and the evaluation of various machine learning models, ultimately finding CNNs to be the most effective with a 91% accuracy rate. The project aims to provide farmers with reliable tools for early disease detection to enhance agricultural productivity.

Uploaded by

Darine Hammami

UNIVERSITY OF CARTHAGE

FACULTY OF SCIENCES OF BIZERTE


IT Department

Machine learning project


Professional Master Computer Engineering
Course: Systems, Networks and Virtualization Expert

Titled

Automated Approaches for Plant Disease Detection and Classification

Presented by:
Hammami Darine
Sourour Rahaf

Academic year: 2024 / 2025

Table of Contents

General Introduction
Chapter I: State of the Art
Introduction
1.1 Previous Work
1.2 Functional Requirements
1.3 Technical Requirements
1.4 Method for Using CNN
1.5 Development Environment
1.6 Programming Language
Chapter II: Dataset Collection for the Project
Introduction
2.1 Data Source
2.2 Data Preprocessing
2.3 Methods Tested
2.4 Realization
2.5 Comparison of Models
Proposed Solutions for Disease
Conclusion
Perspectives

List of Figures

Figure 1: Jupyter logo
Figure 2: Python logo
Figure 3: Capture 1 (screenshot of the Python environment)
Figure 4: Capture 2 (code snippet and screenshot showing dataset loading and splitting)
Figure 5: Capture 3 (plot of accuracy and loss during training)
Figure 6: Capture 4 (confusion matrix for CNN predictions)
Figure 7: Results of the KNN model
Figure 8: Results of the Random Forest model

General introduction

Agriculture is a vital sector that directly impacts food security and economic stability.
However, the increasing prevalence of plant diseases poses a significant threat to crop yields
and quality, leading to financial losses for farmers and challenges in ensuring global food
supplies. Early and accurate detection of plant diseases is therefore crucial to mitigate these
risks and enable timely interventions.

Traditional methods of disease detection, such as manual inspection by experts, are often
time-consuming, expensive, and prone to errors due to human limitations. The advent of
artificial intelligence, particularly machine learning and computer vision, has revolutionized
this field by offering automated, efficient, and reliable solutions for plant disease detection.
These technologies allow for the rapid analysis of plant images, identifying symptoms of
diseases with high precision.

This project aims to develop a machine learning-based system for detecting plant diseases
using a labeled dataset of plant leaf images. By leveraging Convolutional Neural Networks
(CNNs) alongside other machine learning methods like K-Nearest Neighbors (KNN) and
Random Forest (RF), we assess the effectiveness of these models in accurately classifying
healthy and diseased leaves. Through a systematic approach involving data preprocessing,
model training, and performance evaluation, we identify the most suitable method for
practical agricultural applications.

In the first chapter, we will explore the state of the art in plant disease detection,
including an overview of previous work, functional and technical requirements, and the tools
and methods used for this project. In the second chapter, we will discuss the dataset
collection and preprocessing steps, followed by the realization and comparison of different
machine learning models to determine the most effective approach for disease detection.

Chapter I: State of the Art

Introduction

Agriculture plays a pivotal role in ensuring food security and driving economic growth.
However, plant diseases significantly threaten crop productivity and quality, leading to
substantial economic losses and challenges in meeting global food demands. Accurate and
timely detection of plant diseases is crucial to mitigate these impacts and implement effective
solutions. Traditional methods of identifying plant diseases, such as manual inspection, are
labor-intensive, costly, and prone to errors, necessitating innovative approaches to address
these limitations.

Recent advancements in artificial intelligence (AI) and machine learning (ML),
particularly in deep learning and computer vision, have paved the way for automated plant
disease detection systems. These technologies leverage image analysis to identify disease
symptoms with high precision, offering a scalable and efficient alternative to traditional
methods. This project focuses on developing a machine learning-based system to detect
common plant diseases, particularly rust and powdery mildew, using a dataset of labeled
plant leaf images.

By implementing and comparing different machine learning methods, including
Convolutional Neural Networks (CNNs), K-Nearest Neighbors (KNN), and Random Forest
(RF), this work evaluates the most effective solution for identifying plant diseases. The
project also proposes actionable solutions for managing identified diseases to support farmers
in maintaining healthy crops.

In the first chapter, we will delve into the state of the art in plant disease detection,
reviewing previous research, identifying requirements, and describing the tools and methods
employed in this study. The second chapter will cover dataset collection and preprocessing,
the development and evaluation of machine learning models, and proposed solutions for
mitigating plant diseases such as rust and powdery mildew.

1.1 Previous Work

The field of plant disease detection has seen remarkable progress with the advent of
machine learning, particularly deep learning models. Among these, CNNs have emerged as
the most practical approach due to their ability to extract intricate patterns from image data. A
notable study by Shoibam Amritraj et al., titled "An Automated and Fine-Tuned Image
Detection and Classification System for Plant Leaf Diseases," demonstrated how fine-tuned
CNN models achieve a balance between accuracy and real-time performance. These models
were tailored to detect specific diseases using custom datasets, making them highly suitable
for agricultural applications.

Another significant contribution by Sangeetha et al., in their work titled "A Novel Smart
Approach to Plant Health - Automated Detection and Diagnosis of Leaf Diseases," employed
a combination of CNN, ResNet, and YOLOv7-X architectures. Their system effectively
analyzed diverse datasets to identify diseases such as rust and powdery mildew. This hybrid
approach achieved an accuracy of 85%, showcasing the potential of deep learning for
precision agriculture.

1.2 Functional Requirements

These requirements describe what the system must accomplish to meet its objectives:

• Disease Identification: The model must be capable of accurately detecting and
classifying diseases present on plant leaves.
• User-Friendly Interface: An application or interface that allows users (farmers or
researchers) to input leaf images and receive clear and understandable results.
• Fast and Efficient Predictions: Results must be provided in near real-time to enable
quick action in the field.
• Updates and Scalability: The system should be upgradable with new data or new disease
classes to remain relevant.

1.3 Technical Requirements


These requirements concern the tools and technologies needed for implementation:

• Deep Learning Model: Use of a CNN as the base model for image classification.
• Balanced and Representative Dataset: Collect data from sources such as Kaggle or other
academic databases, and preprocess it to ensure class balance and optimal quality.
• Model Training Platform: Access to GPUs for efficient model training (Jupyter or
similar infrastructure).
• Data Preprocessing: Techniques such as cleaning, resizing, and data augmentation to
improve model performance.
• High Detection Accuracy: The model must achieve high accuracy in detecting plant
diseases.
• Scalability: The solution should be able to process a large number of images, ideally in
real time.

1.4 Method for Using CNN

A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is
particularly well-suited for image recognition and processing tasks. It is made up of multiple
layers, including convolutional layers, pooling layers, and fully connected layers. The
architecture of CNNs is inspired by the visual processing in the human brain, and they are
well-suited for capturing hierarchical patterns and spatial dependencies within images.

The model was trained using a loss function based on cross-entropy for classification tasks,
with the Adam optimizer to adjust the network weights. A validation set was used to evaluate
the model’s performance during training.
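To make the cross-entropy loss mentioned above concrete, the following minimal sketch computes it for a single image in plain Python. The three class names and the raw scores are hypothetical values chosen for illustration; they are not taken from the trained model.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Categorical cross-entropy for one sample with a one-hot label."""
    return -math.log(probs[true_index])

# Hypothetical scores for three illustrative classes:
# index 0 = healthy, index 1 = powdery mildew, index 2 = rust
logits = [2.0, 0.5, 0.1]
probs = softmax(logits)

# The loss is small when the true class received a high probability
# and large when it received a low one.
loss_if_healthy = cross_entropy(probs, 0)
loss_if_rust = cross_entropy(probs, 2)
print(probs, loss_if_healthy, loss_if_rust)
```

During training, an optimizer such as Adam adjusts the weights so that this loss, averaged over the training images, decreases.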

Filters in CNNs:

• Definition: A filter (or kernel) is a small matrix (e.g., 3×3 or 5×5) used to perform
convolution operations on the input (image or feature map).
• Function:
  o Extract specific features (edges, textures, patterns) from the input image.
  o Each filter detects a particular type of pattern in the data.
• Learned Parameters: The filter values are learned during training through
backpropagation.
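As an illustration of how a filter extracts a feature, the sketch below slides a 3×3 kernel over a tiny grayscale image in plain Python. The kernel values are hand-written for the example (a trained CNN would learn its own), and the sketch assumes no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1).

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before being applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output

# Hand-written vertical-edge filter (illustrative values only)
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Toy 4x5 image: dark on the left (0), bright on the right (1)
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The feature map responds strongly where the dark-to-bright transition occurs and is zero over the uniform bright region, which is the sense in which a filter "detects" a pattern.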

Layers in CNNs:

Layers structure the CNN architecture and utilize filters to transform the data at each stage.

a. Convolutional Layer:

• Role: Applies filters to the input.
• Number of Filters: A layer can have multiple filters (e.g., 32, 64, or 128 filters).
  o More filters mean the layer extracts a wider variety of features.
• Output: Each filter produces a feature map (a transformed version of the input), and
the layer stacks these maps into its output.

b. Activation Layer:

• Role: Introduces non-linearity into the network (e.g., with a ReLU function).
• Connection to Filters: Activations are applied after convolution to emphasize
extracted features.

c. Pooling Layer:

• Role: Reduces the size of the feature maps generated by the filters (e.g., through
MaxPooling or AveragePooling).
• Effect: Helps retain important features while reducing computational complexity.
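The activation and pooling steps described above can be sketched in a few lines of plain Python. The 4×4 feature map is made up for the example, and the pooling function assumes non-overlapping 2×2 windows on a map whose sides are divisible by the window size.

```python
def relu(feature_map):
    """Replace negative values with 0, keeping positive responses."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; assumes sides divisible by `size`."""
    pooled = []
    for i in range(0, len(feature_map), size):
        row = []
        for j in range(0, len(feature_map[0]), size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Illustrative 4x4 feature map (values are not from the project)
feature_map = [[ 1, -2,  3,  0],
               [-1,  5, -3,  2],
               [ 0, -4,  6, -1],
               [ 2,  1, -2,  4]]

activated = relu(feature_map)
pooled = max_pool(activated, size=2)
print(pooled)  # [[5, 3], [2, 6]]
```

The 4×4 map shrinks to 2×2 while the strongest response in each region is kept, which is how pooling reduces computation without discarding the most salient features.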

Relationship Between Layers and Filters:

1. Each Layer Uses Multiple Filters:

  o The number of filters in a layer is a hyperparameter defined by the model designer.
  o Each filter operates independently on the input to produce a feature map.

2. Depth of the Output:

  o If a layer has N filters, its output will have a depth (or number of channels) equal
  to N.

3. Layers Build on Each Other:

  o The output of one layer becomes the input for the next layer.
  o Deeper layers capture more complex features based on the simpler features
  extracted by earlier layers.

4. Filters are Refined During Training:

  o Filters are adjusted during training to capture the most relevant features for the task
  (e.g., classification, detection).
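The relationship between filter count and output depth can be traced with simple shape arithmetic. The sketch below is an assumption-laden illustration, not the project's actual architecture: it takes a 128×128 RGB input (the image size used later in this report), three convolution blocks with 32, 64, and 128 filters of size 3×3, no padding, stride 1, and 2×2 max pooling after each convolution.

```python
def conv_output_shape(input_shape, num_filters, kernel_size=3):
    """Shape after a convolution with no padding and stride 1."""
    h, w, _ = input_shape
    return (h - kernel_size + 1, w - kernel_size + 1, num_filters)

def pool_output_shape(input_shape, pool_size=2):
    """Shape after non-overlapping pooling (extra border rows are dropped)."""
    h, w, c = input_shape
    return (h // pool_size, w // pool_size, c)

shape = (128, 128, 3)  # height, width, channels (RGB assumed)
trace = [shape]
for n_filters in (32, 64, 128):  # hypothetical filter counts
    shape = conv_output_shape(shape, n_filters)
    trace.append(shape)
    shape = pool_output_shape(shape)
    trace.append(shape)

for s in trace:
    print(s)
# (128, 128, 3) -> (126, 126, 32) -> (63, 63, 32) -> (61, 61, 64)
# -> (30, 30, 64) -> (28, 28, 128) -> (14, 14, 128)
```

After each convolution the depth equals the number of filters, while pooling halves the height and width and leaves the depth unchanged.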

Advantages of CNN:

• Automatic Feature Extraction: They automatically learn and extract features from raw
images, eliminating manual feature engineering.
• Efficiency in Image Processing: CNNs process large images in smaller regions,
improving speed and efficiency.
• End-to-End Training: CNNs learn directly from input images to output labels,
improving accuracy.
• Scalability: CNNs scale well with large datasets, ideal for tasks like plant disease
detection.

Different Types of CNN Models


1. LeNet
2. AlexNet
3. ResNet
4. GoogLeNet
5. MobileNet
6. VGG

1.5 Development Environment

Jupyter:

Project Jupyter is a project to develop open-source software, open standards, and services
for interactive computing across multiple programming languages. (Project Jupyter -
Wikipedia)

Figure 1: Jupyter logo

1.6 Programming Language

Python:

Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including structured, object-oriented, and functional programming. It is often
described as a "batteries included" language due to its comprehensive standard library.

Figure 2: Python logo

Libraries:

TensorFlow:

TensorFlow is a software library for machine learning and artificial intelligence. It can be
used across a range of tasks, but is used mainly for training and inference of neural networks.
(TensorFlow - Wikipedia)

Keras:

Keras is an open-source library that provides a Python interface for artificial neural
networks. Keras began as independent software, was then integrated into the TensorFlow
library, and later added support for other backends. (Keras - Wikipedia)

OpenCV:

OpenCV is a library of programming functions aimed mainly at real-time computer vision,
including image processing and support for deep learning models. (OpenCV - Wikipedia)

Chapter II: Dataset Collection for the Project

Introduction

The success of any machine learning project relies heavily on the quality and diversity of
the dataset used. For this plant disease detection project, the dataset serves as the foundation
for training, validating, and testing the models. By carefully selecting and processing the data,
we ensure that the machine learning algorithms can effectively learn patterns and make
accurate predictions.

The dataset, sourced from Kaggle, comprises 1,570 labeled images of plant leaves. These
images represent a range of healthy and diseased conditions across various plant species.
Proper organization of the data into training, validation, and test sets enables a systematic
approach to model development and performance evaluation.

In this chapter, we discuss the origins of the dataset, the preprocessing steps undertaken to
prepare it for analysis, and the methods used to extract meaningful insights. Additionally, we
explore the importance of preprocessing techniques, such as normalization and data
augmentation, which play a crucial role in improving model accuracy and generalization.

Finally, we detail the implementation and testing of different machine learning methods,
culminating in a comparative analysis of their effectiveness.

2.1 Data Source

For this project, we used a dataset from Kaggle, a data science platform. The dataset
contains 1572 images of plant leaves, labeled as either healthy or diseased. These images are
divided into three sets:

Training Set: 1372 images used to train the model.

Validation Set: 150 images used to tune the model's hyperparameters.

Test Set: 60 images used to evaluate the model's final performance.
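The share of each split can be checked with a few lines of arithmetic. The sketch below takes the three split sizes listed above at face value; they add up to 1,582, slightly more than the stated total, so the resulting percentages should be read as indicative only.

```python
# Split sizes as listed in this section
splits = {"train": 1372, "validation": 150, "test": 60}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print("total images across splits:", total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

With these counts, roughly 87% of the images go to training, with the remainder shared between validation and test.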

2.2 Data Preprocessing

Data preprocessing is a crucial step to ensure the model learns effectively. The
preprocessing steps included:

Resizing Images: All images were resized to a uniform size of 128x128 pixels to optimize
computation time.

Normalization: Image pixel values were normalized to a range of 0 to 1 to help speed up
training.

Data Augmentation: To improve generalization and prevent overfitting, data augmentation
techniques such as rotation, zoom, and brightness adjustments were applied.
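The normalization and augmentation steps can be illustrated on a tiny grayscale image in plain Python. This is a simplified sketch, not the project's pipeline: it assumes 8-bit pixel values (0 to 255), uses a fixed 90-degree rotation and a brightness factor of 1.5 as example augmentations, and leaves resizing and zoom to an image library.

```python
def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def adjust_brightness(image, factor):
    """Multiply pixel values by `factor`, clipping to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]

# Toy 2x2 grayscale image with 8-bit values
raw = [[0, 128],
       [255, 64]]

norm = normalize(raw)
rotated = rotate90(norm)
brighter = adjust_brightness(norm, 1.5)

print(norm)
print(rotated)
print(brighter)
```

Each augmented copy shows the same leaf content under a different orientation or lighting, which gives the model more varied examples without collecting new images.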

2.3 Methods Tested

Three main methods were tested:

CNN (Convolutional Neural Network): The CNN model performed the best, achieving an
accuracy of 91% on the test set.

KNN (K-Nearest Neighbors): The KNN model, while effective for simpler datasets,
achieved only 57% accuracy.

Random Forest (RF): The Random Forest model performed moderately well, achieving
79% accuracy.

2.4 Realization

The implementation involved training three different models: CNN, KNN, and Random
Forest.

• CNN: The CNN model, consisting of convolutional, pooling, and fully connected layers,
achieved 91% accuracy on the test set.
• KNN: The KNN model achieved 57% accuracy, underperforming on the dataset due to
its inability to capture complex patterns.
• Random Forest: The RF model achieved 79% accuracy, showing better performance
than KNN but falling short of CNN.
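To show why KNN struggles with complex image patterns, the sketch below implements the basic algorithm in plain Python. It is an illustration under stated assumptions, not the code used in the project: the two-value feature vectors stand in for flattened pixel arrays, and the labels and k=3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy feature vectors standing in for flattened, normalized images
train_X = [[0.10, 0.10], [0.20, 0.10], [0.15, 0.20],
           [0.90, 0.80], [0.80, 0.90]]
train_y = ["healthy", "healthy", "healthy", "diseased", "diseased"]

pred_a = knn_predict(train_X, train_y, [0.85, 0.85])
pred_b = knn_predict(train_X, train_y, [0.10, 0.15])
print(pred_a, pred_b)  # diseased healthy
```

KNN compares raw vectors by distance and learns no features of its own, so two photos of the same disease taken at different angles or lighting can end up far apart, which is consistent with its lower accuracy here.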

Environment Setup

Figure 3: Capture 1 (screenshot of the Python environment, e.g., Jupyter Notebook, with the
necessary libraries installed)

Data Loading

Figure 4: Capture 2 (code snippet and screenshot showing dataset loading and splitting)

Model Development

Figure 5: Capture 3 (plot of accuracy and loss during training)

Model Evaluation

Figure 6: Capture 4 (confusion matrix for CNN predictions)

Comparison of Models

The results of the three models were compared using the following metrics:

Accuracy: The proportion of correct predictions.
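Accuracy and the confusion matrix can both be computed directly from the true and predicted labels. The sketch below uses a handful of made-up predictions for two classes; the labels and values are hypothetical and do not reproduce the project's results.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

labels = ["healthy", "diseased"]
y_true = ["healthy", "healthy", "diseased", "diseased", "diseased"]
y_pred = ["healthy", "diseased", "diseased", "diseased", "healthy"]

acc = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels)
print(acc)  # 0.6
print(cm)   # [[1, 1], [1, 2]]
```

The diagonal of the matrix counts correct predictions per class, while off-diagonal entries show which classes are confused with each other, something a single accuracy figure cannot reveal.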

Figure 7: Results of the KNN model

Figure 8: Results of the Random Forest model

2.5 Comparison of Models

| Criterion | CNN | KNN | Random Forest |
|---|---|---|---|
| Accuracy | 91% | 57% | 79% |
| Real-Time Performance | High | Low | Medium |
| Hardware Dependency | Requires GPU for training | Minimal | Moderate |
| Accessibility | Medium (requires training) | High | High |
| Practicality for Agricultural Applications | Suitable for field applications | Not practical | Moderately practical |
| Advantages | High accuracy, effective feature extraction | Simple, easy to implement | Interpretable, moderate computational needs |
| Disadvantages | Computationally expensive | Poor accuracy on large, complex datasets | Less effective with high-dimensional data |

Proposed Solutions for Disease

Rust Disease

Rust is a fungal infection that causes reddish-brown spots on plant leaves, often
leading to leaf drop and stunted growth. Here are some solutions to manage rust:

1. Remove Infected Leaves: Cut off and dispose of infected leaves to prevent the spread
of the disease.
2. Space Plants: Provide adequate spacing between plants to enhance airflow and reduce
humidity around the leaves.
3. Avoid Overhead Watering: Water the base of the plants instead of overhead to
reduce moisture on leaves, which promotes fungal growth.
4. Control Watering: Avoid overwatering the plants. Keep the soil slightly moist
but not soggy.
5. Use Milk Solution: Mix one part whole milk with nine parts water and spray on the
affected leaves to control fungal growth.
6. Baking Soda Spray: Mix one tablespoon of baking soda, one tablespoon of liquid
soap, and one liter of water. Spray this mixture on leaves to fight the rust infection.
7. Apply Fungicide: Use a rust-specific fungicide for more severe cases to stop further
damage.

Powdery Mildew

Powdery mildew is a fungal infection characterized by white, powdery growth on leaves. It
can weaken plants and affect their growth. The following measures can help control powdery
mildew:

1. Remove Affected Leaves: Cut off and destroy infected leaves to reduce the spread of
spores.
2. Improve Air Circulation: Ensure proper spacing between plants to prevent humidity
buildup and improve airflow.
3. Use Fungicides: Apply sulfur-based fungicides or neem oil to the affected areas.
4. Maintain Proper Watering: Avoid direct water contact with leaves. Water at the
base to reduce the chance of infection.

Conclusion
The detection and diagnosis of plant diseases are crucial for ensuring agricultural
productivity and food security. This project demonstrated the potential of machine learning,
particularly Convolutional Neural Networks (CNNs), to provide efficient and accurate
solutions for identifying plant diseases. By leveraging a high-quality dataset from Kaggle and
employing preprocessing techniques such as normalization and data augmentation, we were
able to enhance the performance of our models.

The comparative analysis of the CNN, KNN, and Random Forest models revealed that
CNN significantly outperformed the others, achieving an accuracy of 91%. This success
highlights the superiority of deep learning for handling complex image-based tasks due to its
ability to automatically extract intricate features and learn spatial hierarchies.

While the results are promising, there are opportunities for further improvement.
Expanding the dataset, exploring advanced deep learning architectures, and integrating the
model into mobile or IoT platforms could enhance the system's scalability and accessibility
for real-world applications.

In conclusion, this project not only demonstrates the effectiveness of modern AI
techniques in addressing agricultural challenges but also provides a foundation for future
innovations in plant disease management, contributing to sustainable farming practices and
global food security.

Perspectives

While the current solution offers significant potential, there are several avenues to further
enhance its applicability and impact:

Integration with IoT Systems

• Developing a fully integrated system where IoT sensors and cameras continuously
monitor crops and provide real-time disease detection and analysis.

Incorporation of Weather and Soil Data

• Combining disease detection with other environmental factors such as weather and soil
data can improve decision-making for farmers by predicting disease outbreaks and
recommending preventative actions.

User-Friendly Applications

• Creating mobile or web-based applications with a simple interface where farmers can
upload leaf images and receive instant disease diagnoses and treatment suggestions.
