Final Project Report
PROJECT REPORT
ON
BACHELOR OF TECHNOLOGY
IN
COMPUTER ENGINEERING
Under the guidance of
Last but not least, we would also like to thank our friends and family, who directly and
indirectly supported us during the project work and provided a helpful environment for our
project.
CERTIFICATE
This is to certify that the project work entitled “CATTLE DISEASE PREDICTION BY
USING MACHINE LEARNING” which is submitted by:
is a record of the students' own work, carried out by them in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology in the Department of
Computer Engineering, College of Technology, G.B. Pant University of Agriculture and
Technology, Pantnagar.
APPROVAL
This project report entitled “Cattle Disease Prediction by using Machine Learning” is
hereby approved as a creditable study of an engineering subject, as a prerequisite to the
degree for which it has been submitted.
Committee Members:
1. Dr. S.D. Samantaray
(Head of Department of Computer Engineering) ------------------------
2. Prof. Jalaj Sharma
(Assistant Professor) ------------------------
3. Prof. P.K. Mishra
(Assistant Professor) ------------------------
4. Dr. Rajeev Singh
(Associate Professor) ------------------------
5. Dr. C.S. Negi
(Assistant Professor) ------------------------
ABSTRACT
Accurate and timely diagnosis of cattle diseases is essential for maintaining the health and
productivity of livestock, which is crucial for the agricultural economy and food security.
Traditional diagnostic methods, while effective, can be time-consuming, costly, and prone to
human error. This project explores the application of machine learning algorithms to predict
cattle diseases based on various symptoms, aiming to provide a reliable and efficient diagnostic
tool for farmers and veterinarians.
In this study, we developed and compared the performance of four machine learning
algorithms: Decision Tree, Naive Bayes, K-Nearest Neighbors (KNN), and Random Forest. A
comprehensive dataset of cattle health records, including symptoms and corresponding
diagnoses, was collected and preprocessed. Each model was trained and evaluated using this
dataset to identify the most effective algorithm for disease prediction.
The models were assessed based on their accuracy, precision, recall, and overall robustness.
Additionally, an interactive user interface was developed, allowing users to input observed
symptoms and receive immediate diagnostic predictions. This interface aims to facilitate
prompt and effective disease management by providing actionable insights in real-time.
The results demonstrate that machine learning can significantly enhance cattle health
monitoring systems, offering a scalable and cost-effective solution to disease diagnosis. Among
the algorithms tested, the Random Forest classifier showed the highest accuracy and
robustness, making it the most suitable for practical implementation.
This project highlights the potential of integrating machine learning into livestock health
management, paving the way for more sustainable and efficient agricultural practices. Future
work will focus on expanding the dataset, exploring additional advanced algorithms, and
integrating real-time data collection for continuous model improvement.
TABLE OF CONTENTS
1. Introduction
1.1. Cattle Disease Prediction System
1.2. Machine Learning Algorithm Available for Disease Prediction
1.3. Project Objectives and Significance
1.4. Advantages of Machine Learning Based Cattle Disease Prediction
2. Literature Review
3. Problem Specification
3.1. Objective
3.2. Scope
3.3. Problem Statement
3.4. Expected Outcomes
3.5. Significance
4. Requirement Analysis
4.1. Hardware Configuration
4.2. Software Configuration
4.3. Functional Requirements
4.4. Non-Functional Requirements
5. System Design
5.1. Acquiring Dataset
5.2. Data Preprocessing
5.3. Training the Machine Learning Model
5.4. Model Development
5.5. Developing a User Interface
6. Implementation
6.1. Tools and Technology
8. Summary
9. Literature Cited
APPENDIX I
APPENDIX II
APPENDIX III
LIST OF FIGURES
Figure-1: Flow Diagram of the Project
Figure-2: Symptoms Observations in Cattle
Figure-3: Working Steps
Figure-4: Decision Tree
Figure-5: Random Forest Algorithm
Figure-6: KNN Classification Algorithm
Figure-7: Naïve Bayes Algorithm
Figure-8: Python Installer Window
Figure-9: Python Version Check
Figure-10: Installing pip
Figure-11: Installing sklearn Library
Figure-12: Installation of NumPy Library
Figure-13: Installation of Pandas Library
Figure-14: Installation of Tkinter Library
Figure-15: User Interface of Project
Figure-16: User Interface of Project Prediction Using Decision Tree
Figure-17: UI of Project Prediction Using Decision Tree & Random Forest
Figure-18: UI of Project Prediction Using 4 Algorithms
Figure-19: Testing Dataset
Figure-20: Training Dataset
Figure-21: Scatter Plot for Listeriosis
Figure-22: Scatter Plot for Coccidiosis
Figure-23: Confusion Matrix
Figure-24: List of Diseases
Figure-25: List of Symptoms
Figure-26: Backend Database
CHAPTER 1 INTRODUCTION
1.2. MACHINE LEARNING ALGORITHM AVAILABLE FOR DISEASE
PREDICTION
In this project on cattle disease prediction, we employed four distinct machine learning
algorithms: Decision Tree, Naive Bayes, K-Nearest Neighbours (KNN), and Random
Forest. Each of these algorithms has unique characteristics and strengths that make them
suitable for different aspects of predictive modelling. Below is an introduction to each of
these algorithms, outlining their fundamental principles and their applicability to disease
prediction.
➢ Decision Tree
A Decision Tree is a supervised learning algorithm used for classification and
regression tasks. It works by recursively splitting the data into subsets based on the
value of a selected feature. The decision tree model resembles a tree structure, where
each internal node represents a "decision" on an attribute, each branch represents the
outcome of the decision, and each leaf node represents a class label or continuous value.
➢ Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes' Theorem, assuming strong
(naive) independence between features. Despite the simplification, it performs well in
many real-world situations, especially with high-dimensional data.
➢ K-Nearest Neighbour (KNN)
K-Nearest Neighbour is a simple, instance-based learning algorithm used for
classification and regression. It classifies a data point based on the majority class
among its 'k' nearest neighbors in the feature space.
➢ Random Forest
Random Forest is an ensemble learning method that constructs a multitude of
decision trees during training and outputs the mode of the classes for classification
or the mean prediction for regression. It combines the strengths of multiple decision
trees to improve accuracy and control overfitting.
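As a minimal sketch of how the four algorithms introduced above might be compared side by side, the following example uses scikit-learn with 5-fold cross-validation. The synthetic symptom data, the labelling rule, and the hyperparameters are assumptions made purely for illustration; they are not the project's dataset or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 300 animals, 6 binary symptom flags (1 = symptom observed).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 6))
# Hypothetical labelling rule, used only to give the models something to learn.
y = np.where(X[:, 0] & X[:, 1], 2, np.where(X[:, 2] == 1, 1, 0))

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here rather than a single train/test split so that each model is scored on several held-out folds, which gives a steadier basis for comparison on small datasets.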
Integrating machine learning into cattle health management has several significant benefits:
➢ Improved Diagnostic Accuracy: ML models can analyze vast amounts of data to
identify subtle patterns that may be missed by human observation.
➢ Time and Cost Efficiency: Automated diagnosis reduces the need for extensive
veterinary consultations, saving time and resources.
➢ Scalability: ML models can be easily scaled to accommodate large herds, providing
consistent diagnostic support across different farms.
➢ Early Detection and Prevention: Predictive models can identify potential health
issues early, allowing for prompt intervention and reducing the spread of diseases.
CHAPTER 2 LITERATURE REVIEW
6. Mastitis in dairy cows
Jones and Clark (2019): Applied Random Forest to predict mastitis in dairy cows. The
study incorporated genomic data and environmental factors, achieving an accuracy of
92%. The model’s feature importance analysis highlighted significant predictors, aiding
in early intervention strategies.
7. Efficiency of k-Nearest Neighbors in Large-scale Data for Cattle Disease Prediction
Huang et al. (2021): This paper evaluates the performance of k-Nearest Neighbors (k-
NN) in predicting cattle diseases using large-scale datasets. The study focuses on
optimizing the k value and distance metrics for better performance.
k-NN showed decreased efficiency and accuracy on larger datasets, highlighting the
need for optimization. The study suggested combining k-NN with feature selection
techniques to improve performance.
Rahman et al. (2018): This research applies Naive Bayes (NB) to predict lameness in
cattle based on behavioral and physiological data. NB was chosen for its simplicity and
ability to handle probabilistic relationships. NB performed well, especially when
combined with feature selection techniques. The model was computationally efficient
and easy to implement.
CHAPTER 3 PROBLEM SPECIFICATION
In the agricultural sector, cattle health is critical for productivity and sustainability. Early
and accurate disease prediction can significantly reduce economic losses and improve animal
welfare. Traditional methods of diagnosing cattle diseases are often time-consuming and
require expertise, which may not be readily available in rural or resource-limited settings.
Leveraging machine learning algorithms offers a promising solution to predict cattle diseases
based on symptoms efficiently and accurately.
3.1.Objectives:
3.2.Scope:
This project will focus on predicting common cattle diseases based on a set of symptoms.
It will include:
3.3.Problem Statement:
Cattle diseases can have severe impacts on livestock productivity and farmer
livelihoods. Early and accurate diagnosis is crucial but often limited by the availability of
veterinary expertise, especially in rural areas. Machine learning algorithms can be employed
to predict cattle diseases based on observable symptoms, providing a valuable tool for early
intervention and treatment. However, the effectiveness of different algorithms in this context
needs to be systematically evaluated to identify the most suitable approach.
3.4.Expected Outcomes:
• A comparative analysis of KNN, Random Forest, Decision Tree, and Naive Bayes
algorithms for cattle disease prediction.
• Identification of the most effective machine learning model for this application.
• A functional application with a graphical user interface that allows easy input of
symptoms and provides accurate disease predictions.
3.5.Significance:
This project will demonstrate the applicability of machine learning in the agricultural
domain, specifically for livestock health management. By providing a tool for early disease
detection, it can enhance cattle health, improve productivity, and support farmers in making
informed decisions. The comparative analysis of different algorithms will contribute to the
body of knowledge in veterinary informatics and machine learning.
CHAPTER 4 REQUIREMENT ANALYSIS
4.1 Hardware Configuration
● Processor : Intel i3
● Hard disk : 512 GB
● Memory : 2 GB
● Graphic Memory: N/A
o Decision tree
o Naive Bayes
o K Nearest Neighbour
o Random Forest
Firstly, our group worked on the data collection and preprocessing module to efficiently
gather diverse symptom data from multiple sources and clean it, addressing missing values
and noise to ensure high-quality inputs. We then focused on the model training and
optimization component, supporting the implementation, training, and fine-tuning of the
machine learning algorithms (Decision Tree, Naive Bayes, Random Forest, K-Nearest
Neighbor) to achieve high predictive accuracy and mitigate overfitting.
To ensure effectiveness, we developed a model evaluation system, employing metrics
such as accuracy, precision, and recall to assess model performance, and tackling
imbalanced data through resampling techniques. Our team also created a real-time
prediction system capable of processing incoming symptom data quickly, providing timely
and accurate disease predictions.
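The evaluation metrics named above can be computed as in the following minimal sketch, which uses scikit-learn's metric functions. The labels are invented for illustration, and macro averaging is an assumption; the report does not state which averaging scheme was used.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true and predicted disease labels for eight animals.
y_true = ["mastitis", "mastitis", "mastitis", "lameness",
          "lameness", "healthy", "healthy", "healthy"]
y_pred = ["mastitis", "mastitis", "lameness", "lameness",
          "lameness", "healthy", "healthy", "mastitis"]

# Accuracy: fraction of animals whose predicted label matches the true label.
accuracy = accuracy_score(y_true, y_pred)

# Macro averaging: compute the metric per class, then take the unweighted mean,
# so that rare diseases count as much as common ones.
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Macro-averaged precision and recall are often preferred over plain accuracy when classes are imbalanced, because a model that ignores a rare disease is penalized even if its overall accuracy stays high.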
➢ Efficiency
The proposed machine-learning-based system achieves an accuracy of about 90%
to 97%, depending on the classification algorithm used, and proved considerably
more efficient than alternatives such as regression models.
➢ Reliability
The machine-learning-based model is reliable, producing correct results up to
97% of the time. It was tested with four different classification algorithms, of
which Random Forest proved the most reliable. We therefore also used that model
in our testing and implementation.
➢ Maintainability
It is fair to say that a system that undergoes changes with time is better suited and
preferred over others. The system that we proposed has the capability to undergo
changes and bug fixes in the future.
➢ Portability
Portability is the usability of the same software in different environments. Its
prerequisite is a generalized abstraction between the application logic and the
system interfaces. The proposed system fulfils this requirement: the model can be
used on different platforms, including Linux, Windows, and macOS.
➢ Usability
The system is designed for a user-friendly environment so that the administrator
and user can perform various tasks easily and in an effective way.
➢ Security
We understand that security is one of the most important aspects of system design.
The system as prepared has no known exploitable vulnerabilities. In future, further
security patches and bug fixes can help strengthen the security and integrity of
the system.
CHAPTER 5 SYSTEM DESIGN
The design of a cattle disease prediction system using machine learning involves several
integrated components, each contributing to the accurate, efficient, and scalable diagnosis of
cattle diseases. Below is a detailed description of the system design, accompanied by schematic
images to illustrate the architecture.
➢ Data Types:
• Symptom Data: Observations of symptoms such as fever, coughing, and lethargy.
• Environmental Data: Temperature, humidity, and other conditions affecting cattle
health.
• Physiological Data: Vital signs like heart rate and body temperature.
• Historical Data: Past disease occurrences and treatment outcomes.
5.2 DATA PREPROCESSING
Data pre-processing involves the following steps:
Data Cleaning: Removing noise, handling missing values, and correcting inconsistencies.
Feature Extraction: Identifying relevant features (symptoms, environmental factors)
critical for disease prediction.
Normalization: Scaling features to a common range to ensure uniformity.
Categorical Encoding: Converting categorical variables (e.g., breed) into numerical
values using techniques such as one-hot encoding or integer indexing.
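The pre-processing steps listed above can be sketched with Pandas as follows. The column names, values, and the choice of median imputation and min-max scaling are assumptions for illustration only; the project's actual pipeline may differ.

```python
import pandas as pd

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    "body_temp_c": [38.5, 40.1, None, 39.3],
    "heart_rate": [60, 84, 72, None],
    "breed": ["Sahiwal", "Gir", "Sahiwal", "Jersey"],
})

numeric_cols = ["body_temp_c", "heart_rate"]

# Data cleaning: fill missing numeric values with the column median.
clean = raw.copy()
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# Normalization: min-max scale each numeric column to the range [0, 1].
col_min = clean[numeric_cols].min()
col_max = clean[numeric_cols].max()
clean[numeric_cols] = (clean[numeric_cols] - col_min) / (col_max - col_min)

# Categorical encoding: one-hot encode the breed column.
processed = pd.get_dummies(clean, columns=["breed"])
print(processed)
```

One-hot encoding is used here because breed has no natural order; integer indexing would be the more compact alternative for tree-based models.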
5.5 DEVELOPING A USER INTERFACE
The next step is to develop a user interface (UI) that lets us interact with the model
and fetch the results. A UI can be developed using libraries such as PyQt or Tkinter.
The group developed the UI using Tkinter, Python's standard interface to the
cross-platform Tk GUI toolkit.
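A Tkinter interface of this kind could be structured roughly as in the sketch below. The symptom names, the window layout, and the `predict_disease` placeholder are all hypothetical; in the real application the placeholder would call the trained model instead of a hard-coded rule.

```python
def predict_disease(symptoms):
    """Placeholder for a trained model: returns a label for the selected symptoms."""
    # Hypothetical rule used only so the sketch runs without a trained model.
    if "fever" in symptoms and "coughing" in symptoms:
        return "Possible respiratory infection"
    return "No prediction available"


def build_ui():
    """Build a small Tkinter window with symptom checkboxes and a Predict button."""
    import tkinter as tk  # imported here so predict_disease works without a display

    root = tk.Tk()
    root.title("Cattle Disease Prediction (sketch)")

    options = ["fever", "coughing", "lethargy"]
    selected = {name: tk.BooleanVar(master=root) for name in options}
    for name in options:
        tk.Checkbutton(root, text=name, variable=selected[name]).pack(anchor="w")

    result = tk.StringVar(master=root, value="Select symptoms and press Predict")

    def on_predict():
        # Collect the ticked symptoms and show the prediction in the label.
        chosen = [name for name, var in selected.items() if var.get()]
        result.set(predict_disease(chosen))

    tk.Button(root, text="Predict", command=on_predict).pack()
    tk.Label(root, textvariable=result).pack()
    return root
```

Calling `build_ui().mainloop()` opens the window. Keeping the prediction logic in a separate function from the widgets makes it possible to test the logic without a display.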
CHAPTER 6 IMPLEMENTATION
The implementation phase of the project is the development of the designs produced
during the design phase. It is the stage at which the theoretical design is turned into
a working system, giving users confidence that the new system will work efficiently and
effectively. It involves careful planning, investigation of the current system and its
constraints on implementation, design of methods to achieve the changeover, and evaluation
of changeover methods. The implementation process begins with preparing a plan for the
implementation of the system. According to this plan, the activities are carried out,
discussions are held regarding the equipment and resources, and any additional equipment
required to implement the new system is acquired. In a network backup system no additional
resources are needed. Implementation is the final and most important phase. The system can
be implemented only after thorough testing has been done and it has been found to work
according to the specification. This method also offers the greatest security, since the
old system can take over if errors are found or if the new system is unable to handle
certain types of transactions.
• System Testing
It is the stage of implementation, which is aimed at ensuring that the system works
accurately and efficiently before live operation commences. Testing is the process of
executing the program with the intent of finding errors and missing operations and also
a complete verification to determine whether the objectives are met and the user
requirements are satisfied. The ultimate aim is quality assurance. Tests are carried out
and the results are compared with the expected results. In the case of erroneous
results, debugging is done. Using detailed testing strategies, a test plan is carried out
on each module. The various tests performed are unit testing, integration testing, and
user acceptance testing.
• Unit Testing
The software units in a system are the modules and routines that are assembled and
integrated to perform a specific function. Unit testing focuses first on the modules,
independently of one another, to locate errors. This makes it possible to detect errors
in coding and logic that are contained within each module. This testing includes entering
data and ascertaining whether the value matches the type and size supported by the
system. The various controls are tested to ensure that each performs its action as required.
• Integration Testing
Data can be lost across an interface, one module can have an adverse effect on another,
and sub-functions, when combined, may not produce the desired major function. Integration
testing is systematic testing to discover errors associated with the interfaces. The
objective is to take unit-tested modules and build a program structure. All the modules
are combined and tested as a whole. Here the Server module and Client module options
are integrated and tested. This testing provides the assurance that the application is a
well-integrated functional unit with smooth transition of data.
• User Training
After the system is implemented successfully, training the user is one of the most
important subtasks of the developer. For this purpose, user manuals are prepared and
handed over to the users, who are trained to operate the developed system. Both
hardware and software safeguards are put in place so that the developed system runs
successfully in future. In order to put the new application system into use, the
following activities were taken care of:
a. Preparation of user and system documentation.
b. Conducting user training with demo and hands on.
c. A test run for some period to ensure a smooth switchover to the system.
The users are trained to use the newly developed functions. User manuals describing
the procedures for using the functions listed on the menu are circulated to all the users.
It is confirmed that the system is implemented up to users' needs and expectations.
6.1 TOOLS AND TECHNOLOGY
Python is a dynamic, high-level, free, open-source, interpreted programming language. It
supports object-oriented as well as procedural programming.
In Python, we do not need to declare the type of a variable, because it is a dynamically
typed language. For example, after x = 10 the name x refers to an int, but it can later
be bound to a value of any other type, such as a string.
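Dynamic typing can be illustrated with a short sketch in which the same name is bound first to an integer and then to a string. The variable names are illustrative only.

```python
x = 10                            # x currently refers to an int
first_type = type(x).__name__

x = "ten"                         # the same name can later refer to a str
second_type = type(x).__name__

print(first_type, second_type)
```

The type belongs to the value, not to the name, which is why no declaration is needed.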
a. Easy to code: Python is a high-level programming language that is easy to learn
compared with languages such as C, C#, JavaScript, and Java. The basics can be
picked up in a few hours or days, and the language is developer-friendly.
b. Free and Open Source: Python is freely available for download from its official
website. Because it is open source, the source code is also available to the public,
so it can be downloaded, used, and shared.
c. Object-Oriented Language: One of the key features of Python is object-oriented
programming. Python supports object-oriented concepts such as classes, objects,
and encapsulation.
d. GUI Programming Support: Graphical user interfaces can be built using modules
such as PyQt5, PyQt4, wxPython, or Tk (through Tkinter). PyQt5 is a popular option
for creating graphical applications with Python.
e. High-Level Language: Python is a high-level language. When we write programs in
Python, we do not need to know the system architecture or manage memory
ourselves.
f. Extensible: Python is an extensible language. Parts of a program can be written
in C or C++, compiled, and then called from Python code.
g. Portable: Python is a portable language. For example, Python code written on
Windows can generally run unchanged on other platforms such as Linux, Unix, and
macOS.
Google Colaboratory, or Colab for short, is a product from Google Research. Colab allows
anybody to write and execute arbitrary Python code through the browser, and is
especially well suited to machine learning, data analysis, and education. More
technically, Colab is a hosted Jupyter notebook service that requires no setup to use,
while providing free access to computing resources including GPUs.
Resources in Colab are prioritized for interactive use cases. Google prohibits actions
associated with bulk computation, actions that negatively impact others, and actions
associated with bypassing its policies. The following are disallowed from Colab
runtimes:
a. File hosting, media serving, or other web service offerings not related to interactive
compute with Colab
b. Downloading torrents or engaging in peer-to-peer file-sharing
c. Using a remote desktop or SSH
d. Connecting to remote proxies
e. Mining cryptocurrency
f. Running denial-of-service attacks
g. Password cracking
h. Using multiple accounts to work around access or resource usage restrictions
Figure-4 : Decision Tree
RandomForestClassifier: In machine learning, a Random Forest classifier is a powerful
ensemble method that leverages the combined strength of multiple decision trees. It's a
supervised learning technique used for classification tasks, where the goal is to predict
the class label of a new data point based on its features.
It works in the following manner:
• Forest Creation: The algorithm builds a collection of decision trees, each trained on a
random sample of the training data drawn with replacement (a technique called
bootstrapping). This injects diversity into the forest, which helps to reduce overfitting.
• Random Feature Selection: At each node in a decision tree, instead of considering all
features for splitting, the algorithm randomly selects a subset of features (often the
square root of the total number). This further diversifies the trees and reduces their
dependence on any single feature.
• Tree Building: Each decision tree is built independently, recursively splitting the data
based on the chosen feature that best separates the classes. The split continues until a
stopping criterion (e.g., maximum depth, minimum samples per leaf) is met.
• Prediction: When presented with a new data point, the forest makes a prediction by
aggregating the predictions from all individual trees. This is typically done through
majority voting (for classification) or averaging (for regression).
Key Characteristics:
• Robustness: Reduces overfitting by averaging multiple trees.
• Accuracy: Generally more accurate than individual decision trees due to the
ensemble approach.
• Feature Importance: Provides estimates of feature importance, helping to
identify the most relevant symptoms for disease prediction.
• Applicability: In cattle disease prediction, Random Forest can handle large
and complex datasets with many features, providing reliable and robust
predictions. Its ability to evaluate feature importance is particularly useful
for identifying the most critical symptoms influencing disease outcomes.
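The bootstrapping, random feature selection, and feature-importance ideas described above map onto scikit-learn's RandomForestClassifier roughly as in the following sketch. The symptom names, the synthetic data, and the labelling rule are hypothetical; they exist only so that the example runs and are not drawn from the project's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical symptom columns (1 = symptom observed).
symptom_names = ["fever", "coughing", "lethargy", "loss_of_appetite"]

rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(400, len(symptom_names)))
# Hypothetical rule: the label depends only on the first two symptoms.
y = (X[:, 0] & X[:, 1]).astype(int)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # each tree is trained on a bootstrap sample of the rows
    random_state=42,
)
forest.fit(X, y)

# Importance scores sum to 1; larger values mean the feature was more useful.
importances = dict(zip(symptom_names, forest.feature_importances_))
prediction = forest.predict([[1, 1, 0, 0]])[0]

for name, score in importances.items():
    print(f"{name}: {score:.3f}")
```

In this synthetic setting the two symptoms that drive the label should receive most of the importance, which is the kind of signal that could help identify the most relevant symptoms in real data.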
• Simplicity: Easy to understand and implement.
• Computational Cost: Can be computationally expensive for large datasets as
it requires calculating the distance between data points.
• Applicability: KNN is useful for disease prediction when similar symptom
patterns are expected to result in the same diagnosis. Its simplicity and
effectiveness in capturing local structure in the data make it a valuable tool for
initial exploratory analysis.
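To make the distance computation and majority vote behind KNN concrete, here is a minimal from-scratch sketch using NumPy. The function name `knn_predict`, the feature choice, and the data are assumptions for illustration; a real application would more likely use scikit-learn's KNeighborsClassifier.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: [body temperature in °C, heart rate in beats/min].
X_train = [[38.5, 60], [38.7, 62], [38.6, 64], [40.2, 90], [40.5, 95], [40.0, 88]]
y_train = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]

label = knn_predict(X_train, y_train, query=[40.1, 92], k=3)
print(label)
```

Because every prediction computes a distance to each training point, the cost grows with the size of the dataset, which is the computational drawback noted above. Features on very different scales should normally be normalized first so that no single feature dominates the distance.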
GaussianNB: GNB is a classification algorithm from the Naive Bayes family, implemented
in the scikit-learn library. It assumes that features in your data follow a Gaussian
(normal) distribution within each class. Based on this assumption, it predicts the class
probability for a new data point using Bayes' theorem.
Key Characteristics:
• Simplicity: Easy to implement and computationally efficient.
• Assumption of Independence: Assumes that the presence of a particular feature is
independent of the presence of any other feature.
• Scalability: Works well with large datasets and can handle a large number of features.
• Applicability: Naive Bayes is effective for disease prediction when the symptoms
(features) are conditionally independent given the disease. It is particularly useful when
rapid prediction is required with minimal computational resources.
Strengths:
• Fast training and prediction times.
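• Well suited to problems with continuous features (text classification is more
commonly handled with the multinomial variant of Naive Bayes).
• Missing values must be imputed or removed before training, since the scikit-learn
implementation does not accept them.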
Weaknesses:
• The assumption of Gaussian distribution might not hold true for all datasets, potentially
affecting accuracy.
• May not perform as well as other algorithms for complex datasets.
Usage : from sklearn.naive_bayes import GaussianNB
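Building on the import above, a minimal usage sketch of GaussianNB might look as follows. The two continuous features, the tiny dataset, and the labels are hypothetical and serve only to show the fit, predict, and predict_proba calls.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical continuous features: [body temperature in °C, heart rate in beats/min].
X = np.array([[38.4, 58], [38.6, 62], [38.5, 60], [38.7, 64],
              [40.0, 88], [40.3, 92], [40.6, 96], [40.1, 90]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = healthy, 1 = sick (illustrative labels)

model = GaussianNB()
model.fit(X, y)  # estimates a per-class mean and variance for each feature

sample = np.array([[40.2, 91]])
predicted_class = model.predict(sample)[0]
class_probabilities = model.predict_proba(sample)[0]  # one probability per class

print(predicted_class, class_probabilities)
```

The class probabilities come from applying Bayes' theorem to the per-class Gaussian likelihoods, so they can be shown to the user as a rough confidence indication alongside the predicted label.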
Besides its obvious scientific uses, NumPy can also be used as an efficient
multi-dimensional container of generic data. Arbitrary data types can be defined using
NumPy, which allows it to integrate seamlessly and speedily with a wide variety of
databases.
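As a small sketch of what defining an arbitrary data type in NumPy can look like, the example below builds a structured array holding one record per animal. The field names and values are hypothetical.

```python
import numpy as np

# Hypothetical record layout for one animal; the field names are illustrative.
record_dtype = np.dtype([
    ("tag_id", "U10"),       # text identifier, up to 10 characters
    ("body_temp_c", "f4"),   # 32-bit float
    ("has_fever", "?"),      # boolean flag
])

herd = np.array(
    [("C001", 38.6, False), ("C002", 40.2, True), ("C003", 39.9, True)],
    dtype=record_dtype,
)

# Fields can be accessed by name and combined with ordinary NumPy operations.
fever_tags = herd["tag_id"][herd["has_fever"]]
mean_temp = herd["body_temp_c"].mean()

print(fever_tags, mean_temp)
```

A structured array keeps mixed-type records in one contiguous block, which is what makes it convenient for exchanging row-like data with databases.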
Pandas : Pandas is a free, open-source Python library specifically designed for data
manipulation and analysis. It provides high-performance, easy-to-use data structures and
data analysis tools, making it an essential component of the Python data science ecosystem.
Key Features of Pandas:
• DataFrames: The core data structure in Pandas is the DataFrame, a two-dimensional,
size-mutable, tabular data structure with labeled axes (rows and columns). It's
analogous to a spreadsheet but offers powerful functionalities for data handling.
• Series: One-dimensional arrays (like lists) can be created using Series, which are also
labeled and capable of holding any data type.
• Data Cleaning and Manipulation: Pandas excels at cleaning messy data. It offers
features for handling missing values, filtering data, selecting specific columns or rows,
sorting, merging DataFrames, and more.
• Data Analysis: Powerful statistical functions are available for calculating descriptive
statistics (mean, median, standard deviation, etc.), groupby operations for analyzing
subsets of data, and time series analysis for data with a time component.
Usage of Pandas :
• Loading Data: Pandas can efficiently load data from diverse file formats into
DataFrames or Series, simplifying data ingestion.
• Data Cleaning: Missing values can be imputed or removed, outliers can be detected and
handled, and data can be formatted for consistency.
• Data Exploration: Descriptive statistics can be computed to gain insights into the data's
distribution and central tendency.
• Data Transformation: New columns can be created based on existing ones, data can be
grouped and aggregated for analysis, and data can be reshaped or pivoted for different
views.
• Data Visualization: Basic data visualizations can be created using Pandas' plotting
capabilities, and it can be used to prepare data for more advanced visualizations with
other libraries.
• Feature Engineering: Pandas plays a crucial role in preparing data for machine learning
models by creating new features, scaling features, and handling categorical data.
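The loading, cleaning, and aggregation steps listed above can be sketched in a few lines. The CSV text, column names, and values below are hypothetical stand-ins for a symptom file and are not the project data.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a symptom file; values are made up.
csv_text = """prognosis,fever,lameness,milk_drop
mastitis,1,0,1
mastitis,1,0,
foot_rot,0,1,0
foot_rot,1,1,0
"""

# Loading: read the CSV into a DataFrame.
frame = pd.read_csv(io.StringIO(csv_text))

# Cleaning: treat a missing symptom value as "not observed" (0).
frame = frame.fillna(0).astype({"milk_drop": int})

# Exploration and transformation: count how often each symptom occurs per disease.
symptom_counts = frame.groupby("prognosis").sum()
print(symptom_counts)
```

Reading from io.StringIO keeps the sketch self-contained; with a real file, the path would be passed to pd.read_csv instead.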
OS Module : The os module is part of Python's standard library and provides a portable
way of using operating system-dependent functionality. Together with os.path, it
includes many functions for interacting with the file system.
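A short sketch of typical file-system calls from os and os.path follows. The file name "training.csv" is used only as a placeholder, and the file is created inside a temporary directory so that the example leaves nothing behind.

```python
import os
import tempfile

# Build a path in a platform-independent way; "training.csv" here is only a
# placeholder file created inside a temporary directory for the demonstration.
with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "training.csv")

    # The file does not exist yet, so os.path.exists reports False.
    existed_before = os.path.exists(data_path)

    with open(data_path, "w") as handle:
        handle.write("prognosis,fever\n")

    # After writing, the file shows up in the directory listing.
    listing = os.listdir(workdir)
    base_name, extension = os.path.splitext(os.path.basename(data_path))

print(existed_before, listing, base_name, extension)
```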
Matplotlib : Matplotlib is a fundamental and versatile library in Python for creating static,
animated, and interactive visualizations of data. It empowers you to transform numerical
information into clear and insightful graphical representations, making data exploration,
analysis, and communication more effective.
Features and Capabilities:
• Extensive Plot Types: Matplotlib offers a wide array of plot types to suit your data and
communication needs:
1. Line plots (ideal for trends and time series)
2. Scatter plots (effective for exploring relationships between variables)
3. Bar charts (useful for comparing categories)
4. Histograms (great for visualizing data distribution)
5. Pie charts (suitable for depicting parts of a whole)
6. Box plots (excellent for summarizing distributions with outliers)
7. Heatmaps (powerful for visualizing matrices)
8. 3D plots (allow for visualization in a three-dimensional space)
9. Many more!
• Customization: Matplotlib provides extensive control over every aspect of your plots.
You can customize:
1. Line styles, colors, and markers
2. Axis labels, titles, and legends
3. Tick marks, grid lines, and background colors
4. Text properties (fonts, sizes, colors)
5. Layouts and subplots
• Integration with NumPy and Pandas: Matplotlib seamlessly integrates with NumPy
arrays and Pandas DataFrames, the workhorses of scientific computing and data
analysis in Python. This smooth integration allows you to leverage these data structures
directly for plotting without the need for conversion.
• Animation Capabilities: Matplotlib offers tools for creating animations, allowing you
to visualize how data changes over time or across parameters.
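The plotting and customization options described above can be sketched with a small bar chart. The symptom names and counts are made-up values, and the non-interactive "Agg" backend is an assumption chosen so that the example also runs on a machine without a display.

```python
import io
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical symptom counts for one disease; the numbers are made up.
symptoms = ["fever", "lameness", "swelling"]
counts = [12, 7, 4]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(symptoms, counts, color="steelblue")
ax.set_title("Symptom frequency (illustrative data)")
ax.set_xlabel("Symptom")
ax.set_ylabel("Number of reports")
ax.grid(axis="y", linestyle="--", alpha=0.5)

# Save to an in-memory buffer; fig.savefig("plot.png") would write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```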
Tkinter Library: Tkinter is the standard Python interface to the Tk (Tcl/Tk) GUI toolkit
and is used to create graphical user interfaces (GUIs) for applications. Tk is a
cross-platform library available on most operating systems, so Tkinter applications run on
Windows, macOS, and Linux.
Tkinter is included with the standard Python installers, so it usually requires no
additional installation.
Usage :
Tkinter provides various widgets, which are the building blocks of your GUI. These widgets
include buttons, labels, text boxes, menus, and more.
You can arrange these widgets within a window and define their behavior to create interactive
applications. For instance, you can create a button that triggers a specific function when
clicked.
Code : import tkinter as tk
CHAPTER 7 RESULTS & DISCUSSION
Installing pip
https://pip.pypa.io/en/stable/installation/
3. Installing pandas library
Command: > pip install pandas
Figure-16: User Interface of Project Prediction using Decision Tree
Figure-17: User Interface of Project Prediction using Decision Tree & Random
Forest
Figure-18: User Interface of Project Prediction using Decision Tree, Random Forest,
Naïve Bayes & K-Nearest Neighbour
7.2.3 Training Dataset
7.2.5. Scatter Plot for disease Coccidiosis
7.2.7. Disease for which Model is trained
7.3 OPPORTUNITIES
• Early Detection: Machine learning models can analyze data from various sources, like
temperature sensors, activity trackers, and visual analysis of lesions, to identify subtle
changes that might indicate an early stage of disease. This allows for faster intervention
and treatment, improving animal welfare and reducing potential losses.
• Improved Accuracy: Machine learning algorithms can be trained on vast datasets of
cattle health information and disease symptoms. This allows them to identify patterns
and relationships between symptoms and specific diseases with high accuracy,
potentially exceeding traditional methods relying solely on visual inspection.
• Reduced Costs: Early disease detection and treatment can minimize the need for
expensive medication and veterinary care in later stages. Additionally, the ability to
isolate infected animals quickly can prevent the spread of disease within the herd,
reducing overall economic losses.
• Scalability and Automation: Machine learning models can be deployed on farms of
all sizes. Data collection and analysis can be automated, reducing the workload on
farmers and veterinarians. This allows for more frequent monitoring of cattle health and
faster response times.
• Data-Driven Insights: Machine learning can analyze large datasets to identify trends
and patterns in disease outbreaks. This information can be invaluable for researchers
and veterinarians, helping them understand disease transmission patterns, develop
preventative measures, and improve overall herd health management strategies.
7.4 CHALLENGES
CHAPTER 8 SUMMARY
In our project, we explored the intersection of data science and veterinary medicine to
predict diseases in cattle. By analyzing symptom patterns, we aimed to provide early
detection and personalized interventions for livestock health.
Methodology and Data Collection: We assembled a diverse dataset comprising cattle health
records, symptoms, and corresponding disease labels. Feature engineering involved selecting
relevant symptoms and encoding them effectively.
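One simple way to encode selected symptoms is a 0/1 vector with one position per symptom in a fixed vocabulary. The sketch below assumes a short, hypothetical symptom list and a helper name (encode_symptoms) introduced only for illustration.

```python
# Hypothetical symptom vocabulary; the real project uses the dataset's column names.
SYMPTOMS = ["fever", "lameness", "loss_of_appetite", "milk_drop", "swelling"]


def encode_symptoms(selected, vocabulary=SYMPTOMS):
    """Return a 0/1 list with a 1 at the position of every selected symptom."""
    chosen = set(selected)
    return [1 if name in chosen else 0 for name in vocabulary]


# "Select Here" mimics an unfilled drop-down and simply matches no symptom.
vector = encode_symptoms(["fever", "milk_drop", "Select Here"])
print(vector)  # [1, 0, 0, 1, 0]
```

A vector of this form can be passed directly to a trained classifier, since each position corresponds to one feature column.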
Model Selection and Training: We experimented with four machine learning algorithms:
Decision Tree, Random Forest, Naive Bayes, and K-Nearest Neighbors. The dataset
was split into training and testing sets. Our best-performing model achieved an accuracy
of 97% on the data.
The final step involves deployment of the best-performing model into a user-friendly
web-based or mobile application. This application is designed for ease of symptom input and
result interpretation, enabling real-time predictions and immediate diagnostic assistance.
CHAPTER 9 LITERATURE CITED
• https://www.kaggle.com/datasets/ashtired11/cattle-diseases
• https://www.irjmets.com/uploadedfiles/paper//issue_12_december_2023/47014/final/fin_irjmets1702109350.pdf
• https://www.python.org/downloads/
• https://www.researchgate.net/publication/343387698_Application_of_Machine_Learning_in_Animal_Disease_Analysis_and_Prediction
• https://www.geeksforgeeks.org/machine-learning-algorithms/
VITAE
Durgesh Kumar
S/o Naresh Kumar
62, Jhabri, Pathri Forest Range,
Haridwar, Uttarakhand
Pin Code – 249404
Phone No- 7351181385
Email – dk.durgesh97kumar@gmail.com
VITAE
The author, PRASHANT RANA was born on 29-SEP-2001. He passed his High
School from Shri Guru Nanak Academy School, Nanakmatta, Udham Singh
Nagar (Uttarakhand) and Intermediate from G.S. Convent School, Sitarganj,
Udham Singh Nagar (Uttarakhand). He joined as an undergraduate student in
College of Technology, Govind Ballabh Pant University of Agriculture &
Technology, Pantnagar for programme Bachelor of Technology (Computer
Engineering).
VITAE
The author, DIVYANSHI NEOLIA was born on 6-Jan-2002. She passed her
High School from St. Theresa Sr. Sec. School, Kathgodam Haldwani Uttarakhand
and Intermediate from St. Theresa Sr.Sec. School Kathgodam Haldwani
Uttarakhand. She joined as an undergraduate student in College of Technology,
Govind Ballabh Pant University of Agriculture & Technology, Pantnagar for
programme Bachelor of Technology (Computer Engineering).
VITAE
The author, PRANSHU PANDE was born on 20-AUG-2002. He passed his High
School from St. Josephs’ College, Nainital (Uttarakhand) and Intermediate from
St. Josephs’ College, Nainital (Uttarakhand). He joined as an undergraduate
student in College of Technology, Govind Ballabh Pant University of
Agriculture & Technology, Pantnagar for programme Bachelor of Technology
(Computer Engineering).
Pranshu Pande
S/O Paryank Pande,
Hydel Colony, Mayvilla Compound,
Tallital, Nainital, Uttarakhand
Pin Code – 263001
Phone No- 7060016543
Email – pranshu2008pande@gmail.com
APPENDIX A
#Importing libraries and packages
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
from sklearn import tree  # provides tree.DecisionTreeClassifier used below
import matplotlib.pyplot as plt
from tkinter import *
import numpy as np
import pandas as pd
import os
#Reading the Cattle training Dataset .csv file
df = pd.read_csv("training.csv")
#indexing of dataframe by disease name (used for the scatter plots)
DF = pd.read_csv("training.csv", index_col="prognosis")
#encoding every disease name in the prognosis column as an integer label
df.replace({'prognosis': {
    'mastitis': 0, 'blackleg': 1, 'bloat': 2, 'coccidiosis': 3, 'cryptosporidiosis': 4,
    'displaced_abomasum': 5, 'gut_worms': 6, 'listeriosis': 7, 'liver_fluke': 8,
    'necrotic_enteritis': 9, 'peri_weaning_diarrhoea': 10, 'rift_valley_fever': 11,
    'rumen_acidosis': 12, 'traumatic_reticulitis': 13, 'calf_diphtheria': 14,
    'foot_rot': 15, 'foot_and_mouth': 16, 'ragwort_poisoning': 17, 'wooden_tongue': 18,
    'infectious_bovine_rhinotracheitis': 19, 'acetonaemia': 20,
    'fatty_liver_syndrome': 21, 'calf_pneumonia': 22, 'schmallen_berg_virus': 23,
    'trypanosomosis': 24, 'fog_fever': 25}}, inplace=True)
DF.head()
df.head()
#plotPerColumnDistribution(df, 10, 5)
#plotScatterMatrix(df, 20, 10)
#l1 holds the symptom column names and tr the testing DataFrame
#(both are defined elsewhere in the program)
X = df[l1]
y = df[["prognosis"]]
print(y)
y = np.ravel(y)  # flatten the label column to a 1-D array
print(X)
print(y)
X_test = tr[l1]
y_test = tr[["prognosis"]]
y_test = np.ravel(y_test)  # flatten the test labels to a 1-D array
print(X_test)
print(y_test)
#list1 = DF['prognosis'].unique()
def scatterplt(disea):
    x = DF.loc[disea].sum()  # total count of each symptom reported for the given disease
    x.drop(x[x == 0].index, inplace=True)  # dropping symptoms with value 0
    print(x.values)
    y = x.keys()  # storing names of symptoms in y
    print(len(x))
    print(len(y))
    plt.title(disea)
    plt.scatter(y, x.values)
    plt.show()
def scatterinp(sym1, sym2, sym3, sym4, sym5):
    x = [sym1, sym2, sym3, sym4, sym5]  # storing input symptoms in x
    y = [0, 0, 0, 0, 0]  # y[i] becomes 1 when the i-th symptom has been selected
    if sym1 != 'Select Here':
        y[0] = 1
    if sym2 != 'Select Here':
        y[1] = 1
    if sym3 != 'Select Here':
        y[2] = 1
    if sym4 != 'Select Here':
        y[3] = 1
    if sym5 != 'Select Here':
        y[4] = 1
    print(x)
    print(y)
    plt.scatter(x, y)
    plt.show()
        #training the Decision Tree classifier (inside the Decision Tree prediction handler)
        clf3 = tree.DecisionTreeClassifier()
        clf3 = clf3.fit(X, y)
        #marking the selected symptoms with 1 in the input vector l2
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        for k in range(0, len(l1)):
            for z in psymptoms:
                if z == l1[k]:
                    l2[k] = 1
        inputtest = [l2]
        predict = clf3.predict(inputtest)
        predicted = predict[0]
        #mapping the predicted label back to the disease name
        h = 'no'
        for a in range(0, len(disease)):
            if predicted == a:
                h = 'yes'
                break
        if h == 'yes':
            pred1.set(" ")
            pred1.set(disease[a])
        else:
            pred1.set(" ")
            pred1.set("Not Found")
        #storing the inputs and the prediction in the SQLite database
        import sqlite3
        conn = sqlite3.connect('database.db')
        c = conn.cursor()
        c.execute("CREATE TABLE IF NOT EXISTS DecisionTree(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
                  "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
        c.execute("INSERT INTO DecisionTree(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
                  "VALUES(?,?,?,?,?,?,?)",
                  (NameEn.get(), Symptom1.get(), Symptom2.get(), Symptom3.get(),
                   Symptom4.get(), Symptom5.get(), pred1.get()))
        conn.commit()
        c.close()
        conn.close()
        #printing scatter plots of the input symptoms and of the predicted disease
        scatterinp(Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get())
        scatterplt(pred1.get())
        if comp:
            root.mainloop()
    elif (Symptom1.get() == "Select Here") or (Symptom2.get() == "Select Here"):
        pred2.set(" ")
        sym = messagebox.askokcancel("System", "Kindly Fill at least first two Symptoms")
        if sym:
            root.mainloop()
    else:
        #training the Random Forest classifier
        from sklearn.ensemble import RandomForestClassifier
        clf4 = RandomForestClassifier(n_estimators=100)
        clf4 = clf4.fit(X, np.ravel(y))
        #evaluating the classifier on the testing data
        from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
        y_pred = clf4.predict(X_test)
        print("Random Forest")
        print("Accuracy")
        print(accuracy_score(y_test, y_pred))
        print(accuracy_score(y_test, y_pred, normalize=False))
        print("Confusion matrix")
        conf_matrix = confusion_matrix(y_test, y_pred)
        print(conf_matrix)
        #marking the selected symptoms with 1 in the input vector l2
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        for k in range(0, len(l1)):
            for z in psymptoms:
                if z == l1[k]:
                    l2[k] = 1
        inputtest = [l2]
        predict = clf4.predict(inputtest)
        predicted = predict[0]
        #mapping the predicted label back to the disease name
        h = 'no'
        for a in range(0, len(disease)):
            if predicted == a:
                h = 'yes'
                break
        if h == 'yes':
            pred2.set(" ")
            pred2.set(disease[a])
        else:
            pred2.set(" ")
            pred2.set("Not Found")
        #storing the inputs and the prediction in the SQLite database
        import sqlite3
        conn = sqlite3.connect('database.db')
        c = conn.cursor()
        c.execute("CREATE TABLE IF NOT EXISTS RandomForest(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
                  "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
        c.execute("INSERT INTO RandomForest(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
                  "VALUES(?,?,?,?,?,?,?)",
                  (NameEn.get(), Symptom1.get(), Symptom2.get(), Symptom3.get(),
                   Symptom4.get(), Symptom5.get(), pred2.get()))
        conn.commit()
        c.close()
        conn.close()
        #printing scatter plot of disease predicted vs its symptoms
        scatterplt(pred2.get())
        #marking the selected symptoms with 1 in the input vector l2 (K-Nearest Neighbour handler)
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        for k in range(0, len(l1)):
            for z in psymptoms:
                if z == l1[k]:
                    l2[k] = 1
        inputtest = [l2]
        predict = knn.predict(inputtest)
        predicted = predict[0]
        #mapping the predicted label back to the disease name
        h = 'no'
        for a in range(0, len(disease)):
            if predicted == a:
                h = 'yes'
                break
        if h == 'yes':
            pred4.set(" ")
            pred4.set(disease[a])
        else:
            pred4.set(" ")
            pred4.set("Not Found")
        #storing the inputs and the prediction in the SQLite database
        import sqlite3
        conn = sqlite3.connect('database.db')
        c = conn.cursor()
        c.execute("CREATE TABLE IF NOT EXISTS KNearestNeighbour(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
                  "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
        c.execute("INSERT INTO KNearestNeighbour(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
                  "VALUES(?,?,?,?,?,?,?)",
                  (NameEn.get(), Symptom1.get(), Symptom2.get(), Symptom3.get(),
                   Symptom4.get(), Symptom5.get(), pred4.get()))
        conn.commit()
        c.close()
        conn.close()
        scatterplt(pred4.get())
NAIVE BAYES ALGORITHM
pred3 = StringVar()
def NaiveBayes():
    if len(NameEn.get()) == 0:
        pred3.set(" ")
        comp = messagebox.askokcancel("System", "Kindly Fill the Name")
        if comp:
            root.mainloop()
    elif (Symptom1.get() == "Select Here") or (Symptom2.get() == "Select Here"):
        pred3.set(" ")
        sym = messagebox.askokcancel("System", "Kindly Fill at least first two Symptoms")
        if sym:
            root.mainloop()
    else:
        #training the Gaussian Naive Bayes classifier
        from sklearn.naive_bayes import GaussianNB
        gnb = GaussianNB()
        gnb = gnb.fit(X, np.ravel(y))
        #marking the selected symptoms with 1 in the input vector l2
        psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(), Symptom4.get(), Symptom5.get()]
        for k in range(0, len(l1)):
            for z in psymptoms:
                if z == l1[k]:
                    l2[k] = 1
        inputtest = [l2]
        predict = gnb.predict(inputtest)
        predicted = predict[0]
        #mapping the predicted label back to the disease name
        h = 'no'
        for a in range(0, len(disease)):
            if predicted == a:
                h = 'yes'
                break
        if h == 'yes':
            pred3.set(" ")
            pred3.set(disease[a])
        else:
            pred3.set(" ")
            pred3.set("Not Found")
        #storing the inputs and the prediction in the SQLite database
        import sqlite3
        conn = sqlite3.connect('database.db')
        c = conn.cursor()
        c.execute("CREATE TABLE IF NOT EXISTS NaiveBayes(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
                  "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
        c.execute("INSERT INTO NaiveBayes(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
                  "VALUES(?,?,?,?,?,?,?)",
                  (NameEn.get(), Symptom1.get(), Symptom2.get(), Symptom3.get(),
                   Symptom4.get(), Symptom5.get(), pred3.get()))
        conn.commit()
        c.close()
        conn.close()
        #printing scatter plot of disease predicted vs its symptoms
        scatterplt(pred3.get())
Symptom2 = StringVar()
Symptom2.set("Select Here")
Symptom3 = StringVar()
Symptom3.set("Select Here")
Symptom4 = StringVar()
Symptom4.set("Select Here")
Symptom5 = StringVar()
Symptom5.set("Select Here")
Name = StringVar()
prev_win=None
def Reset():
    global prev_win
    #restoring every input and output field to its initial state
    Symptom1.set("Select Here")
    Symptom2.set("Select Here")
    Symptom3.set("Select Here")
    Symptom4.set("Select Here")
    Symptom5.set("Select Here")
    NameEn.delete(first=0, last=100)
    pred1.set(" ")
    pred2.set(" ")
    pred3.set(" ")
    pred4.set(" ")
    try:
        prev_win.destroy()
        prev_win = None
    except AttributeError:
        pass
from tkinter import messagebox
def Exit():
    qExit = messagebox.askyesno("System", "Do you want to exit the system")
    if qExit:
        root.destroy()
        exit()
#Headings for the GUI written at the top of GUI
w2 = Label(root, justify=CENTER, text=" Cattle Disease Predictor ", fg="black",
bg="Whitesmoke")
w2.config(font=("Times",30,"bold italic"))
w2.grid(row=1, column=0, columnspan=2, padx=100)
#w2 = Label(root, justify=LEFT, text="PROJECT GROUP 5 ", fg="Pink", bg="Ivory")
w2.config(font=("Times",30,"bold italic"))
w2.grid(row=2, column=0, columnspan=2, padx=100)
#Label for the name
NameLb = Label(root, text="Cattle ID/Name ", fg="Black", bg="Whitesmoke")
NameLb.config(font=("Times",15,"bold italic"))
NameLb.grid(row=6, column=0, pady=15, sticky=W)
knnLb = Label(root, text="Naive Bayes", fg="Black", bg="lightgrey", width = 20)
knnLb.config(font=("Times",15,"bold italic"))
knnLb.grid(row=19, column=0, pady=10, sticky=W)
OPTIONS = sorted(l1)
S2 = OptionMenu(root, Symptom2,*OPTIONS)
S2.grid(row=8, column=1)
S3 = OptionMenu(root, Symptom3,*OPTIONS)
S3.grid(row=9, column=1)
S4 = OptionMenu(root, Symptom4,*OPTIONS)
S4.grid(row=10, column=1)
S5 = OptionMenu(root, Symptom5,*OPTIONS)
S5.grid(row=11, column=1)
#Buttons for predicting the disease using different algorithms
dst = Button(root, text="Prediction 1",
command=DecisionTree,bg="lightyellow",fg="Black")
dst.config(font=("Times",15,"bold italic"))
dst.grid(row=6, column=3,padx=10)
rs = Button(root,text="Reset Inputs",
command=Reset,bg="crimson",fg="White",width=15)
rs.config(font=("Times",15,"bold italic"))
rs.grid(row=10,column=3,padx=10)
t4 = Label(root, font=("Times",15,"bold italic"), text="K-Nearest Neighbour", height=1,
           bg="lightseagreen", width=40, fg="Black", textvariable=pred4, relief="sunken")
t4.grid(row=19, column=1, padx=10)
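The four prediction handlers above repeat the same two steps: mapping the predicted integer back to a disease name and logging the result to SQLite. A minimal sketch of how these steps could be factored into helpers is given below. The function names, the shortened disease list, the sample inputs, and the in-memory database are assumptions made for illustration and are not part of the project code.

```python
import sqlite3

# Shortened, hypothetical label list; index i is the integer code of the disease.
DISEASES = ["mastitis", "blackleg", "bloat"]


def label_to_name(code, diseases=DISEASES):
    """Map a predicted integer code to a disease name, or 'Not Found'."""
    if 0 <= code < len(diseases):
        return diseases[code]
    return "Not Found"


def log_prediction(conn, table, name, symptoms, disease):
    """Store one prediction; `table` must come from a fixed set, never user input."""
    allowed = {"DecisionTree", "RandomForest", "NaiveBayes", "KNearestNeighbour"}
    if table not in allowed:
        raise ValueError(f"unknown table: {table}")
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table}("
        "Name TEXT, Symtom1 TEXT, Symtom2 TEXT, Symtom3 TEXT, "
        "Symtom4 TEXT, Symtom5 TEXT, Disease TEXT)"
    )
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?)",
        (name, *symptoms, disease),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demonstration
symptoms = ["fever", "milk_drop", "Select Here", "Select Here", "Select Here"]
log_prediction(conn, "DecisionTree", "Cow-17", symptoms, label_to_name(0))
rows = conn.execute("SELECT Name, Disease FROM DecisionTree").fetchall()
print(rows)  # [('Cow-17', 'mastitis')]
conn.close()
```

Restricting the table name to a fixed set keeps the string formatting safe, while all user-supplied values still go through "?" placeholders as in the original handlers.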