Final Project Report

A
PROJECT REPORT
ON
CATTLE DISEASE PREDICTION BY USING MACHINE LEARNING
SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS OF THE DEGREE

OF
BACHELOR OF TECHNOLOGY
IN
COMPUTER ENGINEERING
Under the guidance of
Prof. JALAJ SHARMA

Submitted By:
Durgesh Kumar I.D. No. (56126)

Prashant Rana I.D. No. (56127)
Divyanshi Neolia I.D. No. (56280)
Pranshu Pande I.D. No. (56306)
Department of Computer Engineering, College of Technology,

Govind Ballabh Pant University of Agriculture and Technology,
Pantnagar-263145, India
June, 2024
ACKNOWLEDGEMENT
“CATTLE DISEASE PREDICTION BY USING MACHINE LEARNING” is not the work
of our team alone; it is the result of the combined efforts of many people. We would like
to take this opportunity to thank everyone who helped us carry out this project.
First of all, we express our gratitude to our project guide, Prof. Jalaj Sharma, without whose
help the project would not have been possible. His mentorship encouraged us to
expedite our work and complete it on time. His valuable suggestions and
constructive guidance have been indispensable in the completion of this project work.
We would also like to thank Dr. S.D. Samantaray, Dr. Rajeev Singh, Dr. P.K. Mishra, Dr. C.S.
Negi, Prof. B.K. Singh and Dr. Sunita Jalal for providing all the help and infrastructure needed
for the successful completion of this project. They supported us in this endeavour and
appreciated our efforts throughout the project.
Last but not least, we would also like to thank our friends and family, who directly and
indirectly supported us during the project work and provided a helpful environment for our
project.
Group Members I.D. Signature

Durgesh Kumar I.D. No. (56126) ………………….
Prashant Rana I.D. No. (56127) ………………….
Divyanshi Neolia I.D. No. (56280) ………………….
Pranshu Pande I.D. No. (56306) ………………….
CERTIFICATE
This is to certify that the project work entitled “CATTLE DISEASE PREDICTION BY
USING MACHINE LEARNING” which is submitted by:
NAME ID. No.
Durgesh Kumar 56126
Prashant Rana 56127
Divyanshi Neolia 56280
Pranshu Pande 56306
is a record of the students' work carried out by them in partial fulfilment of requirements for the
award of degree of Bachelor of Technology in the Department of Computer Engineering,
College of Technology, G.B. Pant University of Agriculture and Technology, Pantnagar.
DATE: (Prof. JALAJ SHARMA)

Project Guide

APPROVAL
This project report entitled “Cattle Disease Prediction by using Machine Learning” is
hereby approved as a creditable study of an engineering subject, as a prerequisite to the
degree for which it has been submitted.
Committee Members:
1. Dr. S.D. Samantaray
(Head of Department of Computer Engineering) ------------------------
2. Prof. Jalaj Sharma
(Assistant Professor) ------------------------
3. Prof. P.K. Mishra
4. Dr. Rajeev Singh
(Associate Professor) ------------------------
5. Dr. C.S. Negi
Signature of Head of Department

ABSTRACT
Accurate and timely diagnosis of cattle diseases is essential for maintaining the health and
productivity of livestock, which is crucial for the agricultural economy and food security.
Traditional diagnostic methods, while effective, can be time-consuming, costly, and prone to
human error. This project explores the application of machine learning algorithms to predict
cattle diseases based on various symptoms, aiming to provide a reliable and efficient diagnostic
tool for farmers and veterinarians.
In this study, we developed and compared the performance of four machine learning
algorithms: Decision Tree, Naive Bayes, K-Nearest Neighbors (KNN), and Random Forest. A
comprehensive dataset of cattle health records, including symptoms and corresponding
diagnoses, was collected and preprocessed. Each model was trained and evaluated using this
dataset to identify the most effective algorithm for disease prediction.
The models were assessed based on their accuracy, precision, recall, and overall robustness.
Additionally, an interactive user interface was developed, allowing users to input observed
symptoms and receive immediate diagnostic predictions. This interface aims to facilitate
prompt and effective disease management by providing actionable insights in real-time.
The results demonstrate that machine learning can significantly enhance cattle health
monitoring systems, offering a scalable and cost-effective solution to disease diagnosis. Among
the algorithms tested, the Random Forest classifier showed the highest accuracy and
robustness, making it the most suitable for practical implementation.
This project highlights the potential of integrating machine learning into livestock health
management, paving the way for more sustainable and efficient agricultural practices. Future
work will focus on expanding the dataset, exploring additional advanced algorithms, and
integrating real-time data collection for continuous model improvement.
TABLE OF CONTENTS
1. Introduction
1.1. Cattle Disease Prediction System
1.2. Machine Learning Algorithms Available for Disease Prediction
1.3. Project Objectives and Significance
1.4. Advantages of Machine Learning Based Cattle Disease Prediction
2. Literature Review
3. Problem Specification
3.1. Objectives
3.2. Scope
3.3. Problem Statement
3.4. Expected Outcomes
3.5. Significance
4. Requirement Analysis
4.1. Hardware Configuration
4.2. Software Configuration
4.3. Functional Requirements
4.4. Non-Functional Requirements
5. System Design
5.1. Acquiring Dataset
5.2. Data Preprocessing
5.3. Training the Machine Learning Model
5.4. Model Development
5.5. Developing a User Interface
6. Implementation
6.1. Tools and Technology
7. Results and Discussion
7.1. Installation Process
7.2. Project Screenshots
7.3. Opportunities
7.4. Challenges
8. Summary
9. Literature Cited
10. Bio-Data of Students
APPENDIX I
APPENDIX II
APPENDIX III
LIST OF FIGURES
Figure-1: Flow Diagram of the Project
Figure-2: Symptoms Observations in Cattle
Figure-3: Working Steps
Figure-4: Decision Tree
Figure-5: Random Forest Algorithm
Figure-6: KNN Classification Algorithm
Figure-7: Naïve Bayes Algorithm
Figure-8: Python Installer Window
Figure-9: Python Version Check
Figure-10: Installing pip
Figure-11: Installing sklearn Library
Figure-12: Installation of Numpy Library
Figure-13: Installation of Pandas Library
Figure-14: Installation of Tkinter Library
Figure-15: User Interface of Project
Figure-16: User Interface of Project Prediction Using Decision Tree
Figure-17: UI of Project Prediction Using Decision Tree & Random Forest
Figure-18: UI of Project Prediction Using 4 Algorithms
Figure-19: Testing Dataset
Figure-20: Training Dataset
Figure-21: Scatter Plot for Listeriosis
Figure-22: Scatter Plot for Coccidiosis
Figure-23: Confusion Matrix
Figure-24: List of Diseases
Figure-25: List of Symptoms
Figure-26: Backend Database
CHAPTER 1 INTRODUCTION
Cattle disease prediction is a critical aspect of modern livestock management,
impacting both animal welfare and agricultural productivity. Traditionally, diagnosing diseases
in cattle has relied heavily on the expertise of veterinarians, who assess symptoms through
physical examination and laboratory tests. However, this approach is time-consuming, costly,
and may not always be feasible, particularly in remote or resource-limited areas. In response
to these challenges, the integration of machine learning algorithms presents a promising
solution. By leveraging algorithms such as Decision Tree, Naive Bayes, K-Nearest Neighbour
(KNN), and Random Forest, it becomes possible to analyse vast amounts of symptom data and
predict diseases with increased accuracy and efficiency. Decision Tree algorithms offer
interpretability, allowing for clear decision rules to be established based on symptom
observations. Naive Bayes classifiers, with their simplicity and efficiency, provide fast and
effective disease predictions, particularly suitable for real-time applications. KNN algorithms,
on the other hand, excel at capturing local data patterns and making accurate predictions based
on the similarity of cases. Lastly, Random Forest models combine the strengths of multiple
Decision Trees, offering robustness, accuracy, and insights into feature importance. By
integrating these machine learning algorithms into cattle disease prediction systems, we aim to
revolutionize livestock management, enabling early detection, timely intervention, and
improved health outcomes for cattle herds.
1.1. CATTLE DISEASE PREDICTION SYSTEM

The health and well-being of cattle have long been a critical concern for farmers and
veterinarians, impacting the agricultural economy and food supply chain. Traditional
methods of diagnosing cattle diseases primarily relied on the expertise of veterinarians who
would observe symptoms, conduct physical examinations, and perform laboratory tests.
While effective, these methods are time-consuming, costly, and often inaccessible to smaller
farms.
The early 2000s saw the initial integration of technology into livestock management. Digital
record-keeping became more prevalent, allowing for systematic data collection and analysis.
However, the use of advanced computational methods for disease prediction was still in its
nascent stages.
The mid-2000s marked the advent of machine learning applications in agriculture.
Researchers began to explore various algorithms for tasks such as crop yield prediction, soil
analysis, and pest detection. As computational power increased and data collection methods
improved, the potential for applying machine learning to livestock health became apparent.
1.2. MACHINE LEARNING ALGORITHMS AVAILABLE FOR DISEASE
PREDICTION
In this project on cattle disease prediction, we employed four distinct machine learning
algorithms: Decision Tree, Naive Bayes, K-Nearest Neighbours (KNN), and Random
Forest. Each of these algorithms has unique characteristics and strengths that make them
suitable for different aspects of predictive modelling. Below is an introduction to each of
these algorithms, outlining their fundamental principles and their applicability to disease
prediction.
➢ Decision Tree
A Decision Tree is a supervised learning algorithm used for classification and
regression tasks. It works by recursively splitting the data into subsets based on the
value of a selected feature. The decision tree model resembles a tree structure, where
each internal node represents a "decision" on an attribute, each branch represents the
outcome of the decision, and each leaf node represents a class label or continuous value.
➢ Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes' Theorem, assuming strong
(naive) independence between features. Despite the simplification, it performs well in
many real-world situations, especially with high-dimensional data.
➢ K-Nearest Neighbour (KNN)
K-Nearest Neighbour is a simple, instance-based learning algorithm used for
classification and regression. It classifies a data point based on the majority class
among its 'k' nearest neighbors in the feature space.
➢ Random Forest
Random Forest is an ensemble learning method that constructs a multitude of
decision trees during training and outputs the mode of the classes for classification
or the mean prediction for regression. It combines the strengths of multiple decision
trees to improve accuracy and control overfitting.
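As an illustration of how the four algorithms described above can be trained side by side, the following is a minimal sketch using scikit-learn. It is not the project's actual code: the data are synthetic, and the number of animals, symptoms, and diseases, as well as the rule that each disease switches on its own block of symptoms, are assumptions made only so that the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 300 animals, 12 binary symptom flags, 3 disease classes.
rng = np.random.default_rng(0)
n_samples, n_symptoms, n_diseases = 300, 12, 3
y = rng.integers(0, n_diseases, size=n_samples)

# Assumption for illustration: each disease makes its own block of four
# symptoms likely (p = 0.9); all other symptoms are rare (p = 0.1).
prob = np.full((n_samples, n_symptoms), 0.1)
for d in range(n_diseases):
    prob[y == d, 4 * d:4 * d + 4] = 0.9
X = (rng.random((n_samples, n_symptoms)) < prob).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Because the synthetic classes are well separated, all four models score highly here; the figures say nothing about performance on real cattle records.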
1.3. PROJECT OBJECTIVES AND SIGNIFICANCE

The primary objective of this project is to develop a robust, accurate, and user-friendly
system for predicting cattle diseases using machine learning techniques. By leveraging
historical health records and symptom data, we aim to build models that can assist farmers
and veterinarians in making informed decisions, thereby improving the health management
of cattle herds.
Integrating machine learning into cattle health management has several significant benefits:
➢ Improved Diagnostic Accuracy: ML models can analyze vast amounts of data to
identify subtle patterns that may be missed by human observation.
➢ Time and Cost Efficiency: Automated diagnosis reduces the need for extensive
veterinary consultations, saving time and resources.
➢ Scalability: ML models can be easily scaled to accommodate large herds, providing
consistent diagnostic support across different farms.
➢ Early Detection and Prevention: Predictive models can identify potential health
issues early, allowing for prompt intervention and reducing the spread of diseases.
1.4. ADVANTAGES OF MACHINE LEARNING BASED CATTLE
DISEASE PREDICTION
Implementing machine learning algorithms for cattle disease prediction based on various
symptoms offers several key advantages. These algorithms, including Decision Tree,
Naive Bayes, K-Nearest Neighbour (KNN), and Random Forest, enhance diagnostic
accuracy by analysing complex patterns in symptom data. This improves the precision and
reliability of disease diagnoses, crucial for timely intervention and treatment. Furthermore,
machine learning models streamline the diagnostic process, reducing the time and cost
associated with extensive veterinary consultations. Their scalability allows for consistent
diagnostic support across different farms and herds, ensuring widespread accessibility.
Additionally, these algorithms enable early detection and prevention of diseases,
facilitating prompt action to mitigate health issues before they escalate. Overall, the
integration of machine learning algorithms in cattle health management promises to
enhance efficiency, accuracy, and accessibility, ultimately improving the well-being and
productivity of livestock.
Figure-1: Flow Diagram of the Project
CHAPTER 2 LITERATURE REVIEW
2.1 Key Studies and Methodologies

1. Prediction Models for Cattle Disease Using ML Techniques
Yan, Z., et al. (2021): This study focused on predicting bovine respiratory disease
(BRD) using several ML models, including Support Vector Machines (SVM), Random
Forest (RF), and Gradient Boosting Machines (GBM). The data included clinical signs,
environmental factors, and previous health records. GBM showed the highest accuracy,
emphasizing the importance of ensemble learning techniques in handling complex
veterinary data.
2. Time-Series Analysis for Early Disease Detection
Smith, J., & Wang, H. (2020): This research utilized recurrent neural networks (RNN)
and long short-term memory (LSTM) networks to predict the onset of mastitis in dairy
cattle. By analyzing time-series data from milking equipment and environmental
sensors, the models could identify patterns that precede clinical symptoms, enabling
earlier intervention.
3. Integration of Genomic Data for Disease Prediction
Kim, H., et al. (2019): The study explored the integration of genomic data with
traditional health records to predict susceptibility to Johne's disease in cattle. Using a
combination of convolutional neural networks (CNN) and RF, the researchers
demonstrated that incorporating genomic data significantly improves prediction
accuracy.
4. Sensor Data and IoT for Health Monitoring
Gonzalez, L.A., et al. (2018): This study leveraged Internet of Things (IoT) devices to
monitor cattle health in real-time. By employing ML algorithms on data collected from
wearable sensors (e.g., temperature, activity levels), the research highlighted the
potential of continuous monitoring and early disease detection. Decision trees and RF
were particularly effective in handling the heterogeneous data from IoT devices.
5. Ensemble Methods and Feature Selection
Patel, R., & Singh, K. (2019): The authors focused on feature selection techniques to
enhance the performance of ML models in predicting cattle diseases. They compared
various methods such as Recursive Feature Elimination (RFE) and Principal
Component Analysis (PCA) in combination with ensemble methods like RF and
AdaBoost. Their findings indicated that proper feature selection could substantially
reduce overfitting and improve model interpretability.
6. Mastitis in dairy cows
Jones and Clark (2019): Applied Random Forest to predict mastitis in dairy cows. The
study incorporated genomic data and environmental factors, achieving an accuracy of
92%. The model’s feature importance analysis highlighted significant predictors, aiding
in early intervention strategies.
7. Efficiency of k-Nearest Neighbors in Large-scale Data for Cattle Disease Prediction
Huang et al. (2021): This paper evaluates the performance of k-Nearest Neighbors (k-
NN) in predicting cattle diseases using large-scale datasets. The study focuses on
optimizing the k value and distance metrics for better performance.
k-NN showed decreased efficiency and accuracy on larger datasets, highlighting the
need for optimization. The study suggested combining k-NN with feature selection
techniques to improve performance.
8. Predicting Lameness in Cattle Using Naive Bayes
Rahman et al. (2018): This research applies Naive Bayes (NB) to predict lameness in
cattle based on behavioral and physiological data. NB was chosen for its simplicity and
ability to handle probabilistic relationships. NB performed well, especially when
combined with feature selection techniques. The model was computationally efficient
and easy to implement.
2.2 Findings and Discussions

• Model Performance: Ensemble models like RF and GBM generally outperform single
algorithms due to their ability to reduce variance and bias. Deep learning models such
as CNNs and LSTMs are particularly effective for complex data types like genomic and
time-series data.
• Data Integration: Combining different data sources, such as genomic, sensor, and
historical health records, enhances prediction accuracy. However, it also increases the
complexity of data preprocessing and model training.
• Early Detection: ML models, especially those analyzing real-time sensor data, can
predict diseases before clinical symptoms appear. This early detection is crucial for
preventing disease spread and improving treatment outcomes.
• Challenges: The main challenges include the need for large, high-quality datasets, the
complexity of integrating heterogeneous data, and the computational resources required
for training advanced ML models.
CHAPTER 3 PROBLEM SPECIFICATION
In the agricultural sector, cattle health is critical for productivity and sustainability. Early
and accurate disease prediction can significantly reduce economic losses and improve animal
welfare. Traditional methods of diagnosing cattle diseases are often time-consuming and
require expertise, which may not be readily available in rural or resource-limited settings.
Leveraging machine learning algorithms offers a promising solution to predict cattle diseases
based on symptoms efficiently and accurately.
3.1. Objectives:
• To collect and preprocess a comprehensive dataset of cattle disease symptoms and
diagnoses.
• To implement and compare four machine learning algorithms: K-Nearest Neighbors
(KNN), Random Forest, Decision Tree, and Naive Bayes, for predicting cattle diseases.
• To evaluate the performance of each algorithm using metrics such as accuracy,
precision, recall, F1 score, and ROC-AUC.
• To develop a user-friendly interface for veterinarians and farmers to input symptoms
and receive disease predictions.
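The evaluation metrics named in the objectives can be computed with scikit-learn as in the sketch below. This is an illustration only: the dataset is generated with `make_classification` rather than taken from cattle records, the Random Forest settings are arbitrary, and macro averaging with a one-vs-rest ROC-AUC is one reasonable choice for a multi-class problem, not necessarily the one used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a symptom dataset with three disease classes.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)         # hard class labels
y_proba = clf.predict_proba(X_test)  # class probabilities, needed for ROC-AUC

# Macro averaging weights every disease class equally.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_proba, multi_class="ovr", average="macro"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```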
3.2. Scope:
This project will focus on predicting common cattle diseases based on a set of symptoms.
It will include:
• Data collection from veterinary records and agricultural databases.

• Data preprocessing, including handling missing values, normalization, and feature
encoding.
• Implementation of KNN, Random Forest, Decision Tree, and Naive Bayes algorithms.
• Evaluation and comparison of model performance.
• Development of a simple web application for disease prediction.
3.3. Problem Statement:
Cattle diseases can have severe impacts on livestock productivity and farmer
livelihoods. Early and accurate diagnosis is crucial but often limited by the availability of
veterinary expertise, especially in rural areas. Machine learning algorithms can be employed
to predict cattle diseases based on observable symptoms, providing a valuable tool for early
intervention and treatment. However, the effectiveness of different algorithms in this context
needs to be systematically evaluated to identify the most suitable approach.
3.4. Expected Outcomes:
• A comparative analysis of KNN, Random Forest, Decision Tree, and Naive Bayes
algorithms for cattle disease prediction.
• Identification of the most effective machine learning model for this application.
• A functional web application that allows easy input of symptoms and provides accurate
disease predictions.
3.5. Significance:
This project will demonstrate the applicability of machine learning in the agricultural
domain, specifically for livestock health management. By providing a tool for early disease
detection, it can enhance cattle health, improve productivity, and support farmers in making
informed decisions. The comparative analysis of different algorithms will contribute to the
body of knowledge in veterinary informatics and machine learning.
Figure-2 : Symptoms Observations in Cattle
CHAPTER 4 REQUIREMENT ANALYSIS
4.1 Hardware Configuration
● Processor : Intel i3
● Hard disk : 512 GB
● Memory : 2 GB
● Graphic Memory: N/A
4.2 Software Configuration

● Model : There are four different Machine Learning algorithms used:
o Decision tree
o Naive Bayes
o K Nearest Neighbour
o Random Forest
● Frameworks and Libraries

○ numpy
○ pandas
○ sklearn
○ os
○ mpl_toolkits
○ matplotlib
○ tkinter
○ DecisionTreeClassifier
○ RandomForestClassifier
○ KNeighborsClassifier
○ GaussianNB
● IDE: Google Colab, Python NoteBook and VSCode.
● Programming Language: Python
● Platform: Windows/Linux/MacOS
4.3 Functional Requirements

The functional requirements for our project report on cattle disease prediction using
machine learning based on various symptoms include several essential components.
Firstly, our group worked on the data collection and preprocessing module to efficiently
gather diverse symptom data from multiple sources and clean it, addressing missing values
and noise to ensure high-quality inputs. We then focused on the model training and
optimization component, supporting the implementation, training, and fine-tuning of the
machine learning algorithms (Decision Tree, Naive Bayes, Random Forest, K-Nearest
Neighbor) to achieve high predictive accuracy and mitigate overfitting.
To ensure effectiveness, we developed a model evaluation system that employs metrics
such as accuracy, precision, and recall to assess model performance and tackles imbalanced
data through resampling techniques. Our team also created a real-time prediction system
capable of processing incoming symptom data quickly, providing timely and accurate
disease predictions.
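One common resampling technique for imbalanced data is random oversampling of the rarer classes. The sketch below shows the idea with `sklearn.utils.resample`; the helper name `oversample_minority`, the column names, and the placeholder disease labels are hypothetical, and the report does not state which resampling method was actually used.

```python
import pandas as pd
from sklearn.utils import resample


def oversample_minority(df, label_col, random_state=0):
    """Upsample every class (with replacement) to the size of the largest class."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        if len(group) < target:
            group = resample(
                group, replace=True, n_samples=target, random_state=random_state
            )
        parts.append(group)
    # Shuffle so that the classes are not stored in contiguous blocks.
    combined = pd.concat(parts)
    return combined.sample(frac=1.0, random_state=random_state).reset_index(drop=True)


# Hypothetical, deliberately imbalanced records: 6 / 3 / 1 cases per disease.
records = pd.DataFrame({
    "fever":    [1, 1, 1, 1, 1, 1, 0, 0, 1, 0],
    "lameness": [0, 0, 0, 0, 0, 0, 1, 1, 0, 1],
    "disease":  ["disease_A"] * 6 + ["disease_B"] * 3 + ["disease_C"],
})

balanced = oversample_minority(records, "disease")
print(balanced["disease"].value_counts())
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the test set.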
Additionally, we designed a user-friendly interface, enabling veterinarians and farmers to
input symptoms easily and receive predictions, ensuring accessibility for non-technical
users.
Finally, we formulated a scalability and deployment plan, leveraging cloud computing
to handle large datasets and real-time processing demands, ensuring the system's reliability
and efficiency in practical applications.
4.4 Non-Functional Requirements

Non-functional requirements are requirements that are not directly concerned with the
specific functions delivered by the system. They may relate to emergent system properties
such as reliability, response time etc. They may specify system performance, security,
availability, and other emergent properties. This means that they are often more critical
than individual functional requirements. System users can usually find ways to work
around a system function that doesn’t really meet their needs. However, failing to meet a
non-functional requirement can mean that the whole system is unusable. Non-functional
requirements needed in cattle disease prediction using machine learning are as follows:
➢ Efficiency
The proposed machine learning based system achieves an accuracy of about 90%
to 97%, depending on the classification algorithm used, and proved to be much
more efficient than other kinds of models, such as regression.
➢ Reliability
The machine learning based model is reliable and reaches a maximum accuracy
of 97%. Four different machine learning algorithms were tested on the
classification problem, and the Random Forest algorithm proved to be the most
reliable. We therefore also used that model in our testing and
implementation.
➢ Maintainability
It is fair to say that a system that undergoes changes with time is better suited and
preferred over others. The system that we proposed has the capability to undergo
changes and bug fixes in the future.
➢ Portability
It is the usability of the same software in different environments. The
prerequisite for portability is a generalized abstraction between the application
logic and the system interfaces. The proposed system fulfils the portability
requirement: the model can be used on different platforms, including Linux,
Windows, and macOS.
➢ Usability
The system is designed for a user-friendly environment so that the administrator
and user can perform various tasks easily and in an effective way.
➢ Security
We understand that security is one of the most important aspects of system design.
The system as built has no known exploitable weaknesses. In future, further
security patches and bug fixes can help strengthen
the security and integrity of the system.
CHAPTER 5 SYSTEM DESIGN
The design of a cattle disease prediction system using machine learning involves several
integrated components, each contributing to the accurate, efficient, and scalable diagnosis of
cattle diseases. Below is a detailed description of the system design, accompanied by schematic
images to illustrate the architecture.
Figure-3 : Working Steps
5.1 ACQUIRING DATASET

➢ Sources of Data:
• Veterinary Reports: Historical health records and diagnostic information.
• Farm Management Systems: Daily logs of cattle health and symptoms.
• Genetic Data: Information on cattle breeds and genetic predispositions.
➢ Data Types:
• Symptom Data: Observations of symptoms such as fever, coughing, and lethargy.
• Environmental Data: Temperature, humidity, and other conditions affecting cattle
health.
• Physiological Data: Vital signs like heart rate and body temperature.
• Historical Data: Past disease occurrences and treatment outcomes.
5.2 DATA PREPROCESSING
Data pre-processing involves the following steps:
• Data Cleaning: Removing noise, handling missing values, and correcting inconsistencies.
• Feature Extraction: Identifying relevant features (symptoms, environmental factors)
critical for disease prediction.
• Normalization: Scaling features to a common range to ensure uniformity.
• Categorical Encoding: Converting categorical variables (e.g., breed) into numerical
values using techniques like one-hot encoding or indexing.
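The cleaning, normalization, and encoding steps listed above can be sketched with pandas and scikit-learn as follows. The column names, breed names, and values are invented for illustration, and median imputation with min-max scaling is only one possible choice; the report does not specify which variants were applied.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw records with missing values in numeric and categorical columns.
raw = pd.DataFrame({
    "body_temp_c": [38.5, 40.1, np.nan, 39.2, 41.0],
    "heart_rate":  [60, 88, 72, np.nan, 95],
    "breed":       ["Sahiwal", "Gir", "Sahiwal", None, "Jersey"],
    "fever":       [0, 1, 0, 1, 1],
})
numeric_cols = ["body_temp_c", "heart_rate"]

clean = raw.copy()

# Data cleaning: fill numeric gaps with the column median and
# categorical gaps with an explicit "unknown" category.
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())
clean["breed"] = clean["breed"].fillna("unknown")

# Normalization: rescale numeric features to the range [0, 1].
clean[numeric_cols] = MinMaxScaler().fit_transform(clean[numeric_cols])

# Categorical encoding: one-hot encode the breed column.
features = pd.get_dummies(clean, columns=["breed"], dtype=int)
print(features)
```

In a real pipeline the imputation values and the scaler would be fitted on the training split only and then reused on the test split.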
5.3 TRAINING THE MACHINE LEARNING MODEL

After acquiring the dataset, we use machine learning methods to develop
a model that can predict the disease from the symptoms. We feed the input data into
classification algorithms such as Random Forest, KNN, Naïve Bayes and Decision Tree
to generate such a model. Training involves reading and manipulating the data so that the
accuracy of the model can be improved.
• Data Splitting: Dividing the dataset into training and testing sets.
• Model Development: Implementing and training each algorithm using the training set.
• Hyperparameter Tuning: Optimizing model parameters to enhance performance.
• Cross-Validation: Validating model performance using techniques like k-fold cross-
validation.
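The four training steps above can be sketched with scikit-learn as shown below. The data is a synthetic stand-in generated with make_classification (an assumption, not the project dataset), and the parameter grid is only an example of what might be tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the symptom dataset (assumption, not the project data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

# Data splitting: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Model development and hyperparameter tuning: grid search with 5-fold CV.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [5, None]},
    cv=5,
)
search.fit(X_train, y_train)

# Cross-validation of the tuned model on the training set only.
cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=5)

# Final check on the held-out testing set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, cv_scores.mean(), test_accuracy)
```

Keeping the testing set out of both tuning and cross-validation gives a less optimistic estimate of how the model behaves on unseen animals.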
5.4 MODEL DEVELOPMENT

The model is deployed as a system comprising three components, namely:
• User Interface (UI): A web or mobile application allowing users to input symptoms
and receive predictions.
• Backend System: Servers hosting the machine learning models and handling
prediction requests.
• API Endpoints: Interfaces through which the UI communicates with the backend
system.
Then the model is deployed for processing using various steps:

• Model Integration: Embedding trained models into the backend system.
• Real-time Data Processing: Incorporating real-time data from symptoms given
by user.
• User Accessibility: Ensuring the interface is user-friendly and accessible to
farmers with varying technical expertise.
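How the backend might turn the symptoms entered by a user into a prediction can be sketched as follows. This is an assumed design for illustration: the helper names symptoms_to_vector and predict_disease are hypothetical, the symptom subset is taken from the appendix list, and the three training rows are made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative subset of the symptom list; the real list is much longer.
SYMPTOMS = ["fever", "gaseous_stomach", "lameness", "udder_swelling"]

def symptoms_to_vector(selected, all_symptoms=SYMPTOMS):
    """Return a 0/1 vector marking which known symptoms were selected."""
    chosen = set(selected)
    return np.array([1 if name in chosen else 0 for name in all_symptoms])

# Toy training rows (made up): one symptom pattern per disease.
X_train = np.array([
    [0, 1, 0, 0],   # gaseous_stomach
    [0, 0, 1, 0],   # lameness
    [1, 0, 0, 1],   # fever + udder_swelling
])
y_train = ["bloat", "foot_rot", "mastitis"]
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

def predict_disease(selected):
    """Encode the selected symptoms and return the predicted disease label."""
    vector = symptoms_to_vector(selected).reshape(1, -1)
    return model.predict(vector)[0]

print(predict_disease(["fever", "udder_swelling"]))
```

The same function could be called from a button handler in the user interface or from an API endpoint, since it only needs the list of selected symptom names.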
5.5 DEVELOPING A USER INTERFACE
The next step is to develop a User Interface (UI) that lets the user interact with the model and
fetch the results. A UI can be developed using libraries such as PyQt or Tkinter. The group
developed the UI using Tkinter, Python's standard interface to the cross-platform Tk GUI
toolkit, which is included with the standard Python distribution.
CHAPTER 6 IMPLEMENTATION
The implementation phase of the project is the development of the designs produced
during the design phase. It is the stage in which the theoretical design is turned into
a working system and the users gain confidence that the new system will work
efficiently and effectively. It involves careful planning, investigation of the current system and
its constraints on implementation, design of methods to achieve the changeover, and evaluation
of changeover methods. The implementation process begins with preparing a plan for the
implementation of the system. According to this plan, the activities are carried out,
the equipment and resources are discussed, and any additional equipment needed
to implement the new system is acquired. In a network backup system no additional resources
are needed. Implementation is the final and the most important phase. The system can be
implemented only after thorough testing has been done and it has been found to work according to
the specification. This method also offers the greatest security, since the old system can take
over if errors are found or if the new system is unable to handle certain types of
transactions.
• System Testing
It is the stage of implementation, which is aimed at ensuring that the system works
accurately and efficiently before live operation commences. Testing is the process of
executing the program with the intent of finding errors and missing operations and also
a complete verification to determine whether the objectives are met and the user
requirements are satisfied. The ultimate aim is quality assurance. Tests are carried out
and the results are compared with the expected document. In the case of erroneous
results, debugging is done. Using detailed testing strategies a test plan is carried out on
each module. The various tests performed are unit testing, integration testing and user
acceptance testing.
• Unit Testing
The software units in a system are modules and routines that are assembled and
integrated to perform a specific function. Unit testing focuses first on modules,
independently of one another, to locate errors. This makes it possible to detect errors in coding
and logic that are contained within each module. This testing includes entering data and
ascertaining whether each value matches the type and size supported by the system. The
various controls are tested to ensure that each performs its action as required.
• Integration Testing
Data can be lost across an interface, one module can have an adverse effect on another, and
sub-functions, when combined, may not produce the desired major function. Integration
testing is systematic testing to discover errors associated with the interfaces. The
objective is to take unit-tested modules and build a program structure. All the modules
are combined and tested as a whole. Here the Server module and Client module options
are integrated and tested. This testing provides the assurance that the application is a
well-integrated functional unit with smooth transition of data.
• User Acceptance Testing

User acceptance of a system is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with
the system users during development and making changes whenever required. Apart
from planning, the major tasks in preparing for implementation are the education and training
of users. The most critical stage in achieving a successful new system is giving the users
confidence that the new system will work and be effective.
• User Training
After the system is implemented successfully, training of the user is one of the most
important subtasks of the developer. For this purpose, user manuals are prepared and
handed over to the users, who are then trained to operate the developed system. Both
hardware and software security measures are put in place so that the developed system
runs successfully in future. In order to put the new application system
into use, the following activities were taken care of:
a. Preparation of user and system documentation.
b. Conducting user training with demo and hands on.
c. Test run for some period to ensure smooth switching over to the system.
The users are trained to use the newly developed functions. User manuals describing
the procedures for using the functions listed on the menu are circulated to all the users.
It is confirmed that the system is implemented according to the users' needs and expectations.
• Security and Maintenance

Maintenance holds the software industry captive, tying up system resources. It
means restoring something to its original condition. Maintenance follows conversion to
the extent that changes are necessary to maintain satisfactory operations relative to
changes in the user’s environment. Maintenance often includes minor enhancements or
corrections to problems that surface during the system’s operation. Maintenance also
covers fixing reported problems, changing the interface with other software or
hardware, and enhancing the software. Any system developed should be secured and
protected against possible hazards. Security measures are provided to prevent
unauthorized access to the database at various levels. Password protection and simple
procedures to prevent unauthorized access are provided to the user.
6.1 TOOLS AND TECHNOLOGY
Python is a dynamic, high-level, free, open-source, interpreted programming language. It
supports object-oriented programming as well as procedural programming.
In Python, we do not need to declare the type of a variable because it is a dynamically
typed language. For example, after x = 10 the name x refers to an int, but it can later be
assigned a value of any other type, such as a str.
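A small sketch of dynamic typing, using only built-in Python (the variable names are arbitrary):

```python
x = 10                           # x currently refers to an int
type_before = type(x).__name__

x = "ten"                        # the same name can later refer to a str
type_after = type(x).__name__

print(type_before, type_after)
```

The type belongs to the value, not to the name, which is why no declaration is needed.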
a. Easy to code: Python is a high-level programming language that is easy to
learn compared with languages such as C, C#, JavaScript and Java.
It is easy to write code in Python, and the basics can be learned in a
few hours or days. It is also a developer-friendly language.
b. Free and Open Source: Python is freely available from the official website
(https://www.python.org/downloads/). Since it is open source, the source
code is also available to the public, so you can download it, use it and share
it.
c. Object-Oriented Language: One of the key features of Python is object-oriented
programming. Python supports object-oriented concepts such as classes,
objects and encapsulation.
d. GUI Programming Support: Graphical user interfaces can be made using a module
such as PyQt5, PyQt4, wxPython, or Tk in Python. PyQt5 is a popular option
for creating graphical apps with Python.
e. High-Level Language: Python is a high-level language. When we write programs in
Python, we do not need to remember the system architecture, nor do we need to
manage the memory.
f. Extensible feature: Python is an extensible language. Parts of a program can be
written in C or C++, compiled, and then used from Python code.
g. Portable Language: Python is also a portable language. For
example, if we have Python code for Windows and we want to run it on
other platforms such as Linux, Unix, and Mac, then we generally do not need to change it;
the same code runs on any platform that has a Python interpreter.
Google Colaboratory, or Colab for short, is a product from Google Research. Colab allows
anybody to write and execute arbitrary Python code through the browser, and is
especially well suited to machine learning, data analysis and education. More
technically, Colab is a hosted Jupyter notebook service that requires no setup to use,
while providing access free of charge to computing resources including GPUs.
Resources in Colab are prioritized for interactive use cases. Google prohibits actions
associated with bulk computation, actions that negatively impact others, and
actions associated with bypassing its policies. The following are disallowed from
Colab runtimes:
a. File hosting, media serving, or other web service offerings not related to interactive
compute with Colab
b. Downloading torrents or engaging in peer-to-peer file-sharing
c. Using a remote desktop or SSH
d. Connecting to remote proxies
e. Mining cryptocurrency
f. Running denial-of-service attacks
g. Password cracking
h. Using multiple accounts to work around access or resource usage restrictions
DecisionTreeClassifier: In Python's machine learning ecosystem, DecisionTreeClassifier from
the scikit-learn library is a tool for classification tasks. It builds a tree-like
structure that resembles a flowchart, where:
• Internal nodes represent features (attributes) from the data. Think of
these as questions asked about the data points.
• Branches signify decision rules based on feature values. These are the
answers that lead down different paths in the tree.
• Leaf nodes embody the final outcome (class labels) to be
predicted. These are the destinations reached after navigating the decision
rules.
Key Characteristics:
• Interpretability: Decision Trees are easy to understand and interpret, making
them suitable for applications where model transparency is crucial.
• Non-linearity: They can capture non-linear relationships between features and
the target variable.
• Overfitting: Prone to overfitting, especially with noisy data, but this can be
mitigated using techniques like pruning.
Applicability: In cattle disease prediction, Decision Trees can help identify key symptoms
and their combinations that lead to specific diseases, providing a clear diagnostic path.
The decision tree learns by recursively partitioning the data based on
features, choosing at each step the feature that most effectively divides the data into distinct
classes.
Usage : from sklearn.tree import DecisionTreeClassifier
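A minimal sketch of fitting and inspecting a decision tree is given below. The three symptom columns and the six training rows are made up for illustration and are not the project dataset; export_text prints the learned rules so that the diagnostic path can be read directly.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up toy data: each row is (fever, lameness, udder_swelling) as 0/1 flags.
feature_names = ["fever", "lameness", "udder_swelling"]
X = [
    [1, 0, 1], [1, 0, 1],   # fever + udder swelling
    [0, 1, 0], [0, 1, 0],   # lameness only
    [1, 0, 0], [0, 0, 0],   # neither lameness nor udder swelling
]
y = ["mastitis", "mastitis", "foot_rot", "foot_rot", "bloat", "bloat"]

# Gini impurity is used to choose the best split at every node.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Human-readable view of the learned decision rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)

prediction = tree.predict([[0, 1, 0]])[0]
print(prediction)
```

Limiting max_depth is one simple way to restrain the overfitting mentioned above.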
Figure-4 : Decision Tree
RandomForestClassifier: In machine learning, a Random Forest classifier is a powerful
ensemble method that leverages the combined strength of multiple decision trees. It's a
supervised learning technique used for classification tasks, where the goal is to predict
the class label of a new data point based on its features.
It works in the following manner:
• Forest Creation: The algorithm builds a collection of decision trees, each trained on a
random sample of the training data drawn with replacement (a technique called bootstrapping).
This injects diversity into the forest, which helps reduce overfitting.
• Random Feature Selection: At each node in a decision tree, instead of considering all
features for splitting, the algorithm randomly selects a subset of features (often the
square root of the total number). This further diversifies the trees and reduces their
dependence on any single feature.
• Tree Building: Each decision tree is built independently, recursively splitting the data
based on the chosen feature that best separates the classes. The split continues until a
stopping criterion (e.g., maximum depth, minimum samples per leaf) is met.
• Prediction: When presented with a new data point, the forest makes a prediction by
aggregating the predictions from all individual trees. This is typically done through
majority voting (for classification) or averaging (for regression).
Key Characteristics:
• Robustness: Reduces overfitting by averaging multiple trees.
• Accuracy: Generally more accurate than individual decision trees due to the
ensemble approach.
• Feature Importance: Provides estimates of feature importance, helping to
identify the most relevant symptoms for disease prediction.
• Applicability: In cattle disease prediction, Random Forest can handle large
and complex datasets with many features, providing reliable and robust
predictions. Its ability to evaluate feature importance is particularly useful
for identifying the most critical symptoms influencing disease outcomes.
Usage: from sklearn.ensemble import RandomForestClassifier
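The bootstrapping, random feature selection and feature importance described above map onto RandomForestClassifier parameters as sketched below. The data is synthetic (make_classification), so the importance values say nothing about real symptoms; the settings shown are illustrative, not the ones used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 10 features, 4 of them informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_features="sqrt",   # random subset of features tried at each split
    bootstrap=True,        # each tree sees a sample drawn with replacement
    oob_score=True,        # estimate accuracy on the rows each tree did not see
    random_state=0,
)
forest.fit(X, y)

# Importance of every feature; the values sum to 1.
importances = forest.feature_importances_
top_features = importances.argsort()[::-1][:3]
print(top_features, forest.oob_score_)
```

The out-of-bag score gives a quick accuracy estimate without setting aside a separate validation set.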
Figure-5 : Random Forest Algorithm
KNeighborsClassifier: In Python, KNeighborsClassifier is a class from the scikit-learn
library used for implementing the K-Nearest Neighbours (KNN) algorithm for
classification tasks.
K-Nearest Neighbours (KNN) Algorithm:
KNN is a non-parametric, supervised machine learning approach.
It works by storing all the training data and classifying a new data point based on the
labels of its k closest neighbours, typically by majority vote.
The "closeness" is determined by a distance metric (e.g., Euclidean distance).
Usage: from sklearn.neighbors import KNeighborsClassifier
• Non-parametric: Makes no assumptions about the underlying data distribution.
• Simplicity: Easy to understand and implement.
• Computational Cost: Can be computationally expensive for large datasets as
it requires calculating the distance between data points.
• Applicability: KNN is useful for disease prediction when similar symptom
patterns are expected to result in the same diagnosis. Its simplicity and
effectiveness in capturing local structure in the data make it a valuable tool for
initial exploratory analysis.
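The neighbour-based vote can be sketched on a tiny made-up example. The four binary symptom vectors and the labels disease_a and disease_b are placeholders, not project data.

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up binary symptom vectors with placeholder labels.
X_train = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
y_train = ["disease_a", "disease_a", "disease_b", "disease_b"]

# k = 3 neighbours, compared with the Euclidean distance.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

query = [[1, 1, 0, 0]]
prediction = knn.predict(query)[0]

# Distances to, and row indices of, the three nearest training points.
distances, indices = knn.kneighbors(query)
print(prediction, distances, indices)
```

Two of the three nearest rows carry the label disease_a, so the majority vote returns disease_a for this query.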
Figure-6: KNN Classification Algorithm
GaussianNB: Gaussian Naive Bayes (GNB) is a classification algorithm from the Naive Bayes family,
implemented in the scikit-learn library as GaussianNB. It assumes that the features follow a Gaussian
(normal) distribution within each class. Based on this assumption, it predicts the class
probability for a new data point using Bayes' theorem.
• Simplicity: Easy to implement and computationally efficient.
• Assumption of Independence: Assumes that, given the class, the value of a particular feature is
independent of the value of any other feature.
• Scalability: Works well with large datasets and can handle a large number of features.
• Applicability: Naive Bayes is effective for disease prediction when the symptoms
(features) are conditionally independent given the disease. It is particularly useful when
rapid prediction is required with minimal computational resources.
Strengths:
• Fast training and prediction times.
• Effective for problems with continuous features.
Weaknesses:
• The assumption of Gaussian distribution might not hold true for all datasets, potentially
affecting accuracy.
• May not perform as well as other algorithms for complex datasets.
• Does not handle missing values by itself: scikit-learn's GaussianNB expects them to be
imputed or removed beforehand.
Usage : from sklearn.naive_bayes import GaussianNB
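A short sketch of GaussianNB on continuous features follows. The body temperature and heart rate readings and the labels healthy and febrile are invented for the example; they only serve to show the per-class Gaussian assumption and the predicted class probabilities.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up readings: (body temperature in °C, heart rate in beats per minute).
X = np.array([
    [38.5, 60], [38.7, 64], [38.6, 62],   # labelled healthy
    [40.2, 90], [40.5, 95], [40.0, 88],   # labelled febrile
])
y = np.array(["healthy", "healthy", "healthy", "febrile", "febrile", "febrile"])

# A mean and variance are estimated per feature and per class.
gnb = GaussianNB().fit(X, y)

new_reading = [[40.3, 92]]
probs = gnb.predict_proba(new_reading)   # one probability per class
prediction = gnb.predict(new_reading)[0]
print(gnb.classes_, probs, prediction)
```

The columns of predict_proba follow the order of gnb.classes_, which scikit-learn keeps sorted.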
Figure -7: Naïve Bayes Algorithm
NumPy: NumPy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object, and tools for working with these arrays. It is the
fundamental package for scientific computing with Python. It is open-source software. It
contains various features including these important ones:
a. A powerful N-dimensional array object
b. Sophisticated (broadcasting) functions
c. Tools for integrating C/C++ and Fortran code
d. Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined using NumPy,
which allows NumPy to seamlessly and speedily integrate with a wide variety of
databases.
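As a small illustration of the N-dimensional array and of broadcasting, the sketch below rescales each column of a made-up table of readings to the range [0, 1] without writing an explicit loop.

```python
import numpy as np

# Made-up readings: one row per animal, columns = (temperature, heart rate).
readings = np.array([
    [38.5, 60.0],
    [40.1, 88.0],
    [39.3, 74.0],
])

# Column-wise minimum and maximum, each of shape (2,).
col_min = readings.min(axis=0)
col_max = readings.max(axis=0)

# Broadcasting: the (2,) arrays are stretched across all 3 rows automatically.
scaled = (readings - col_min) / (col_max - col_min)
print(scaled)
```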
Pandas : Pandas is a free, open-source Python library specifically designed for data
manipulation and analysis. It provides high-performance, easy-to-use data structures and
data analysis tools, making it an essential component of the Python data science ecosystem.
Key Features of Pandas:
• DataFrames: The core data structure in Pandas is the DataFrame, a two-dimensional,
size-mutable, tabular data structure with labeled axes (rows and columns). It's
analogous to a spreadsheet but offers powerful functionalities for data handling.
• Series: One-dimensional arrays (like lists) can be created using Series, which are also
labeled and capable of holding any data type.
• Data Cleaning and Manipulation: Pandas excels at cleaning messy data. It offers
features for handling missing values, filtering data, selecting specific columns or rows,
sorting, merging DataFrames, and more.
• Data Analysis: Powerful statistical functions are available for calculating descriptive
statistics (mean, median, standard deviation, etc.), groupby operations for analyzing
subsets of data, and time series analysis for data with a time component.
Usage of Pandas :
• Loading Data: Pandas can efficiently load data from diverse file formats into
DataFrames or Series, simplifying data ingestion.
• Data Cleaning: Missing values can be imputed or removed, outliers can be detected and
handled, and data can be formatted for consistency.
• Data Exploration: Descriptive statistics can be computed to gain insights into the data's
distribution and central tendency.
• Data Transformation: New columns can be created based on existing ones, data can be
grouped and aggregated for analysis, and data can be reshaped or pivoted for different
views.
• Data Visualization: Basic data visualizations can be created using Pandas' plotting
capabilities, and it can be used to prepare data for more advanced visualizations with
other libraries.
• Feature Engineering: Pandas plays a crucial role in preparing data for machine learning
models by creating new features, scaling features, and handling categorical data.
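The loading, cleaning, exploration and transformation steps above can be sketched on a small in-memory CSV. The column names and rows are hypothetical; a real run would pass a file path to read_csv instead of the StringIO buffer.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real data file.
csv_text = """cattle_id,breed,fever,coughing,disease
1,Gir,1,0,mastitis
2,Sahiwal,,1,bloat
3,Gir,1,1,mastitis
4,Jersey,0,0,bloat
"""

# Loading data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Data cleaning: treat a missing symptom flag as "not observed".
df["fever"] = df["fever"].fillna(0).astype(int)

# Data exploration: descriptive statistics of the numeric columns.
print(df.describe())

# Data transformation: a new column and a grouped count.
df["symptom_count"] = df[["fever", "coughing"]].sum(axis=1)
counts = df.groupby("disease")["cattle_id"].count()
print(counts)
```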
The OS module in Python provides functions for interacting with the operating system. OS
comes under Python’s standard utility modules. This module provides a portable way of
using operating system-dependent functionality. The os and os.path modules include
many functions to interact with the file system.
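A brief sketch of portable path handling with os and os.path is shown below. The folder name data and the file name Training.csv are hypothetical and used only for illustration.

```python
import os

# Build a path in a portable way; "data" and "Training.csv" are hypothetical names.
data_dir = os.path.join(os.getcwd(), "data")
training_csv = os.path.join(data_dir, "Training.csv")

# Check whether the file exists before trying to load it.
file_found = os.path.exists(training_csv)
print(training_csv, file_found)
```

os.path.join inserts the correct separator for the current operating system, so the same code works on Windows, Linux and macOS.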
Matplotlib : Matplotlib is a fundamental and versatile library in Python for creating static,
animated, and interactive visualizations of data. It empowers you to transform numerical
information into clear and insightful graphical representations, making data exploration,
analysis, and communication more effective.
Features and Capabilities:
• Extensive Plot Types: Matplotlib offers a wide array of plot types to suit your data and
communication needs:
1. Line plots (ideal for trends and time series)
2. Scatter plots (effective for exploring relationships between variables)
3. Bar charts (useful for comparing categories)
4. Histograms (great for visualizing data distribution)
5. Pie charts (suitable for depicting parts of a whole)
6. Box plots (excellent for summarizing distributions with outliers)
7. Heatmaps (powerful for visualizing matrices)
8. 3D plots (allow for visualization in a three-dimensional space)
9. Many more!
• Customization: Matplotlib provides extensive control over every aspect of your plots.
You can customize:
1. Line styles, colors, and markers
2. Axis labels, titles, and legends
3. Tick marks, grid lines, and background colors
4. Text properties (fonts, sizes, colors)
5. Layouts and subplots
• Integration with NumPy and Pandas: Matplotlib seamlessly integrates with NumPy
arrays and Pandas DataFrames, the workhorses of scientific computing and data
analysis in Python. This smooth integration allows you to leverage these data structures
directly for plotting without the need for conversion.
• Interactive Visualization: Matplotlib figures are static images by default, but
interactive backends allow zooming and panning, and in environments like Jupyter Notebook
figures can be displayed inline alongside the code. This enables you
to explore your data dynamically.
• Animation Capabilities: Matplotlib offers tools for creating animations, allowing you
to visualize how data changes over time or across parameters.
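A minimal scatter plot with customised labels, title and legend is sketched below. The plotted values are randomly generated, not project data, and the Agg backend is selected only so that the example also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated stand-in values (assumption, not project data).
rng = np.random.default_rng(0)
temperature = rng.normal(39.5, 0.8, 50)
heart_rate = rng.normal(75, 10, 50)

fig, ax = plt.subplots()
ax.scatter(temperature, heart_rate, c="tab:red", marker="o", label="hypothetical cases")
ax.set_xlabel("Body temperature (°C)")
ax.set_ylabel("Heart rate (bpm)")
ax.set_title("Example scatter plot")
ax.legend()
ax.grid(True)

# Render the figure to PNG in memory; pass a filename instead to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```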
Tkinter Library: Tkinter is a Python library for creating graphical user
interfaces (GUIs) for applications.
Tkinter is the standard Python interface to the Tk (or Tcl/Tk) GUI toolkit. Tk is a cross-platform
library that is available on most operating systems, making Tkinter applications usable on
Windows, macOS, and Linux.
It is included with the standard Python installers, so no additional library normally needs to be installed to use it.
Usage :
Tkinter provides various widgets, which are the building blocks of your GUI. These widgets
include buttons, labels, text boxes, menus, and more.
You can arrange these widgets within a window and define their behavior to create interactive
applications. For instance, you can create a button that triggers a specific function when
clicked.
Code : import tkinter as tk
CHAPTER 7 RESULTS & DISCUSSION
7.1 Installation process:

Installing Python
1. Go to the python website and download the latest version.

https://www.python.org/downloads/
2. Run the Installer
Figure-8: Python Installer Window
3. Check python installed using command prompt.
Figure-9: Python version check
Installing pip
https://pip.pypa.io/en/stable/installation/
1. Download the script from the following link:

https://bootstrap.pypa.io/get-pip.py
2. Run the downloaded script from the download location
Figure -10: Installing pip

Installing python libraries and modules
1. Installing sklearn library

Command: > pip install scikit-learn
Figure -11: Installing sklearn library
2. Installing numpy library

Command: > pip install numpy
Figure -12: Installation of Numpy Library
3. Installing pandas library
Command: > pip install pandas
Figure -13: Installation of Pandas Library
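4. Installing tkinter library
Tkinter is bundled with the standard Python installer, so it is normally already available;
this can be checked with: > python -m tkinter

Command: > pip install tk
Figure -14: Installation of Tkinter Library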
Run the project
Command: > python -u <project script>.py
The script starts the graphical interface by calling root.mainloop().
7.2 PROJECT SCREENSHOTS
7.2.1. Graphical User Interface
Figure-15: User Interface of Project
Figure-16: User Interface of Project Prediction using Decision Tree
Figure-17: User Interface of Project Prediction using Decision Tree & Random
Forest
Figure-18: User Interface of Project Prediction using Decision Tree, Random Forest,
Naïve Bayes & K-Nearest Neighbour
7.2.2 Testing Dataset
Figure-19: Testing Dataset
7.2.3 Training Dataset
Figure-20: Training Dataset
7.2.4. Scatter Plot for disease Listeriosis
Figure-21: Scatter Plot for Listeriosis
7.2.5. Scatter Plot for disease Coccidiosis
Figure-22: Scatter Plot for Coccidiosis
7.2.6. Confusion Matrix for Decision Tree
Figure-23: Confusion Matrix
7.2.7. Disease for which Model is trained
Figure-24: List of Diseases
7.2.8. Symptoms of Cattle
Figure-25: List of Symptoms
7.2.9. Backend Database storing information about Cattle
Figure-26: Backend Database
7.3 OPPORTUNITIES
• Early Detection: Machine learning models can analyze data from various sources, like
temperature sensors, activity trackers, and visual analysis of lesions, to identify subtle
changes that might indicate an early stage of disease. This allows for faster intervention
and treatment, improving animal welfare and reducing potential losses.
• Improved Accuracy: Machine learning algorithms can be trained on vast datasets of
cattle health information and disease symptoms. This allows them to identify patterns
and relationships between symptoms and specific diseases with high accuracy,
potentially exceeding traditional methods relying solely on visual inspection.
• Reduced Costs: Early disease detection and treatment can minimize the need for
expensive medication and veterinary care in later stages. Additionally, the ability to
isolate infected animals quickly can prevent the spread of disease within the herd,
reducing overall economic losses.
• Scalability and Automation: Machine learning models can be deployed on farms of
all sizes. Data collection and analysis can be automated, reducing the workload on
farmers and veterinarians. This allows for more frequent monitoring of cattle health and
faster response times.
• Data-Driven Insights: Machine learning can analyze large datasets to identify trends
and patterns in disease outbreaks. This information can be invaluable for researchers
and veterinarians, helping them understand disease transmission patterns, develop
preventative measures, and improve overall herd health management strategies.
7.4 CHALLENGES
Despite the potential benefits, there are also challenges to consider:

• Data Quality and Availability: Training effective machine learning models requires
large amounts of high-quality data, including accurate symptom annotations and
disease diagnoses. Collecting and labeling such data can be time-consuming and
expensive, especially for less common diseases.
• Model Generalizability: Models trained on data from a specific breed or location
might not generalize well to other populations. Factors like breed variations,
environmental conditions, and local disease strains can affect symptom presentation.
• Veterinarian Expertise Remains Crucial: Machine learning models are a valuable
tool, but they should not replace the expertise of veterinarians. Vets play a vital role in
confirming diagnoses, interpreting test results, and determining the most appropriate
treatment course.
• Algorithmic Bias: If training data is biased, the model can inherit that bias and
potentially misclassify certain cases. Careful data selection and model evaluation are
crucial to mitigate bias.
• Explainability and Transparency: Machine learning models can be complex, making
it challenging to understand how they arrive at their predictions. This lack of
transparency can be a hurdle for gaining trust from farmers and veterinarians who need
to understand the reasoning behind the model's recommendations.
CHAPTER 8 SUMMARY
In our project, we explored the intersection of data science and veterinary medicine to
predict diseases in cattle. By analyzing symptom patterns, we aimed to provide early
detection and personalized interventions for livestock health.
Methodology and Data Collection: We assembled a diverse dataset comprising cattle health
records, symptoms, and corresponding disease labels. Feature engineering involved selecting
relevant symptoms and encoding them effectively.
Model Selection and Training: We experimented with various machine learning algorithms,
including Decision Tree, Random Forest, Naive Bayes and K-Nearest Neighbours. The dataset
was split into training and testing sets. Our best-performing model achieved an accuracy
of 97% on the data.
The final step involves deployment of the best-performing model into a user-friendly web-
based or mobile application. This application is designed for ease of symptom input and result
interpretation, enabling real-time predictions and immediate diagnostic assistance.
Certain symptoms consistently emerged as strong predictors for specific diseases. We
encountered challenges related to class imbalance and robustness across different cattle
populations.
Implications and Future Work:

• Early Detection: Our ML models can identify diseases at an early stage, enabling
timely treatment and preventing further spread.
• Precision Agriculture: Farmers can make informed decisions regarding herd
management, reducing economic losses.
• Ethical Considerations: Ensuring fairness and avoiding bias in predictions remains
critical.
• The Cattle Disease Prediction Project demonstrates the potential of machine learning
in veterinary medicine. By harnessing data-driven insights, we contribute to
sustainable agriculture, improve livelihoods, and enhance cattle health.
CHAPTER 9 LITERATURE CITED
• https://www.kaggle.com/datasets/ashtired11/cattle-diseases
• https://www.irjmets.com/uploadedfiles/paper//issue_12_december_2023/47014/final/fin_irjmets1702109350.pdf
• https://www.python.org/downloads/
• https://www.researchgate.net/publication/343387698_Application_of_Machine_Learning_in_Animal_Disease_Analysis_and_Prediction
• https://www.geeksforgeeks.org/machine-learning-algorithms/
VITAE
The author, DURGESH KUMAR was born on 29-JULY-2003. He passed his
High School from Shri Dev Inter College, Pathri, Haridwar (Uttarakhand) and
Intermediate from Shri Dev Inter College, Pathri, Haridwar (Uttarakhand). He
joined as an undergraduate student in College of Technology, Govind Ballabh
Pant University of Agriculture & Technology, Pantnagar for programme
Bachelor of Technology (Computer Engineering).
Address for Communication:
Durgesh Kumar
S/o Naresh Kumar
62, Jhabri, Pathri Forest Range,
Haridwar, Uttarakhand
Pin Code – 249404
Phone No- 7351181385
Email – dk.durgesh97kumar@gmail.com
VITAE
The author, PRASHANT RANA was born on 29-SEP-2001. He passed his High
School from Shri Guru Nanak Academy School, Nanakmatta, Udham Singh
Nagar (Uttarakhand) and Intermediate from G.S. Convent School, Sitarganj,
Udham Singh Nagar (Uttarakhand). He joined as an undergraduate student in
College of Technology, Govind Ballabh Pant University of Agriculture &
Technology, Pantnagar for programme Bachelor of Technology (Computer
Engineering).

Prashant Rana
S/o Malkeet Singh
Aamkhera (Balkhera) P.O- Nanakmatta,
Udham Singh Nagar, Uttarakhand
Pin Code – 262311
Phone No- 9411344273
Email - prashantrana944@gmail.com
VITAE
The author, DIVYANSHI NEOLIA was born on 6-Jan-2002. She passed her
High School from St. Theresa Sr. Sec. School, Kathgodam Haldwani Uttarakhand
and Intermediate from St. Theresa Sr.Sec. School Kathgodam Haldwani
Uttarakhand. She joined as an undergraduate student in College of Technology,
Govind Ballabh Pant University of Agriculture & Technology, Pantnagar for
programme Bachelor of Technology (Computer Engineering).

Divyanshi Neolia
D/o Jagdish Chandra Neolia
Awas Vikas Colony
Haldwani, Nainital
Pin Code – 263139
Phone Number – 6398090858
Email – divyanshineolia@gmail.com
VITAE
The author, PRANSHU PANDE was born on 20-AUG-2002. He passed his High
School from St. Josephs’ College, Nainital (Uttarakhand) and Intermediate from
St. Josephs’ College, Nainital (Uttarakhand). He joined as an undergraduate
student in College of Technology, Govind Ballabh Pant University of
Agriculture & Technology, Pantnagar for programme Bachelor of Technology
(Computer Engineering).
Pranshu Pande
S/O Paryank Pande,
Hydel Colony, Mayvilla Compound,
Tallital, Nainital, Uttarakhand
Pin Code – 263001
Phone No- 7060016543
Email – pranshu2008pande@gmail.com
APPENDIX A
#Importing libraries and packages
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from tkinter import *
import numpy as np
import pandas as pd
import os
#List of all the symptoms is given here in list l1.
#(Spellings are kept exactly as written, since they must match the dataset column names.)

l1=['anorexia','abdominal_pain','anaemia','abortions','acetone','aggression','arthrogyposis',
    'ankylosis','anxiety','bellowing','blood_loss','blood_poisoning','blisters','colic',
    'Condemnation_of_livers','coughing','depression','discomfort','dyspnea','dysentery',
    'diarrhoea','dehydration','drooling','dull','decreased_fertility','diffculty_breath',
    'emaciation','encephalitis','fever','facial_paralysis','frothing_of_mouth','frothing',
    'gaseous_stomach','highly_diarrhoea','high_pulse_rate','high_temp','high_proportion',
    'hyperaemia','hydrocephalus','isolation_from_herd','infertility','intermittent_fever',
    'jaundice','ketosis','loss_of_appetite','lameness','lack_of_coordination','lethargy',
    'lacrimation','milk_flakes',
    'milk_watery','milk_clots','mild_diarrhoea','moaning','mucosal_lesions','milk_fever',
    'nausea','nasel_discharges','oedema','pain','painful_tongue','pneumonia',
    'photo_sensitization','quivering_lips','reduction_milk_vieds','rapid_breathing',
    'rumenstasis','reduced_rumination','reduced_fertility','reduced_fat',
    'reduces_feed_intake','raised_breathing','stomach_pain','salivation','stillbirths',
    'shallow_breathing','swollen_pharyngeal','swelling','saliva','swollen_tongue',
    'tachycardia','torticollis','udder_swelling','udder_heat','udder_hardeness',
    'udder_redness','udder_pain','unwillingness_to_move','ulcers','vomiting','weight_loss',
    'weakness']
#List of Diseases (26 cattle diseases) is given in list disease.

disease=['mastitis','blackleg','bloat','coccidiosis','cryptosporidiosis','displaced_abomasum',
         'gut_worms','listeriosis','liver_fluke','necrotic_enteritis','peri_weaning_diarrhoea',
         'rift_valley_fever','rumen_acidosis','traumatic_reticulitis','calf_diphtheria',
         'foot_rot','foot_and_mouth','ragwort_poisoning','wooden_tongue',
         'infectious_bovine_rhinotracheitis',
         'acetonaemia','fatty_liver_syndrome','calf_pneumonia','schmallen_berg_virus',
         'trypanosomosis','fog_fever']
#Reading the Cattle training Dataset .csv file
df=pd.read_csv("training.csv")
DF= pd.read_csv('training.csv', index_col='prognosis')
#Replacing each disease name in df with its index in the list disease
df.replace({'prognosis':{'mastitis':0,'blackleg':1,'bloat':2,'coccidiosis':3,
    'cryptosporidiosis':4,'displaced_abomasum':5,'gut_worms':6,'listeriosis':7,
    'liver_fluke':8,'necrotic_enteritis':9,'peri_weaning_diarrhoea':10,
    'rift_valley_fever':11,'rumen_acidosis':12,'traumatic_reticulitis':13,
    'calf_diphtheria':14,'foot_rot':15,'foot_and_mouth':16,'ragwort_poisoning':17,
    'wooden_tongue':18,'infectious_bovine_rhinotracheitis':19,'acetonaemia':20,
    'fatty_liver_syndrome':21,'calf_pneumonia':22,'schmallen_berg_virus':23,
    'trypanosomosis':24,'fog_fever':25}},inplace=True)
DF.head()
df.head()
# Scatter and density plot

def plotScatterMatrix(df1, plotSize, textSize):
    df1 = df1.select_dtypes(include=[np.number])  # keep only numerical columns
    # Remove columns that would lead to df1 being singular
    df1 = df1.dropna(axis='columns')
    # keep columns where there are more than 1 unique values
    df1 = df1[[col for col in df1 if df1[col].nunique() > 1]]
    columnNames = list(df1)
    # reduce the number of columns for matrix inversion of kernel density plots
    if len(columnNames) > 10:
        columnNames = columnNames[:10]
    df1 = df1[columnNames]
    ax = pd.plotting.scatter_matrix(df1, alpha=0.75, figsize=[plotSize, plotSize],
                                    diagonal='kde')
    corrs = df1.corr().values
    # annotate each upper-triangle panel with its correlation coefficient
    for i, j in zip(*np.triu_indices_from(ax, k=1)):
        ax[i, j].annotate('Corr. coef = %.3f' % corrs[i, j], (0.8, 0.2), xycoords='axes fraction',
                          ha='center', va='center', size=textSize)
    plt.suptitle('Scatter and Density Plot')
    plt.show()
#plotPerColumnDistribution(df, 10, 5)
#plotScatterMatrix(df, 20, 10)
X= df[l1]
y = df[["prognosis"]]
#np.ravel returns a flattened copy; its result is not stored here, so y stays a one-column DataFrame
np.ravel(y)
print(X)
print(y)
#Reading the Cattle testing Dataset .csv file

tr=pd.read_csv("testing.csv")
#Using inbuilt function replace in pandas for replacing the values

tr.replace({'prognosis':{'mastitis':0,'blackleg':1,'bloat':2,'coccidiosis':3,
    'cryptosporidiosis':4,'displaced_abomasum':5,'gut_worms':6,'listeriosis':7,
    'liver_fluke':8,'necrotic_enteritis':9,'peri_weaning_diarrhoea':10,
    'rift_valley_fever':11,'rumen_acidosis':12,'traumatic_reticulitis':13,
    'calf_diphtheria':14,'foot_rot':15,'foot_and_mouth':16,'ragwort_poisoning':17,
    'wooden_tongue':18,'infectious_bovine_rhinotracheitis':19,'acetonaemia':20,
    'fatty_liver_syndrome':21,'calf_pneumonia':22,'schmallen_berg_virus':23,
    'trypanosomosis':24,'fog_fever':25}},inplace=True)
tr.head()
X_test= tr[l1]
y_test = tr[["prognosis"]]
np.ravel(y_test)
print(X_test)
print(y_test)
#list1 = DF['prognosis'].unique()
def scatterplt(disea):
    x = ((DF.loc[disea]).sum())#total sum of symptoms reported for given disease
    x.drop(x[x==0].index,inplace=True)#dropping symptoms with values 0
    print(x.values)
    y = x.keys()#storing names of symptoms in y
    print(len(x))
    print(len(y))
    plt.title(disea)
    plt.scatter(y,x.values)
    plt.show()

def scatterinp(sym1,sym2,sym3,sym4,sym5):
    x = [sym1,sym2,sym3,sym4,sym5]#storing input symptoms in x
    #giving value 1 to every symptom that was selected, 0 to unfilled dropdowns
    y = [0 if s=='Select Here' else 1 for s in x]
    print(x)
    print(y)
    plt.scatter(x,y)
    plt.show()
DECISION TREE ALGORITHM

root = Tk()
pred1=StringVar()
def DecisionTree():
    #Validate the inputs; return to the already running main loop if they are incomplete
    if len(NameEn.get()) == 0:
        pred1.set(" ")
        messagebox.askokcancel("System","Kindly Fill the Name")
        return
    if (Symptom1.get()=="Select Here") or (Symptom2.get()=="Select Here"):
        pred1.set(" ")
        messagebox.askokcancel("System","Kindly Fill at least first two Symptoms")
        return

    from sklearn import tree
    from sklearn.metrics import confusion_matrix,accuracy_score
    clf3 = tree.DecisionTreeClassifier()
    clf3 = clf3.fit(X,np.ravel(y))

    #Evaluating the model on the testing dataset
    y_pred=clf3.predict(X_test)
    print("Decision Tree")
    print("Accuracy")
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    print("Confusion matrix")
    conf_matrix=confusion_matrix(y_test,y_pred)
    print(conf_matrix)

    #Encoding the selected symptoms as a 0/1 vector over l1
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    l2 = [0]*len(l1)
    for k in range(0,len(l1)):
        if l1[k] in psymptoms:
            l2[k]=1
    inputtest = [l2]
    predict = clf3.predict(inputtest)
    predicted=predict[0]

    #Mapping the predicted class index back to the disease name
    if 0 <= predicted < len(disease):
        pred1.set(disease[predicted])
    else:
        pred1.set("Not Found")

    #Storing the inputs and the prediction in the SQLite database
    import sqlite3
    conn = sqlite3.connect('database.db')
    c = conn.cursor()
    c.execute("CREATE TABLE IF NOT EXISTS DecisionTree(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
              "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
    c.execute("INSERT INTO DecisionTree(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
              "VALUES(?,?,?,?,?,?,?)",(NameEn.get(),*psymptoms,pred1.get()))
    conn.commit()
    c.close()
    conn.close()
    #printing scatter plot of input symptoms and of disease predicted vs its symptoms
    scatterinp(*psymptoms)
    scatterplt(pred1.get())
RANDOM FOREST ALGORITHM

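pred2=StringVar()
def randomforest():
    #Validate the inputs; return to the already running main loop if they are incomplete
    if len(NameEn.get()) == 0:
        pred2.set(" ")
        messagebox.askokcancel("System","Kindly Fill the Name")
        return
    if (Symptom1.get()=="Select Here") or (Symptom2.get()=="Select Here"):
        pred2.set(" ")
        messagebox.askokcancel("System","Kindly Fill at least first two Symptoms")
        return

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix,accuracy_score
    clf4 = RandomForestClassifier(n_estimators=100)
    clf4 = clf4.fit(X,np.ravel(y))

    #Evaluating the model on the testing dataset
    y_pred=clf4.predict(X_test)
    print("Random Forest")
    print("Accuracy")
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    print("Confusion matrix")
    conf_matrix=confusion_matrix(y_test,y_pred)
    print(conf_matrix)

    #Encoding the selected symptoms as a 0/1 vector over l1
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    l2 = [0]*len(l1)
    for k in range(0,len(l1)):
        if l1[k] in psymptoms:
            l2[k]=1
    inputtest = [l2]
    predict = clf4.predict(inputtest)
    predicted=predict[0]

    #Mapping the predicted class index back to the disease name
    if 0 <= predicted < len(disease):
        pred2.set(disease[predicted])
    else:
        pred2.set("Not Found")

    #Storing the inputs and the prediction in the SQLite database
    import sqlite3
    conn = sqlite3.connect('database.db')
    c = conn.cursor()
    c.execute("CREATE TABLE IF NOT EXISTS RandomForest(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
              "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
    c.execute("INSERT INTO RandomForest(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
              "VALUES(?,?,?,?,?,?,?)",(NameEn.get(),*psymptoms,pred2.get()))
    conn.commit()
    c.close()
    conn.close()
    #printing scatter plot of disease predicted vs its symptoms
    scatterinp(*psymptoms)
    scatterplt(pred2.get())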
K NEAREST NEIGHBOUR ALGORITHM

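pred4=StringVar()
def KNN():
    #Validate the inputs; return to the already running main loop if they are incomplete
    if len(NameEn.get()) == 0:
        pred4.set(" ")
        messagebox.askokcancel("System","Kindly Fill the Name")
        return
    if (Symptom1.get()=="Select Here") or (Symptom2.get()=="Select Here"):
        pred4.set(" ")
        messagebox.askokcancel("System","Kindly Fill at least first two Symptoms")
        return

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import confusion_matrix,accuracy_score
    knn=KNeighborsClassifier(n_neighbors=5,metric='minkowski',p=2)
    knn=knn.fit(X,np.ravel(y))

    #Evaluating the model on the testing dataset
    y_pred=knn.predict(X_test)
    print("kNearest Neighbour")
    print("Accuracy")
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    print("Confusion matrix")
    conf_matrix=confusion_matrix(y_test,y_pred)
    print(conf_matrix)

    #Encoding the selected symptoms as a 0/1 vector over l1
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    l2 = [0]*len(l1)
    for k in range(0,len(l1)):
        if l1[k] in psymptoms:
            l2[k]=1
    inputtest = [l2]
    predict = knn.predict(inputtest)
    predicted=predict[0]

    #Mapping the predicted class index back to the disease name
    if 0 <= predicted < len(disease):
        pred4.set(disease[predicted])
    else:
        pred4.set("Not Found")

    #Storing the inputs and the prediction in the SQLite database
    import sqlite3
    conn = sqlite3.connect('database.db')
    c = conn.cursor()
    c.execute("CREATE TABLE IF NOT EXISTS KNearestNeighbour(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
              "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
    c.execute("INSERT INTO KNearestNeighbour(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
              "VALUES(?,?,?,?,?,?,?)",(NameEn.get(),*psymptoms,pred4.get()))
    conn.commit()
    c.close()
    conn.close()
    #printing scatter plot of disease predicted vs its symptoms
    scatterinp(*psymptoms)
    scatterplt(pred4.get())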
NAIVE BAYES ALGORITHM
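pred3=StringVar()
def NaiveBayes():
    #Validate the inputs; return to the already running main loop if they are incomplete
    if len(NameEn.get()) == 0:
        pred3.set(" ")
        messagebox.askokcancel("System","Kindly Fill the Name")
        return
    if (Symptom1.get()=="Select Here") or (Symptom2.get()=="Select Here"):
        pred3.set(" ")
        messagebox.askokcancel("System","Kindly Fill at least first two Symptoms")
        return

    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import confusion_matrix,accuracy_score
    gnb = GaussianNB()
    gnb=gnb.fit(X,np.ravel(y))

    #Evaluating the model on the testing dataset
    y_pred=gnb.predict(X_test)
    print("Naive Bayes")
    print("Accuracy")
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    print("Confusion matrix")
    conf_matrix=confusion_matrix(y_test,y_pred)
    print(conf_matrix)

    #Encoding the selected symptoms as a 0/1 vector over l1
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    l2 = [0]*len(l1)
    for k in range(0,len(l1)):
        if l1[k] in psymptoms:
            l2[k]=1
    inputtest = [l2]
    predict = gnb.predict(inputtest)
    predicted=predict[0]

    #Mapping the predicted class index back to the disease name
    if 0 <= predicted < len(disease):
        pred3.set(disease[predicted])
    else:
        pred3.set("Not Found")

    #Storing the inputs and the prediction in the SQLite database
    import sqlite3
    conn = sqlite3.connect('database.db')
    c = conn.cursor()
    c.execute("CREATE TABLE IF NOT EXISTS NaiveBayes(Name TEXT,Symtom1 TEXT,Symtom2 TEXT,"
              "Symtom3 TEXT,Symtom4 TEXT,Symtom5 TEXT,Disease TEXT)")
    c.execute("INSERT INTO NaiveBayes(Name,Symtom1,Symtom2,Symtom3,Symtom4,Symtom5,Disease) "
              "VALUES(?,?,?,?,?,?,?)",(NameEn.get(),*psymptoms,pred3.get()))
    conn.commit()
    c.close()
    conn.close()
    #printing scatter plot of disease predicted vs its symptoms
    scatterinp(*psymptoms)
    scatterplt(pred3.get())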
GRAPHICAL USER INTERFACE

#Create a root window
root.configure(background='Whitesmoke')
root.title('Cattle Disease Predictor')
root.resizable(0,0)
Symptom1 = StringVar()
Symptom1.set("Select Here")
Name = StringVar()
prev_win=None
def Reset():
    global prev_win
    #Restoring every symptom dropdown to its placeholder
    for var in (Symptom1,Symptom2,Symptom3,Symptom4,Symptom5):
        var.set("Select Here")
    NameEn.delete(first=0,last=100)
    pred1.set(" ")
    pred2.set(" ")
    pred3.set(" ")
    pred4.set(" ")
    try:
        prev_win.destroy()
        prev_win=None
    except AttributeError:
        pass
from tkinter import messagebox
def Exit():
    qExit=messagebox.askyesno("System","Do you want to exit the system")
    if qExit:
        root.destroy()
        exit()
#Headings for the GUI written at the top of GUI
w2 = Label(root, justify=CENTER, text=" Cattle Disease Predictor ", fg="black",
bg="Whitesmoke")
w2.config(font=("Times",30,"bold italic"))
w2.grid(row=1, column=0, columnspan=2, padx=100)
#w2 = Label(root, justify=LEFT, text="PROJECT GROUP 5 ", fg="Pink", bg="Ivory")
#w2.config(font=("Times",30,"bold italic"))
#w2.grid(row=2, column=0, columnspan=2, padx=100)
#Label for the name
NameLb = Label(root, text="Cattle ID/Name ", fg="Black", bg="Whitesmoke")
NameLb.config(font=("Times",15,"bold italic"))
NameLb.grid(row=6, column=0, pady=15, sticky=W)
#Labels for the different algorithms

dtLb = Label(root, text="DecisionTree", fg="Black", bg="lightgrey", width = 20)
dtLb.config(font=("Times",15,"bold italic"))
dtLb.grid(row=15, column=0, pady=10,sticky=W)
rfLb = Label(root, text="RandomForest", fg="Black", bg="lightgrey", width = 20)
rfLb.config(font=("Times",15,"bold italic"))
rfLb.grid(row=17, column=0, pady=10, sticky=W)
nbLb = Label(root, text="Naive Bayes", fg="Black", bg="lightgrey", width = 20)
nbLb.config(font=("Times",15,"bold italic"))
nbLb.grid(row=19, column=0, pady=10, sticky=W)
knnLb = Label(root, text="K-Nearest Neighbour", fg="Black", bg="lightgrey", width = 20)
knnLb.config(font=("Times",15,"bold italic"))
knnLb.grid(row=21, column=0, pady=10, sticky=W)
#Dropdown options: the symptoms in alphabetical order
OPTIONS = sorted(l1)
#Taking name as input from user
NameEn = Entry(root, textvariable=Name)
NameEn.grid(row=6, column=1)
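#Taking Symptoms as input from the dropdown from the user

#Symptom2 to Symptom5 are read by the prediction functions, so they are created here
Symptom2, Symptom3, Symptom4, Symptom5 = (StringVar(value="Select Here") for _ in range(4))
S1 = OptionMenu(root, Symptom1,*OPTIONS)
S1.grid(row=7, column=1)
#Rows 8 to 11 for the remaining dropdowns are an assumption that mirrors S1
for row, var in enumerate((Symptom2, Symptom3, Symptom4, Symptom5), start=8):
    OptionMenu(root, var, *OPTIONS).grid(row=row, column=1)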
#Buttons for predicting the disease using different algorithms
dst = Button(root, text="Prediction 1",
command=DecisionTree,bg="lightyellow",fg="Black")
dst.config(font=("Times",15,"bold italic"))
dst.grid(row=6, column=3,padx=10)
rnf = Button(root, text="Prediction 2", command=randomforest,bg="pink",fg="Black")

rnf.config(font=("Times",15,"bold italic"))
rnf.grid(row=7, column=3,padx=10)
lr = Button(root, text="Prediction 4", command=NaiveBayes,bg="lightblue",fg="Black")

lr.config(font=("Times",15,"bold italic"))
lr.grid(row=9, column=3,padx=10)
kn = Button(root, text="Prediction 3", command=KNN,bg="lightseagreen",fg="Black")

kn.config(font=("Times",15,"bold italic"))
kn.grid(row=8, column=3,padx=10)
rs = Button(root,text="Reset Inputs",
command=Reset,bg="crimson",fg="White",width=15)
rs.config(font=("Times",15,"bold italic"))
rs.grid(row=10,column=3,padx=10)
ex = Button(root,text="Exit System", command=Exit,bg="crimson",fg="White",width=15)

ex.config(font=("Times",15,"bold italic"))
ex.grid(row=11,column=3,padx=10)
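#Showing the output of different algorithms
#(rows 17, 19 and 21 are taken from the matching algorithm labels in column 0)

t1=Label(root,font=("Times",15,"bold italic"),text="Decision Tree",height=1,bg="lightyellow"
,width=40,fg="Black",textvariable=pred1,relief="sunken").grid(row=15, column=1,
padx=10)
t2=Label(root,font=("Times",15,"bold italic"),text="Random Forest",height=1,bg="pink"
,width=40,fg="Black",textvariable=pred2,relief="sunken").grid(row=17, column=1,
padx=10)
t3=Label(root,font=("Times",15,"bold italic"),text="Naive Bayes",height=1,bg="lightblue"
,width=40,fg="Black",textvariable=pred3,relief="sunken").grid(row=19, column=1,
padx=10)
t4=Label(root,font=("Times",15,"bold italic"),text="K-Nearest Neighbour",height=1,bg="lightseagreen"
,width=40,fg="Black",textvariable=pred4,relief="sunken").grid(row=21, column=1,
padx=10)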
#Mainloop the application is ready to run

root.mainloop()
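
ILLUSTRATIVE SKETCH: SYMPTOM ENCODING AND NEAREST-NEIGHBOUR VOTE

Each prediction function above turns the selected symptoms into a 0/1 vector over l1 and passes it to a classifier. The sketch below shows that encoding on its own, paired with a hand-written k-nearest-neighbour vote so that it runs without scikit-learn or the dataset. It is only a minimal illustration under stated assumptions: the names encode_symptoms, knn_predict, SYMPTOMS and TOY_ROWS are hypothetical, the six symptoms are a small subset of l1, the toy rows are invented and not taken from training.csv, and k=3 is used instead of the n_neighbors=5 of the appendix because the toy set is so small.

```python
from collections import Counter
from math import dist

# Hypothetical miniature vocabulary (a subset of l1 chosen for illustration)
SYMPTOMS = ["fever", "coughing", "lameness", "udder_swelling", "udder_pain", "swelling"]

def encode_symptoms(selected, vocabulary=SYMPTOMS):
    """Return a 0/1 list marking which vocabulary symptoms were selected.

    Unknown entries such as the 'Select Here' placeholder are ignored.
    """
    chosen = set(selected)
    return [1 if symptom in chosen else 0 for symptom in vocabulary]

# Invented toy records: (observed symptoms, disease label)
TOY_ROWS = [
    (["udder_swelling", "udder_pain", "fever"], "mastitis"),
    (["udder_swelling", "udder_pain"], "mastitis"),
    (["lameness", "swelling"], "foot_rot"),
    (["lameness", "swelling", "fever"], "foot_rot"),
    (["coughing", "fever"], "calf_pneumonia"),
    (["coughing"], "calf_pneumonia"),
]

def knn_predict(selected, k=3):
    """Majority vote among the k toy rows closest to the encoded query."""
    query = encode_symptoms(selected)
    # Euclidean distance (Minkowski with p=2) from the query to every toy row
    scored = sorted(
        (dist(query, encode_symptoms(row)), label) for row, label in TOY_ROWS
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Two filled dropdowns and one left on its placeholder
print(knn_predict(["udder_swelling", "udder_pain", "Select Here"]))  # prints: mastitis
```

With the real data, the same vector would be built over all of l1 and handed to the fitted scikit-learn model instead of knn_predict.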