Prediction of Poultry Yield Using Data Mining Techniques
Open Access
2 Dr. N.D. Oye, Department of Computer Science, Modibbo Adama University, Yola, Adamawa State, Nigeria.
3 Celestine, H.R., Department of Computer Science, Modibbo Adama University, Yola, Adamawa State, Nigeria.
ABSTRACT
A poultry yield prediction model has been designed using a data mining and machine learning
technique, the Classification and Regression Tree (CART) algorithm. The developed model has
been optimized and pruned using the Reduced Error Pruning (REP) algorithm to improve prediction
accuracy. An algorithm that makes the prediction model flexible, capable of making predictions
irrespective of poultry size or population, has also been proposed. Poultry farmers can use the
model to predict yield even before a breeding season, and to take decisions that help ensure a
desirable yield at the end of the breeding season.
I. INTRODUCTION
Over the years, pattern extraction from data has evolved from manual to automated processing. Early
pattern extraction methods include Bayes' theorem from the 1700s and regression analysis from the
1800s. The revolution in technology, especially computer technology, has brought about an increase in
large-scale data storage, collection and manipulation, and hence the need for methods and techniques to
efficiently discover patterns in these large data sets (Mucherino et al., 2009). The need for data
exploration and extraction later brought about discoveries in Computer Science such as cluster analysis,
neural networks, genetic algorithms, decision rules, decision trees and support vector machines, all of
which constitute methods of data mining (Han et al., 2011). Data mining is therefore the process of
exploring large data sets so as to find purposeful patterns, relationships, correlations or associations
within them (Klosgen and Zytkow, 2002). It forms the intersection of various disciplines such as
computer science, statistics, machine learning and database systems (Bozdogan, 2003). The main
objective of data mining is to convert raw data into meaningful information, which results in
knowledge discovery (Sumathi and Sivanandam, 2006). Data mining goes beyond just analyzing raw
data: it involves establishing practices and policies that manage the full data life cycle of an
organization or enterprise, as well as building models and deducing inferences (Han et al., 2011).
This means that data mining goes beyond the mere extraction (mining) of data to the extraction of
patterns from data to produce knowledge. One attribute data mining and databases share is the
storing, manipulation and extraction of data.
Data is collected and stored (database). The data is then worked upon (data mining), which results in
knowledge discovery. The discovered knowledge can then be stored for further use (database).
Different terms have been used to refer to data mining, such as data archaeology, information
harvesting, information discovery and knowledge extraction. Gregory Piatetsky-Shapiro coined the
term "knowledge discovery in databases" (KDD) in 1989. However, because of the popularity of the
term "data mining" in the machine learning and artificial intelligence (AI) communities, the terms
KDD and data mining have been used interchangeably (Piatetsky-Shapiro et al., 2011). In general,
data mining encompasses six common types of tasks: anomaly detection, association rule learning,
clustering, classification, regression and summarization (Thuraisingham, 1998). Data are basically
mined to achieve one or more of these tasks. Data mining in agriculture is a recent research field
(Ramesh and Vardhan, 2013), and it is also considered the future of agriculture (ElFangary, 2009).
This forms the basic motivation behind this research. Thus far, data mining applications in
agriculture include detecting diseases from animal sounds, predicting crop yield, and forecasting
weather and soil types.
Justification of Study
Poultry farming in Nigeria has been on a tremendous rise over the past decades. This may be
attributed to the high rate of unemployment in the country. For some individuals and states in Nigeria,
poultry farming has become a means of revenue generation. Nigeria, as a sub-Saharan African
country, relies on agricultural activities, including poultry farming, to create self-employment in a bid
to reduce poverty (Larsen et al., 2009; Heise et al., 2015). It is therefore important to introduce ideas
that will improve poultry farming in Nigeria. The researchers intend to improve poultry farming by
developing a model that poultry farmers can use to forecast or predict yield using data mining
techniques. Poultry farmers, like every other businessperson, juggle opportunity costs, forgoing some
needs in favour of others while targeting the maximum possible yield. This study is particularly
useful as it can help poultry farmers explore a number of permutations of certain factors that affect
poultry production and the possible yields that can result from such permutations.
The K-NN classifier was prescribed as an efficient method for estimating soil water parameters
(Mucherino et al., 2009) using crop simulation systems such as CROPSYST (Stockle et al., 1994),
DSSAT (Jones et al., 1998) or any other crop simulation system. Soil parameters such as the lower
limit of plant water availability (LL), the drained upper limit and plant extractable soil water (PESW)
are most likely to be unavailable. The K-NN algorithm can be used on available information, such as
soil texture and organic carbon, to obtain the unavailable parameters (Mucherino et al., 2009). This
shows that the K-NN classifier can be used to predict unknown variables from known ones.
ElFangary (2009) developed a model for improving cow and buffalo production in Egypt. The
research used Pearson's Coefficient to analyse and find correlations between variables such as
pregnancy, death, diseases, vaccines and the various intervals of the animals' production to develop
the model. The Artificial Neural Network (ANN) algorithm is another powerful classifier used for
prediction. A typical application was demonstrated by Kondo et al. (2000), who predicted that certain
categories of oranges are relatively sweeter by measuring their sugar and acid content. A three-layer
artificial neural network was used to predict that oranges with the attributes reddish colour, medium
size, low height and glossy appearance are relatively sweeter. Another application of ANN in
agriculture was conducted on pigs to detect the presence of diseases via their sounds (Moshou et al.,
2001). Initially, 354 sound samples, consisting of coughs from different pigs, metal clanging, grunts
and background noise, were used for training. Sounds such as coughs and metal clanging were
difficult to distinguish because they have similar frequency ranges (Mucherino et al., 2009). The
neural network was therefore further trained to distinguish the similar sounds. Once that was done,
results showed sound recognition correctness greater than 90%.
Similarly, ANN was used to detect watercore in apples (Shahin et al., 2001). Watercore is an internal
apple disorder (Mucherino et al., 2009; Herremans, 2014). An ANN was able to separate good apples
from bad ones based on their watercore severity. This study was necessary because watercore is an
internal disorder and consumers could only discover it after purchasing the apple (Mucherino et al.,
2009). The Support Vector Machine (SVM) technique is normally restricted to discriminating between
two classes (Mucherino et al., 2009; Campilho and Kamel, 2014). Gill et al. (2006) used
meteorological and soil moisture data to develop SVM predictions for four- and seven-day forecasts
of soil moisture. Like Moshou et al.'s (2001) research on pigs, Fagerlund (2007) used SVM to
distinguish and recognize different bird species based on their sounds. Bird sound data were used to
train an SVM classifier in conjunction with a binary decision tree. N-fold cross validation was then
used to obtain the optimal classifier model that identifies birds.
Crop yield has been predicted using Multiple Linear Regression (MLR) and the Density-Based
Clustering data mining technique (Ramesh and Vardhan, 2015). Rajeswari and Arunesh (2016) used
three classification techniques, Naïve Bayes, JRip and J48 (also called the C4.5 algorithm), to analyse
and predict soil types (red and black). JRip and J48 are decision tree algorithms proposed by William
Cohen and Ross Quinlan respectively. This research shows that both decision tree algorithms
produced higher prediction accuracy rates compared to the Naïve Bayes technique: JRip and J48
produced 98.18% and 97.27% prediction accuracy, while Naïve Bayes produced 86.36%. Chowdhury
and Ojha (2017) performed disease diagnosis on mushrooms using the Naïve Bayes, Sequential
Minimal Optimization (SMO) and Ripple-Down Rule Learner (RIDOR) classification techniques.
They concluded that the Naïve Bayes technique provides better results for mushroom disease
diagnosis.
The ANN classifier is a fast learning algorithm which can automatically learn from a training
dataset. However, the algorithm is hard to interpret and apply to solving real-life problems
(Braspenning and Thuijsman, 1995; Patan, 2008). We are compelled to feel that this technique might
be too complicated for an average farmer to understand and utilise. For SVM, Abe (2005) suggested
the following advantages and disadvantages. The advantages are strong generalization ability over
the dataset, provision of a global optimum solution, and robustness to outliers. The disadvantages
include the restriction to two classes, which makes multi-class problems difficult, and extended
training time. Poultry yield is a continuous variable, not a categorical variable. It therefore does not
make sense to apply SVM, since the research goal is not to classify yield into two classes but to
predict yield.
Decision tree is a machine learning and data mining technique that produces models which are easy
to interpret and understand (Rokach and Maimon, 2014). The technique is also capable of modelling
variables that have non-linear relationships with each other (Raut and Nichat, 2017). Decision trees
work well with all variable types, whether categorical, continuous or both (Siau, 2008). Decision
trees use a greedy algorithm, which makes them very sensitive to outliers in the training set. In
addition to this drawback, the greedy algorithm may produce erroneous predictions at the leaves if an
error occurs at corresponding higher-level nodes (Rokach and Maimon, 2008). However, to handle
this error-prediction problem, large training data sets can be used to train the model (Mitchell, 1997;
Aggarwal, 2015). The Multiple Linear Regression (MLR) technique is only suitable when the
dependent and independent variables share linear relationships (Wendler and Gröttrup, 2016). This
implies that in situations where no linear relationship exists between some or all of the variables,
linear regression techniques (SLR and MLR) are not suitable. The Fisher Discriminant Analysis
(FDA) is similar to MLR. It produces fast, direct and concise analytical model solutions which can
easily be programmed by IT personnel, and it requires few instances of a dataset to build models. The
FDA is, however, sensitive to outliers, cannot handle discrete independent variables or missing
values, and is suitable only for linear phenomena (Tuffery, 2011).
After critically assessing these prediction data mining techniques that have been applied in
agricultural research, we find that poultry data works well with the decision tree algorithm. This is
because decision trees work well with all kinds of data (categorical and continuous). Decision tree
models are also easy to understand and interpret, which is particularly necessary if the model is to be
used by local poultry farmers. Vale et al. (2008) also used a decision tree to predict broiler mortality
rate. That research was, however, restricted to the impact environmental attributes (environmental
temperature) have on broilers; it did not use key attributes such as diseases, vaccination and feed
type to predict overall poultry yield. Another related study, identifying poultry diseases based on
birds' sounds, was done by Sadeghi et al. (2015). While that research is useful for the early detection
of diseases among poultry birds, it did not provide procedures for predicting overall poultry yield.
III. Methodology
Research Framework
The first step in building any model is the collection of a dataset. Most times, the data are
inconsistent and contain errors, making them unfit for implementing the model. To resolve this, the
data mining task of anomaly detection, called data pre-processing, is required (Tan, 2006). The data
is then divided into two sets: the training data set and the validation data set. The training data set is
used to build the model using the CART algorithm (regression tree), and the validation set is used to
optimize the model by pruning it. The post-pruning technique known as Reduced Error Pruning
(REP) is applied to the fully grown tree to reduce model overfitting and increase prediction accuracy
(Mitchell, 1997). The model is then tested with the validation data set, a process referred to as cross
validation. REP and cross validation form part of the pruning process. The pruned tree produces a
smaller, more precise prediction tree model, which we propose as the poultry prediction model.
These steps are illustrated diagrammatically in Figure 1.
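As an illustrative sketch, the framework above (split, grow a regression tree, simplify, validate) can be expressed with scikit-learn, whose DecisionTreeRegressor implements an optimized CART variant. The feature names and synthetic data here are hypothetical placeholders, not the study's dataset, and scikit-learn exposes depth limits and cost-complexity pruning rather than REP, so the "pruned" tree only mirrors the grow-then-simplify idea:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical pre-processed records: [feed_quality, vaccine_dose, temperature]
X = rng.uniform(0, 1, size=(400, 3))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, size=400)  # yield

# Divide into a training set (to grow the tree) and a validation set (to prune/test)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow the full regression tree, then a simplified alternative to compare
full_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
pruned_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# Cross validation step: score both trees on the held-out validation set
print("full tree validation R^2:  ", round(full_tree.score(X_val, y_val), 3))
print("pruned tree validation R^2:", round(pruned_tree.score(X_val, y_val), 3))
```

The tree whose validation score is better would be retained, mirroring the model-selection step of the framework.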
CART Algorithm
CART is an umbrella term popularized by Breiman et al. (1984) to describe the similar procedures of
both classification trees and regression trees as a decision tree algorithm. The CART algorithm
follows a procedure called recursive partitioning, which repeatedly partitions a large dataset space
into smaller rectangles or subsets, each aiming to contain, as purely as possible, elements of the same
class or category (Han et al., 2011; Niu, 2017). Although classification tree and regression tree
algorithms share the common name CART, there is a major difference between them (Aggarwal,
2015). The classification tree is mainly used to classify categorical attributes/variables, while the
regression tree is used to classify and predict continuous or numeric values (Champandard, 2003). A
categorical variable can be viewed as a label used to represent a class, for example colour (red, green,
blue) or age group (young, adult, elderly). Numeric/continuous variables, on the other hand, are
numbers that can take any value (Hoffmann, 2016). Yield, the target variable to be predicted, is a
continuous variable. This is why the regression tree algorithm of CART has been chosen to build the
prediction model.
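The distinction can be sketched in a few lines using scikit-learn, an optimized CART implementation; the single feature and target values below are purely illustrative:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]  # e.g. a hypothetical feed-quality score

# Classification tree: the target is a categorical label
clf = DecisionTreeClassifier().fit(X, ["low", "low", "high", "high"])
print(clf.predict([[3]]))  # → ['high']

# Regression tree: the target is a continuous numeric yield, as in this study
reg = DecisionTreeRegressor().fit(X, [310.0, 320.0, 388.0, 390.0])
print(reg.predict([[3]]))  # → [390.]
```

Both trees partition the same feature space; only the type of value stored at the leaves (class label vs. number) differs.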
Splitting Criteria
To classify similar data at various points in a dataset space, a criterion is needed to determine which
attributes of the data to split on and the particular points at which the splitting should occur. That is
where the splitting criterion comes into play. The splitting criterion of a CART is the measure used to
determine the best variable to split on, as well as the most appropriate points at which to split that
variable, so as to achieve classification purity (Diday et al., 2013). Purity in this case is a measure of
the homogeneity of the elements or attributes in a particular class (Witten and Frank, 2005;
Aggarwal, 2015). If a particular class/node is said to be 100% pure, it means that the class/node
consists entirely of similar elements, with no error or outlier (dissimilar element/attribute). To
achieve pure classification splits for classification trees, splitting criteria such as entropy, the Gini
index and Twoing have been proposed (Mitchell, 1997; Wu and Kumar, 2009; Issac and Israr, 2014).
For a regression tree model, however, a splitting criterion that involves an error-based measure or a
measure of variance is considered more appropriate, because the attributes of the target variable are
continuous and numerical (Witten and Frank, 2005; Aggarwal, 2015). To classify variables or
attributes with respect to a continuous numeric target variable, a locally optimised linear model is
obtained from each hierarchical partitioning of the decision node at the leaves of the tree (Aggarwal,
2015). To obtain a true representation of the value of every split, the average of all the values of the
split is computed and used (Witten and Frank, 2005; Moolayil, 2016). This is indicated at the leaves
of the tree and along decision paths through the tree. One common variance-based splitting criterion
for a regression tree is the Standard Deviation Reduction (SDR) measure (Witten and Frank, 2005;
Moolayil, 2016).
Pruning Criteria
Pruning a tree involves cutting off branches so as to improve accuracy and reduce overfitting
(Mitchell, 1997; Witten and Frank, 2005). Pruning is a way of making complex and large trees
simpler and more precise. This is in accordance with Occam's razor, which holds that the simpler and
less complex a model is, the more accurate it is (Hall et al., 2011). Pruning techniques/criteria that
involve the use of a validation dataset are called post-pruning techniques. Post pruning requires that a
tree model be fully grown from top to bottom and then pruned from bottom to top (Aggarwal, 2015).
This is quite different from pre-pruning, which requires that tree growth be stopped early enough,
before the tree begins to overfit (Mitchell, 1997). The problem with pre-pruning, however, is the
uncertainty about the 'early point' at which to stop the tree's growth (Aggarwal, 2015). Mitchell
(1997) also suggested that growing the tree fully is the most practical approach for tree induction
models. For this reason, we decided to use a post-pruning technique. Some post-pruning criteria
include cost-complexity pruning, reduced-error pruning and rule-based pruning (Mitchell, 1997).
In reduced-error pruning, each decision node is considered a candidate for pruning: a node is pruned
by replacing the subtree it roots with its most common classification (Mitchell, 1997). The
replacement is done only if the resulting pruned tree does not perform worse on the validation
dataset. This is because classification irregularities that may occur with the training dataset are
unlikely to occur with the validation dataset (Mitchell, 1997). The reduced-error pruning technique
has been used in this research because of its simplicity and speed (Mitchell, 1997).
The steps of the REP algorithm are:
Step 1: Break the full tree into subtrees.
Step 2: Prune each subtree by replacing its decision node with the most common classification to
form a pruned tree.
Step 3: Test each pruned tree against the validation dataset.
Step 4: Select the pruned subtree with the least classification error.
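Steps 1-4 can be sketched on a toy dict-based tree. This is an illustrative structure, not the study's implementation: leaves hold predictions, internal nodes store the training examples' majority label (the "most common classification") so a subtree can be collapsed, and, for brevity, each candidate subtree is scored against the whole validation set rather than only the examples reaching it:

```python
def predict(node, x):
    """Route an example down the tree until a leaf prediction is reached."""
    while "leaf" not in node:
        node = node["yes"] if x[node["attr"]] else node["no"]
    return node["leaf"]

def error(node, data):
    """Count misclassified (example, label) pairs in the validation data."""
    return sum(predict(node, x) != label for x, label in data)

def rep_prune(node, val_data):
    """Bottom-up REP: replace each subtree with its majority-label leaf
    whenever doing so does not increase validation error."""
    if "leaf" in node:
        return node
    node["yes"] = rep_prune(node["yes"], val_data)
    node["no"] = rep_prune(node["no"], val_data)
    collapsed = {"leaf": node["majority"]}
    return collapsed if error(collapsed, val_data) <= error(node, val_data) else node

# Hypothetical tree with a noisy "dry_season" split below the root
tree = {"attr": "vaccinated", "majority": "high",
        "yes": {"leaf": "high"},
        "no": {"attr": "dry_season", "majority": "high",
               "yes": {"leaf": "low"}, "no": {"leaf": "high"}}}
validation = [({"vaccinated": True, "dry_season": False}, "high"),
              ({"vaccinated": False, "dry_season": True}, "high"),
              ({"vaccinated": False, "dry_season": False}, "high")]
pruned = rep_prune(tree, validation)
print(pruned)  # → {'leaf': 'high'}: the noisy splits collapse into one leaf
```

Because the "dry_season" split misclassifies a validation example while its majority leaf does not, REP removes it, and the remaining redundant root then collapses as well.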
Cross Validation
The variables of the validation data set have been rearranged in the same pattern as the pruned tree.
Misclassified classes of pruned trees A and B are indicated in bold italics in Table 2 and Table 3
respectively. We propose pruned tree B as our selected model for predicting poultry yield. Pruned
tree B has been selected because it contains fewer classification errors (22%, indicated in bold italics)
compared to pruned tree A (33%, also indicated in bold italics).
Algorithm
Poultry percentile prediction model, regression tree, N = sample population

Vaccine not enough = [(370 × 100) / 400] × N / 100
Vaccine enough = [(388 × 100) / 400] × N / 100
Else if Feed = low fat then
    Disease low = [(374 × 100) / 400] × N / 100
Else if
    Season dry = [(346 × 100) / 400] × N / 100
    Season rainy = [(356 × 100) / 400] × N / 100
End if
End.
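The scaling step above can be sketched in Python, assuming (as the bracketed fractions suggest) that each count is a leaf outcome out of the 400-record sample: the count is first converted to a percentile via count × 100 / 400, then scaled to any farm's population N. The condition names are illustrative labels, not the study's exact leaf identifiers:

```python
LEAF_COUNTS = {                 # leaf outcomes out of the 400-record sample
    "vaccine_not_enough": 370,
    "vaccine_enough": 388,
    "feed_low_fat_disease_low": 374,
    "season_dry": 346,
    "season_rainy": 356,
}
SAMPLE_SIZE = 400

def predict_yield(leaf, n):
    """Predicted yield for a flock of n birds reaching the given leaf:
    (count * 100 / 400) gives the percentile, then it is scaled by n / 100."""
    percentile = LEAF_COUNTS[leaf] * 100 / SAMPLE_SIZE
    return percentile * n / 100

print(predict_yield("vaccine_enough", 1000))  # → 970.0
print(predict_yield("season_dry", 250))       # → 216.25
```

This is what makes the model flexible: the same leaf percentiles apply to a flock of any size N.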
IV. Conclusion
Data mining techniques generally unravel hidden patterns in data, from which knowledge can be
discovered. In this research, poultry data have been collected and mined, and patterns enabling yield
prediction have been discovered using the regression tree of the CART algorithm. To achieve this,
we employed the SDR measure to hierarchically split the data, rather than other splitting techniques
such as the Gini index and entropy, because the target variable 'Yield' is numeric and continuous. To
avoid model overfitting and improve the model's accuracy, a post-pruning technique called REP has
been used. In line with post-pruning practice, a validation data set was set aside to test the
performance of two pruned model trees, and the tree that performed better on the validation data set
was chosen as our proposed prediction model. To make the proposed model flexible, we presented
another algorithm that converts predictions into percentiles based on the predictions of the proposed
model. This algorithm makes a prediction for any poultry population by multiplying the resulting
predictions at the leaf nodes by the poultry population (N). CART algorithms have been applied for
prediction purposes with high prediction accuracy. This can largely be attributed to the fact that
CART is a machine learning algorithm well grounded in rigorous statistics and probability theory
(Wu and Kumar, 2009). A CART model for predicting poultry yield has been developed in this study
and pruned to provide optimal results.
REFERENCES
[1.] Abe, S. (2005). Support vector machines for pattern classification (Vol. 53). London: Springer.
[2.] Aggarwal, C. C. (2015). Data mining: the textbook. Springer.
[3.] Aggarwal, C. C. (Ed.). (2014). Data classification: algorithms and applications. CRC Press.
[4.] Aggarwal, C. C., & Reddy, C. K. (2014). Data clustering. Algorithms and Applications,
Chapman & Halls.
[5.] Azzalini, A., & Scarpa, B. (2012). Data analysis and data mining: An introduction. OUP USA.
[6.] Baker, R. S. J. D. (2010). Data mining for education. International encyclopedia of education,
7(3), 112-118.
[7.] Banks, D., House, L., McMorris, F. R., Arabie, P., & Gaul, W. A. (Eds.). (2011). Classification,
Clustering, and Data Mining Applications: Proceedings of the Meeting of the International
Federation of Classification Societies (IFCS), Illinois Institute of Technology, Chicago, 15–
18 July 2004. Springer Science & Business Media.
[8.] Berretti, S., Thampi, S. M., & Dasgupta, S. (Eds.). (2016). Intelligent systems technologies and
applications. Springer International Publishing.
[9.] Bhattacharyya, D. K., & Kalita, J. K. (2013). Network anomaly detection: A machine learning
perspective. CRC Press.
[10.] Bozdogan, H. (Ed.). (2003). Statistical data mining and knowledge discovery. CRC Press.
[11.] Braspenning, P. J., & Thuijsman, F. (1995). Artificial neural networks: an introduction to ANN
theory and practice (Vol. 931). Springer Science & Business Media.
[12.] Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression
trees. CRC press.
[13.] Campilho, A., & Kamel, M. (Eds.). (2014). Image Analysis and Recognition: 11th International
Conference, ICIAR 2014, Vilamoura, Portugal, October 22-24, 2014, Proceedings (Vol.
8814). Springer.
[14.] Champandard, A. J. (2003). AI game development: Synthetic creatures with learning and
reactive behaviors. New Riders.
[15.] Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM computing
surveys (CSUR), 41(3), 15.
[16.] Cherkassky, V., & Mulier, F. M. (2007). Learning from data: concepts, theory, and methods.
John Wiley & Sons.
[17.] Chowdhury, D. R., & Ojha, S. (2017). "An Empirical Study on Mushroom Disease Diagnosis:
A Data Mining Approach", International Research Journal of Engineering and Technology,
4(1), 529-534.
[18.] de Albornoz, A. G. Á., & Terashima-Marín, H. MICAI 2005: Advances in Artificial Intelligence.
[19.] de Sá, J. P. M., Silva, L. M., Santos, J. M., & Alexandre, L. A. (2013). Minimum error entropy
classification. Springer.
[20.] Diday, E., Lechevallier, Y., Schader, M., Bertrand, P., & Burtschy, B. (Eds.). (2013). New
approaches in classification and data analysis. Springer Science & Business Media.
[21.] Digby, B. (2001). It's a World Thing. Oxford University Press, USA.
[22.] El Fangary, L. M. (2009, December). Mining Data of Buffalo and Cow Production in Egypt. In
Frontier of Computer Science and Technology, 2009. FCST'09. Fourth International
Conference on (pp. 382-387). IEEE.
[23.] Elder, J. (2009). Handbook of statistical analysis and data mining applications. Academic
Press.
[24.] Ellen, S. (2012). Slovin's Formula Sampling Techniques.
[25.] Fagerlund, S. (2007). Bird Species Recognition Using Support Vector Machines, EURASIP
Journal on Advances in Signal Processing 2007, Article ID 38637, 1–8.
[26.] Fasina, F. O., Wai, M. D., Mohammed, S. N., & Onyekonwu, O. N. (2007). Contribution of
poultry production to household income: a case of Jos South Local Government in Nigeria.
Family Poultry, 17(1&2), 30-34.
[27.] Gill, M. K., Asefa, T., Kemblowski, M. W., & McKee, M. (2006). Soil moisture prediction using
support vector machines. JAWRA Journal of the American Water Resources Association,
42(4), 1033-1046.
[28.] Hall, M., Witten, I., & Frank, E. (2011). Data mining: Practical machine learning tools and
techniques. Kaufmann, Burlington.
[29.] Han, J., Pei, J., & Kamber, M. (2006). Data Mining, Southeast Asia Edition.
[30.] Han, J., Pei, J., &Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
[31.] Heise, H., Crisan, A., &Theuvsen, L. (2015). The poultry market in Nigeria: market structures
and potential for investment in the market. International Food and Agribusiness Management
Review, 18, 197-222.
[32.] Herremans, E., Melado-Herreros, A., Defraeye, T., Verlinden, B., Hertog, M., Verboven, P.,
...&Wevers, M. (2014). Comparison of X-ray CT and MRI of watercore disorder of different
apple cultivars. Postharvest biology and technology, 87, 42-50.
[33.] Hill, T., Lewicki, P., &Lewicki, P. (2006). Statistics: methods and applications: a comprehensive
reference for science, industry, and data mining. StatSoft, Inc..
[34.] Hoffmann, J. P. (2016). Regression Models for Categorical, Count, and Related Variables: An
Applied Approach. Univ of California Press.
[35.] Issac, B., &Israr, N. (Eds.). (2014). Case Studies in Secure Computing: Achievements and
Trends. CRC Press.
[36.] Jones, J. W., Tsuji, G. Y., Hoogenboom, G., Hunt, L. A., Thornton, P. K., Wilkens, P. W., ... &
Singh, U. (1998). Decision support system for agrotechnology transfer: DSSAT v3. In
Understanding options for agricultural production (pp. 157-177). Springer Netherlands.
[37.] Jones, M. T. (2015). Artificial Intelligence: A Systems Approach: A Systems Approach. Jones
& Bartlett Learning.
[38.] Klösgen, W., & Zytkow, J. M. (2002). Handbook of data mining and knowledge discovery.
Oxford University Press, Inc.
[39.] Kondo, N., Ahmad, U., Monta, M., & Murase, H. (2000). Machine vision based quality
evaluation of Iyokan orange fruit using neural networks. Computers and electronics in
agriculture, 29(1), 135-147.
[40.] Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of
classification techniques.
[41.] Larson, N. I., Story, M. T., & Nelson, M. C. (2009). Neighborhood environments: disparities in
access to healthy foods in the US. American journal of preventive medicine, 36(1), 74-
81.
[42.] Mitchell, T. M. (1997). Machine learning. 1997. Burr Ridge, IL: McGraw Hill, 45(37), 870-877.
[43.] Moolayil, J. (2016). Smarter Decisions–The Intersection of Internet of Things and Decision
Science. Packt Publishing Ltd.
[44.] Moshou, D., Chedad, A., Van Hirtum, A., De Baerdemaeker, J., Berckmans, D., & Ramon, H.
(2001). An intelligent alarm for early detection of swine epidemics based on neural
networks. Transactions of the ASAE, 44(1), 167.
[45.] Moshou, D., Chedad, A., Van Hirtum, A., De Baerdemaeker, J., Berckmans, D., & Ramon, H.
(2001). Neural recognition system for swine cough. Mathematics and Computers in
Simulation, 56(4), 475-487.
[46.] Mucherino, A., Papajorgji, P. J., & Pardalos, P. M. (2009). Data mining in agriculture (Vol. 34).
Springer Science & Business Media.
[47.] Niu, G. (2017). Data-Driven Technology for Engineering Systems Health Management.
Springer.
[48.] Patan, K. (2008). Artificial neural networks for the modelling and fault diagnosis of technical
processes. Springer.
[49.] Piatetsky-Shapiro, G., & Parker, G. (2011). Lesson: Data mining, and knowledge discovery: An
introduction. Introduction to Data Mining, KD Nuggets.
[50.] Quinlan, J. R. (1987). Simplifying decision trees. International journal of man-machine studies,
27(3), 221-234.
[51.] Rajeswari, V., & Arunesh, K. (2016). Analysing soil data using data mining classification
techniques. Indian Journal of Science and Technology, 9(19).
[52.] Ramesh, D., & Vardhan, B. V. (2013). Data mining techniques and applications to agricultural
yield data. International Journal of Advanced Research in Computer and Communication
Engineering, 2(9), 3477-80.
[53.] Ramesh, D., & Vardhan, B. V. (2015). Analysis of crop yield prediction using data mining
techniques. International Journal of Research in Engineering and Technology, 4(1), 47-473.
[54.] Raut, A. B., & Nichat, M. A. A. (2017). Students Performance Prediction Using Decision Tree.
International Journal of Computational Intelligence Research, 13(7), 1735-1741.
[55.] Rokach, L., & Maimon, O. (2005). The Data Mining and Knowledge Discovery Handbook: A
Complete Guide for Researchers and Practitioners.
[56.] Rokach, L., & Maimon, O. (2008). Data mining with decision trees: theory and applications.
[57.] Rokach, L., & Maimon, O. (2014). Data mining with decision trees: theory and applications.
World scientific.
[58.] Ryan, T. P. (2013). Sample size determination and power. John Wiley & Sons.
[59.] Sadeghi, M., Banakar, A., Khazaee, M., & Soleimani, M. R. (2015). An Intelligent Procedure for
the Detection and Classification of Chickens Infected by Clostridium Perfringens Based
on their Vocalization. Revista Brasileira de Ciência Avícola, 17(4), 537-544.
[60.] Shahin, M. A., Tollner, E. W., & McClendon, R. W. (2001). AE—Automation and Emerging
Technologies: Artificial Intelligence Classifiers for sorting Apples based on Watercore.
Journal of agricultural engineering research, 79(3), 265-274.
[61.] Siau, K. (Ed.). (2008). Advanced Principles for Improving Database Design, Systems
Modeling, and Software Development. IGI Global.
[62.] Stockle, C. O., Martin, S. A., & Campbell, G. S. (1994). CropSyst, a cropping systems
simulation model: water/nitrogen budgets and crop yield. Agricultural Systems, 46(3),
335-359.
[63.] Sucar, L. E. (Ed.). (2011). Decision Theory Models for Applications in Artificial Intelligence:
Concepts and Solutions: Concepts and Solutions. IGI Global.
[64.] Sumathi, S., & Sivanandam, S. N. (2006). Introduction to data mining and its applications (Vol.
29). Springer.
[65.] Tan, P. N. (2006). Introduction to data mining. Pearson Education India.
[66.] Thuraisingham, B. (1998). Data mining: technologies, techniques, tools, and trends. CRC
press.
[67.] Tiwari, V., Tiwari, B., Thakur, R. S., & Gupta, S. (2013). Pattern and data analysis in
healthcare settings.
[68.] Tjoa, A. M., & Trujillo, J. (2010). Data Warehousing and Knowledge Discovery. Springer
Berlin/Heidelberg.
[69.] Tuffery, S. (2011). Data mining and statistics for decision making (Vol. 2). Chichester: Wiley.
[70.] Urtubia, A., Pérez-Correa, J. R., Soto, A., & Pszczolkowski, P. (2007). Using data mining
techniques to predict industrial wine problem fermentations. Food Control, 18(12), 1512-
1517.
[71.] Vale, M. M., Moura, D. J. D., Nääs, I. D. A., Oliveira, S. R. D. M., & Rodrigues, L. H. A. (2008).
Data mining to estimate broiler mortality when exposed to heat wave. Scientia Agricola,
65(3), 223-229.
[72.] Wang, J. (Ed.). (2008). Data warehousing and mining: Concepts, methodologies, tools, and
applications: Concepts, methodologies, tools, and applications (Vol. 3). IGI Global.
[73.] Wang, J. (Ed). (2014). Encyclopedia of Business Analytics and optimization. IGI Global.
[74.] Wendler, T., & Gröttrup, S. (2016). Data Mining with SPSS Modeler: Theory, Exercises and
Solutions. Springer.
[75.] Witten, I. H., Frank E. (2005). Data Mining: Practical machine learning tools and techniques, 2,
127-143.
[76.] Wolff, K. E., Palchunov, D. E., Zagoruiko, N. G., & Andelfinger, U. (Eds.). (2011). Knowledge
Processing and Data Analysis: First International Conference, KONT 2007, Novosibirsk,
Russia, September 14-16, 2007, and First International Conference, KPP 2007,
Darmstadt, Germany, September 28-30, 2007. Revised Selected Papers (Vol. 6581). Springer
Science & Business Media.
[77.] Wu, X., & Kumar, V. (2009). The top ten algorithm in data mining. International Standard Book,
13, 978-1.
[78.] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ...& Zhou, Z. H. (2008).
Top 10 algorithms in data mining. Knowledge and information systems, 14(1), 1-37.