Crop Yield Prediction
Anantapur, AP, India; Rajampet, AP, India
Proceedings of the Fifth International Conference on Intelligent Computing and Control Systems (ICICCS 2021)

Abstract—Agriculture is the pillar of the Indian economy, and more than 50% of India’s population depends on agriculture for survival. Variations in weather, climate, and other environmental conditions have become a major risk to the healthy existence of agriculture. Machine learning (ML) plays a significant role as a decision support tool for Crop Yield Prediction (CYP), including decisions on which crops to grow and what to do during the growing season. The present research is a systematic review that extracts and synthesizes the features used for CYP, together with the variety of methods developed to analyze crop yield prediction using artificial intelligence techniques. The major limitations reported for neural networks concern the relative error and the reduced prediction efficiency of crop yield. Similarly, supervised learning techniques were unable to capture the nonlinear relationship between input and output variables and faced problems in fruit grading or sorting. Many studies have been proposed for agricultural development, with the goal of creating accurate and efficient models for tasks such as crop yield estimation based on weather, crop disease detection, and classification of crops by growing phase. This paper explores various ML techniques used in crop yield estimation and provides a detailed analysis of their accuracy.

Keywords—Agriculture, Artificial Neural Network, Convolutional Neural Network, Crop yield prediction, Machine learning method.

I. INTRODUCTION

Agriculture is the backbone of India’s economy because it plays a vital role in the survival of every human and animal in India [1]. The global middle-class population was estimated at 1.8 billion in 2009 and is projected to reach 4.9 billion by 2030, leading to a sharp increase in demand for agricultural products. Agricultural products will therefore be in higher demand, which will require efficient development of farmland and growth in crop yields. Meanwhile, due to global warming, crops are frequently spoiled by harmful climatic conditions [2]. A single crop failure caused by lack of soil fertility, climatic variation, floods, lack of groundwater, or other such factors destroys the crops, which in turn affects the farmers. In other nations, society advises farmers to increase the production of specific crops according to the locality and environmental factors [3]. Because the population has been increasing at a significantly higher rate, estimating and monitoring crop production is necessary [4]. Accordingly, an appropriate method needs to be designed that considers the influencing features, for better selection of crops with respect to seasonal variation [5].

The core objective of crop yield estimation is to achieve higher agricultural crop production, and many established models are used to increase crop yield. Nowadays, ML is used worldwide because of its efficiency in sectors such as forecasting, fault detection, and pattern recognition. ML algorithms also help to improve the crop yield production rate when losses occur under unfavorable conditions, and they are applied to crop selection to reduce crop yield losses irrespective of a disruptive environment.

An existing model used SVM to classify crop data based on the texture, shape, and color of patterns on the diseased surface, since these give an unambiguous perception of the defects [6]. Another existing technique used a CNN, which was reported to reduce the relative error while decreasing crop yield prediction efficiency [7]. Similarly, an existing model used a Back Propagation Neural Network (BPNN) with a time series model; because the dataset was small and fewer samples were used for prediction, it achieved lower performance [8], [9]. ML methods have been applied to improve selection stability and precision, and ML provides several effective algorithms for finding the input-output relationship in yield and crop prediction. Various ML techniques are used in agriculture for yield prediction, smart irrigation systems, crop disease prediction, crop selection, weather forecasting, deciding the minimum support price, etc. These techniques enhance field productivity while reducing the farmers’ input effort. Besides, advances in machines and technologies have become more accurate because they use significant data, and they play an important role [10]. This research work analyzes the various agricultural methods that use ML, along with their merits and limitations.

This research paper is structured as follows: the stepwise process of crop yield analysis is explained in Section 2. The analysis of several ML methods used for crop yield prediction is given in Section 3. The problem statement and objectives of crop yield prediction are given in Sections 4 and 5, and a comparative analysis of several studies is presented in Section 6. Section 7 describes the conclusion and future work.

II. BLOCK DIAGRAM

The steps involved in crop yield prediction using machine learning are as follows. First, agricultural data is collected for crop yield prediction. Next, the data undergoes pre-processing to remove noisy data. The pre-processed data then undergoes feature extraction, which yields features such as soil information, nutrients, and field management; these features are used to perform classification with ML algorithms. The results obtained by existing models using ML algorithms are described in the following section. Figure 1 shows the flow diagram of crop yield prediction using ML algorithms.
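The four steps of the block diagram (data, pre-processing, feature extraction, classification) can be sketched end to end. This is a minimal, hypothetical illustration and not any reviewed model. The record fields and values are invented. Dropping incomplete records stands in for noise removal. A nearest-centroid classifier is swapped in for brevity in place of the ML algorithms discussed later.

```python
from statistics import mean

# Hypothetical records: (soil_nitrogen, rainfall_mm, temperature_c, yield_class).
# None marks a missing/noisy measurement. All values are made up for illustration.
RECORDS = [
    (40.0, 800.0, 27.0, "high"),
    (35.0, 750.0, 28.0, "high"),
    (None, 700.0, 29.0, "high"),  # noisy record, removed in pre-processing
    (12.0, 300.0, 34.0, "low"),
    (15.0, 350.0, 33.0, "low"),
]

def preprocess(records):
    """Pre-processing: drop records that contain missing values."""
    return [r for r in records if None not in r]

def extract_features(records):
    """Feature extraction: split records into feature vectors and labels."""
    return [list(r[:-1]) for r in records], [r[-1] for r in records]

def fit_scaler(X):
    """Learn per-column (min, range) for min-max scaling; constant columns get range 1."""
    return [(min(c), (max(c) - min(c)) or 1.0) for c in zip(*X)]

def scale(x, params):
    """Apply min-max scaling learned on the training data."""
    return [(v - lo) / span for v, (lo, span) in zip(x, params)]

def fit_centroids(X, y):
    """Classification (training): one mean feature vector per yield class."""
    return {
        lab: [mean(col) for col in zip(*[x for x, lbl in zip(X, y) if lbl == lab])]
        for lab in set(y)
    }

def predict(centroids, x):
    """Classification (inference): nearest centroid by squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

clean = preprocess(RECORDS)
X, y = extract_features(clean)
params = fit_scaler(X)
model = fit_centroids([scale(x, params) for x in X], y)
print(len(clean), predict(model, scale([38.0, 780.0, 27.5], params)))
```

The scaler is fitted on training data only and then reused for new instances, so unseen records are mapped into the same feature space as the training records.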
Fuentes et al. [11] used a robust deep-learning method to identify pest infestation and infections in tomato plants. Existing models faced problems in crop yield prediction because pests and diseases in crops cause substantial economic loss. The developed model introduces a deep meta-architecture to detect pests in plants. It considers three key detectors: Single Shot Multibox Detector (SSD), Faster Region-based CNN (Faster R-CNN), and Region-based Fully Convolutional Network (R-FCN), which together are referred to as the deep meta-architecture. Combining the deep meta-architecture with feature extractors, the authors also proposed a method for global and local class annotation, and data augmentation increased the precision and reduced the number of false positives during training. The benefit of the developed model was that it successfully identified different kinds of pests and diseases while dealing with complex situations in the surrounding area. However, due to its complex pre-processing techniques, the robust deep-learning method consumes more time and has a high computational cost.

Sun et al. [12] used a deep CNN-LSTM method to estimate soybean yield. Yield prediction is of immense importance for yield mapping, harvest management, crop insurance, crop market planning, and remote sensing. The practicability and feasibility of the CNN-LSTM approach were also verified for forecasting particulate matter (PM2.5) concentration. A DNN structure integrating LSTM and CNN was built on historical data such as cumulated wind speed, duration of rain, and PM2.5 concentration. Recent research in this area suggests that CNNs can explore more spatial features and LSTMs can reveal phenological features, which together play a significant role in crop yield prediction. However, the method employed a histogram-based tensor transformation to fuse different remote sensing data, and combining multisource data with various resolutions for feature extraction remained challenging.

Bondre and Mahagonkar [14] used ML techniques to predict crop yield and recommend manure. Yield prediction was a major issue in agriculture, which was addressed by developing an ML algorithm. The performance of the developed model was evaluated for estimating crop production in agriculture. An advantage of the developed model was that historical data was used for crop prediction, and by applying ML algorithms such as random forest and SVM, the model also recommended a suitable fertilizer for each particular crop. However, a smart irrigation system for obtaining a higher yield was not implemented.

Devika and Ananthi [15] used data mining techniques to predict the annual yield of major crops. Farmers were hindered in harvesting because of insufficient water availability and unpredictable weather variations, and these issues were addressed by developing a data mining method. The developed model gathered crop growing records, which were stored and analyzed for crop yield prediction. In some data mining processes, the training data can be collected from previous records, and the gathered records are then used in the training phase. An advantage of the developed model was that the highest level of crop yield prediction was obtained for sugarcane, cotton, and turmeric. However, the prediction range was low for other crops such as wheat and rice.

Pandith et al. [16] applied ML techniques to estimate mustard crop yield from soil analysis. In agriculture, soil is a significant factor in determining crop yield, and this was addressed by developing ML techniques. Several ML techniques, namely multinomial logistic regression, K-nearest neighbor (KNN), ANN, random forest, and Naive Bayes, were implemented to forecast mustard crop yield in advance from soil analysis. An advantage of the developed model was that yield prediction could be performed even in the presence of fertilizer, which supports soil analysis and helps farmers make decisions in situations of low predicted crop yield. However, crop yield prediction with an enormous soil dataset was difficult in a big data environment, which increased system complexity.

P.S. Maya Gopal and R. Bhargavi [17] developed a novel approach for effective CYP. Crop yield was predicted using ANN, statistical, and Multiple Linear Regression (MLR) algorithms. The model examined the intrinsic behaviour of an integrated MLR-ANN model for CYP and analysed its accuracy, with the coefficients generated by MLR serving as the ANN input-layer weights and bias. A feed-forward ANN with backpropagation was used to predict the crop yield. Similarly, Khaki and Wang [18] studied DNNs for CYP to obtain an accurate yield prediction model. The model provided a fundamental understanding of the relationship between yield and the interacting factors using a powerful and comprehensive algorithm. The results suggested that regression trees performed better than the existing supervised models. However, the main limitation was that more advanced models were needed, as the existing ones did not give accurate results.

T. Vijayakumar [19] studied Posed Inverse Problem Rectification Using Novel Deep CNN. Existing methodologies showed excellent outcomes but posed challenges in terms of computational cost and parameter selection for the adjoint and forward operators. The developed model applied a CNN after direct inversion to solve the convolutional inverse problem. The developed model used a physical model for direct inversion, but combining multi-resolution decomposition with residual learning led to artifact generation. Therefore, the model's performance declined when the noise level was high.

T. Senthil Kumar [20] developed a data-mining-based marketing decision support system using hybrid ML techniques to solve problems in finance and marketing applications. Decisions are made through the decision support system, which enhances organizational performance by analyzing the ground reality. In the existing models, globalization, privatization, and liberalization pushed organizations to become more competitive, and withstanding this competition requires marketing strategies that are properly planned and executed. However, an optimization model was still required, as the model posed difficulties during the process and showed lower assessment performance.

By analyzing the studies, various feature groups related to soil information such as soil maps, soil type, and area
of production were discussed. Soil maps provide information about the types of nutrients present in the soil and the location where the soil is found. Crop information features describe crops such as mustard, wheat, rice, and tomato plants, analysed in terms of crop density, growth (weight), and leaf area index. Similarly, weather features include humidity, rainfall, precipitation, and forecast rainfall. Along with these data and environmental factors, nutrient components play an important role; the nutrients include nitrogen, potassium, magnesium, zinc, boron, etc. Solar information includes features related to temperature and radiation (gamma), shortwave radiation, solar radiation, and degree days, which are used to calculate features. Less commonly used features include wind speed, images, and pressure.

Pseudo code for CYP using ML
Learning phase:
For every unknown instance x
    Identify x1, x2, ..., xk, the best-matching (nearest) instances to x in the data set, found using the ML algorithm
    Set the class label of x to the most repeated class among x1, x2, ..., xk
    Return class;
End for

IV. PROBLEMS FACED IN EXISTING RESEARCH

The problems faced in existing research on crop yield prediction using machine learning are stated below:
1. Creating, repairing, and maintaining ML algorithms requires huge costs because they are very complex.
2. ML techniques used for crop yield prediction (mustard, wheat) combined input and output data but failed to obtain statistically better results.
3. Because of the linear relationship assumed among the parameters, regression models failed to provide exact predictions in complex situations such as extreme values and nonlinear data.
4. Existing K-NN models used for classification in yield prediction showed lower performance due to nonlinearity and the highly adaptive nature of KNN. They operate as local models, and increasing the dimensionality of the input vector caused confusion in classification.
5. An appropriate classification decision could not be made because only a small quantity of data was available for crop yield estimation.

1. Depending on the different crop feature groups, the parameter values of ML algorithms differ in order to attain an accurate approximation.
2. When the number of input elements is reduced, an ANN is used. The optimal features were selected empirically for appropriate crop yield estimation.
3. The advantage of ML regression methods is that they avoid the difficulty of using a linear function in a large output sample space, transforming the optimization of complex problems into simple linear function optimization.
4. ML algorithms can be executed on an enormous soil dataset for crop yield estimation.
5. Through observation of agricultural fields, ML techniques provide the support farmers need to increase crop production to a great extent.
Author(s) | Technique | Description | Limitation | Performance
Gopal and Bhargavi [17] | ANN and Multiple Linear Regression (MLR) | The developed model combines the backpropagation algorithm with an ANN to estimate the exact crop yield. | The developed model showed difficulties in training the neural network model. | MLR: RMSE = 9.8%, MAE = 6.9%, R = 89%
Khaki and Wang [18] | Deep Neural Network (DNN) | The DNN model was used for feature selection. Next, the DNN model reduced the dimensionality of the input space without affecting the | The developed model is a black box, a drawback shared by several ML methods. | Training RMSE = 10.55; Validation RMSE = 12.79
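The performance column reports RMSE, MAE, and the correlation coefficient R. The sketch below computes these with their standard definitions on raw values. The tabulated RMSE and MAE are percentages, but the reviewed papers' normalization is not stated here, so that step is deliberately left out. The yield values in the test are hypothetical.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((a - p)^2))."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean(|a - p|)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between observed and predicted yields."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = sqrt(sum((a - ma) ** 2 for a in actual))
    sp = sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)
```

RMSE penalizes large errors more heavily than MAE, so RMSE is always at least MAE on the same predictions. R measures linear agreement and can equal 1 even when predictions are biased.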
