Day-ahead solar power forecasting is required for utility-scale solar farms to participate profitably in the day-ahead electricity market [1]. Participation requires an accurate forecast because the expected amount of solar generation for the next day has to be offered today, and the offered amount must then be dispatched the next day. The dispatched amount should fall within a certain range, which independent system operators (ISOs) specify on the basis of the hourly forecasted amount. Deviation above or below the range may lead to penalties, or to the risk that the surplus must be sold at unattractively low prices or the shortfall bought at unattractively high prices [2]. To minimize these deviations and additional costs, the solar power for the next day can be predicted using the next day’s forecasted solar irradiation and numerical weather prediction (NWP) data. The difficulty in solar irradiation forecasting is that, when weather stations are not installed at the exact locations of the target solar farms, the NWP must be interpolated at multiple target solar farms using unsupervised interpolation techniques. In addition, the NWP is generally provided only at integer grid points. Therefore, to forecast the solar power accurately, the NWP should be interpolated, and the relationship between the NWP and solar irradiation should be analyzed.
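As a minimal illustration of interpolating a gridded NWP field at a farm location, the sketch below uses bilinear interpolation. Bilinear interpolation is chosen here only as a simple, commonly used example; it is an assumption and not necessarily the interpolation technique used in this paper. The function name, the grid layout, and the sample cloud cover values are hypothetical.

```python
import math


def bilinear_interpolate(grid, lat, lon):
    """Interpolate a gridded field at a fractional (lat, lon) position.

    grid[i][j] is assumed to hold the field value at integer grid
    point (lat=i, lon=j); this indexing convention is hypothetical.
    """
    i0, j0 = math.floor(lat), math.floor(lon)
    # Clamp so the enclosing 2x2 cell stays inside the grid,
    # including when the point lies exactly on the upper edge.
    i0 = min(max(i0, 0), len(grid) - 2)
    j0 = min(max(j0, 0), len(grid[0]) - 2)
    di, dj = lat - i0, lon - j0
    # Interpolate along longitude on the two bounding latitude rows,
    # then along latitude between those two results.
    row_lo = (1 - dj) * grid[i0][j0] + dj * grid[i0][j0 + 1]
    row_hi = (1 - dj) * grid[i0 + 1][j0] + dj * grid[i0 + 1][j0 + 1]
    return (1 - di) * row_lo + di * row_hi


# Hypothetical forecasted cloud cover (%) at four integer grid points.
cloud_cover = [[10.0, 20.0],
               [30.0, 40.0]]
farm_cloud_cover = bilinear_interpolate(cloud_cover, 0.25, 0.75)
```

The same function can be applied to each NWP variable and each target farm in turn, since it needs only the four grid points surrounding the farm.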
1.1. Literature Review
To respond to the issues mentioned above, solar forecasting has been studied for a long time. In this literature review, solar forecasting is introduced according to several classification criteria: measurable physical properties, forecasting time steps and horizon, forecasting machines, and forecasting techniques. Here, forecasting machines refers to widely used machine learning algorithms such as the support vector machine (SVM) and the neural network (NN), and forecasting techniques refers to methods that tailor the input and output data sets, for example, pre-processing, post-processing, and feature engineering techniques.
First, solar forecasting can be classified into solar irradiance (W/m²), solar irradiation (Wh/m²), which is the irradiance aggregated over time, and solar power (W) forecasting. In this study, the global horizontal irradiation (GHI) is forecasted instead of the direct normal irradiation (DNI) and diffuse horizontal irradiation (DHI), but the forecasting methodologies in this paper can also be used for DNI, DHI, or power forecasting.
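The relationship between irradiance and irradiation can be illustrated with a short numerical integration. The sketch below is only an illustration: it assumes irradiance samples in W/m² taken at known hours and aggregates them with the trapezoidal rule to obtain irradiation in Wh/m². The function name and the sample values are hypothetical.

```python
def irradiation_from_irradiance(hours, irradiance_w_m2):
    """Aggregate irradiance samples (W/m^2) into irradiation (Wh/m^2).

    Uses the trapezoidal rule over the sample times, given in hours.
    """
    total = 0.0
    for k in range(1, len(hours)):
        dt = hours[k] - hours[k - 1]  # interval length in hours
        mean_irradiance = 0.5 * (irradiance_w_m2[k] + irradiance_w_m2[k - 1])
        total += mean_irradiance * dt
    return total


# Hypothetical irradiance samples over one day.
sample_hours = [6, 9, 12, 15, 18]
sample_irradiance = [0.0, 400.0, 800.0, 400.0, 0.0]
daily_irradiation = irradiation_from_irradiance(sample_hours, sample_irradiance)
```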
Second, solar forecasting can be classified into hourly and daily solar forecasting. Most solar irradiation is forecasted hourly for a horizon from 1 h to 48 h, based on solar irradiation or NWP values [3]. Total daily solar irradiation was forecasted using the wavelet-network model, which is an NN that uses wavelets as a transfer function, to estimate the size of a photovoltaic power system [4]. Total daily solar irradiation was also forecasted using the NN [5]. Paoli et al. used the clearness index, which is the ratio between solar irradiation and the daily extraterrestrial solar irradiation. In addition, seasonal trends were subtracted to make the observation data stationary. In the present study, however, the predictors themselves have seasonal trends, so seasonal trends in the target data are not modeled separately; the seasonal trends in the predictions are regressed automatically from the seasonal trends in the predictors.
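The clearness index described above can be sketched as follows. The daily extraterrestrial irradiation on a horizontal plane is computed here with standard textbook approximations (a solar constant of 1367 W/m², Cooper's declination formula, and a cosine eccentricity correction). These formulas are assumptions chosen for illustration and are not necessarily the ones used by Paoli et al. or in this paper; the function names and input values are hypothetical.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, a commonly used value (assumption)


def daily_extraterrestrial_irradiation(latitude_deg, day_of_year):
    """Daily extraterrestrial irradiation on a horizontal plane (Wh/m^2)."""
    phi = math.radians(latitude_deg)
    # Solar declination, Cooper's approximation.
    delta = math.radians(23.45) * math.sin(
        2 * math.pi * (284 + day_of_year) / 365)
    # Eccentricity correction for the Earth-Sun distance.
    e0 = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    # Sunset hour angle, clamped to handle polar day and polar night.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)
    return (24 / math.pi) * SOLAR_CONSTANT * e0 * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta))


def clearness_index(measured_wh_m2, latitude_deg, day_of_year):
    """Ratio of measured daily irradiation to its extraterrestrial value."""
    h0 = daily_extraterrestrial_irradiation(latitude_deg, day_of_year)
    if h0 <= 0:
        return float("nan")  # undefined when the sun does not rise
    return measured_wh_m2 / h0


# Hypothetical example: 7000 Wh/m^2 measured at 37.5 degrees N on day 172.
kt = clearness_index(7000.0, 37.5, 172)
```

Because the extraterrestrial irradiation already carries the seasonal cycle, dividing by it removes much of the seasonal trend from the target series.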
Third, various machine learning algorithms are used in solar forecasting. These include time series models, the NN, the SVM, the Markov chain, K-nearest neighbors, and others. Among time series models, the autoregressive (AR) model was first used to forecast solar irradiance [6]. More recently, the AR model with exogenous input was proposed to forecast solar power [7]. It was shown that the AR model with observed solar power and forecasted solar irradiance as inputs performs better than the AR model with solar power alone. Furthermore, an autoregressive integrated moving average (ARIMA) model was designed to forecast the GHI [8]. The ARIMA model predicted a cloud index, and the GHI was then forecasted through a lookup table indexed by the zenith angle and the predicted cloud index. It was shown that the cloud index improved the forecasting performance.
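To make the idea of an AR model with exogenous input concrete, the sketch below fits a first-order model y_t = a·y_{t-1} + b·x_t by ordinary least squares, where y stands for observed solar power and x for forecasted solar irradiance. The model order, the absence of an intercept, the function name, and the synthetic data are all assumptions for illustration; they are not taken from [7].

```python
def fit_arx1(y, x):
    """Least-squares fit of y[t] = a * y[t-1] + b * x[t] for t >= 1.

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(y)):
        lag, exo, target = y[t - 1], x[t], y[t]
        s11 += lag * lag
        s12 += lag * exo
        s22 += exo * exo
        r1 += lag * target
        r2 += exo * target
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear; the fit is not unique")
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


# Synthetic, noise-free data generated with a = 0.6 and b = 0.3.
irradiance_forecast = [0.0, 0.2, 0.5, 0.9, 1.0, 0.8, 0.4, 0.1, 0.0, 0.3]
power = [0.1]
for t in range(1, len(irradiance_forecast)):
    power.append(0.6 * power[t - 1] + 0.3 * irradiance_forecast[t])

a_hat, b_hat = fit_arx1(power, irradiance_forecast)
next_power = a_hat * power[-1] + b_hat * 0.5  # one-step forecast for x = 0.5
```

Dropping the exogenous term (fixing b at zero) reduces this to a plain AR(1) model, which is the kind of baseline the comparison in the text refers to.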
The NN has been the most widely used forecasting machine in solar forecasting, and it was first used to forecast solar irradiance based on weather data [9]. Furthermore, different types of NN, such as the feed-forward, recurrent Elman, and radial basis NN, were also used to forecast the hour-ahead solar irradiance based on the cloud index [10]. That study concluded that the feed-forward NN trained by the Levenberg–Marquardt algorithm with the cloud index as input had the best performance. Moreover, in order to reduce the dimension of the NN input data, the Gamma test combined with a genetic algorithm was used to select weather data as the NN input [11].
Solar forecasting can be further categorized according to additional techniques that are specialized for solar forecasting. The first technique is to use satellite images. Since the cloud index was first determined from satellite images in the 1980s [12], satellite images have been used to design the global and direct irradiance model, the cloud index model, and the conversion model between global and direct irradiance [13]. These models were applied to short-term solar irradiance forecasting in [14], where the satellite-derived cloud motion model and hourly GHI were used. It was shown that the satellite-image-derived model can improve the forecasting performance up to four hours ahead. However, according to [15], the cloud index forecasted from satellite images is not accurate beyond five hours, so it is difficult to use in day-ahead forecasting. Therefore, in our application, the cloud cover in percent, which is forecasted by the NWP model, is used as the cloud index.
Input data classification is a second technique that has been used to improve forecasting performance. For example, input data were classified into nine cases based on the DHI and GHI using the maximum-likelihood method with Gaussian distributions [16]. It was shown that the DHI performs better than the GHI as a classification factor. Day-ahead sky conditions were also classified by a self-organized map (SOM), with respect to the cloud index and solar irradiance, into cloudy, sunny, and rainy conditions [17]. In that study, the radial basis NN was used to forecast the hourly solar power for the next day. Both studies concluded that input data classification improves the forecast accuracy.
Hybrid forecasting models that combine multiple forecasting models have also been used in solar forecasting. The NN was combined with the wavelet model [18]. Cao and Lin developed the diagonal recurrent wavelet neural network (DRWNN), which is a fully recurrent NN with wavelet transfer functions and without connections among hidden neurons. It was shown that the DRWNN outperforms other NN-based forecasting models. Furthermore, some hybrid models represent input training data as the sum of two variables. For example, in [19], the NN and autoregressive moving average (ARMA) models were combined to forecast hourly solar irradiation. The authors trained the ARMA model with the input training data and then trained a time-delayed NN with the residuals, that is, the difference between the input training data and the outputs of the ARMA model. They showed that the hybrid model outperforms the individual forecasting machines. However, in these hybrid models, the performance of target data segregation should be checked.
Additionally, many different types of data affect the solar irradiance and can be considered in solar irradiance forecasting. First, water vapor affects solar irradiance [20] and should be considered in solar forecasting. Second, aerosol information is important in solar irradiance forecasting since the DNI loses energy when it passes through aerosol. Forecasting aerosol compensates for the lost DNI and DHI [21]. Third, infrared data are important. According to [22], dual-band solar illuminance of the shortwave spectrum, comprising the ultraviolet (UV) band (0.29–0.7 µm) and the near-infrared band (0.7–4.0 µm), is important to model the solar irradiance on clear days. Therefore, in this study, water vapor, aerosol, and dual-band irradiance data are used to forecast the solar irradiation.