Forecasting the equity premium: Do deep neural network models work?
1. Introduction
Equity premium forecasting is one of the core issues in financial research. It is closely related to many important financial questions, such as portfolio management, the cost of capital, and market efficiency (Rapach & Zhou, 2013; Rapach et al., 2010). However, out-of-sample predictability remains controversial. For example, Welch and Goyal (2008) find that 14 popular predictive variables do not outperform the simple historical average (HA) of returns. Campbell and Thompson (2008), however, point out that the equity premium is predictable out-of-sample once parameter constraints based on financial theory are imposed. Neely et al. (2014) also show that combining information from both macroeconomic variables and technical indicators using principal components analysis (PCA) performs significantly better than the historical average forecast.
Among methods for stock return prediction, traditional linear regression methods have been widely adopted, e.g., OLS (Ordinary Least Squares), LASSO (Least Absolute Shrinkage and Selection Operator; see Tibshirani, 2011), and Ridge regression (Tikhonov, 1998). However, the literature applying nonlinear methods, especially deep learning, to extract information from stock return time series is still limited (Bekiros et al., 2016; Gupta et al., 2018). The ability to extract and transform features from data, and to identify hidden nonlinear relations without relying on econometric assumptions and human expertise, makes deep learning much more attractive than other machine learning methods. Moreover, the number of conditioning variables believed to have forecasting power for returns is large and has continued to grow over the last five decades. Traditional methods are reaching their limits in handling a large number of conditioning variables, so more advanced statistical tools, such as deep learning, can be a solution (Gu et al., 2018). As one of the most popular deep learning methods, the Deep Neural Network (DNN) does not require manual indicator selection and enables us to use many more variables as inputs. In this paper, we apply the DNN method to directly forecast the U.S. equity premium and compare the result with that of the OLS regression method.
Specifically, following Neely et al. (2014), we compare the forecasting performance (measured by MSFE_{OS}, R^{2}_{OS}, and the MSFE-adjusted statistic) of Ordinary Least Squares models using 28 input variables (OLS+28) with Deep Neural Network models using the same 28 input variables (DNN+28) and Deep Neural Network models using the same 28 variables plus 14 additional variables (DNN+42). Next, following Kandel and Stambaugh (1996) and Welch and Goyal (2008), we use the out-of-sample forecasts to compute the Certainty Equivalent Return (CER) gain and Sharpe ratio for mean-variance investors who optimally allocate their wealth between equities and risk-free bills. Our results show that the OLS+28 model performs surprisingly poorly over the out-of-sample period 2011:01-2016:12, which Neely et al. (2014) did not test due to data availability. In contrast, both DNN models perform well. The R^{2}_{OS} of the DNN models is near 3%, and the DNN models generate large and robust economic gains for investors, with an annualized CER gain of around 3%. The monthly Sharpe ratio of the DNN models substantially exceeds that of the HA and OLS+28 models.
Our study contributes to the existing literature in three ways. First, to the best of our knowledge, we are the first to apply deep learning to forecast the equity premium in an academic finance paper. Unlike most studies, which focus on traditional econometric models, we introduce a nonlinear machine learning model to forecast the equity premium. Our results show that DNN models can outperform HA and OLS models. In particular, we document the poor predictive ability of OLS models during the period 2011:01-2016:12, which is beyond the period studied by Neely et al. (2014), whereas the DNN models still work well in this period. Second, we test whether DNN models can incorporate additional predictive information from 14 further variables selected from the existing finance literature. The results show that the forecasting performance of DNN models improves when more variables are used as inputs, which in turn corroborates the existing finance literature. Last but not least, our asset allocation results indicate that DNN models can be applied to practical investment management and produce substantial economic value.
The rest of the paper is organized as follows. Section 2 presents the methodology and data. Section 3 discusses the empirical results. Section 4 concludes the paper.
2. Methodology and Data
2.1. HA model
Welch and Goyal (2008) argue that the simple historical average (HA) forecasts the equity premium better than regressions of the equity premium on predictors, including 14 popular macroeconomic variables. Our first benchmark is therefore the HA model, which can be expressed as follows:
(1) $$\hat{R}_{t+1}=\frac{1}{t}\sum_{s=1}^{t}R_s$$
where R_{t} is the equity premium at month t.
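As a concrete illustration, the HA forecast in Eq. (1) is simply an expanding mean of past premia. A minimal NumPy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def ha_forecast(r):
    """Expanding-window historical average forecast (Eq. 1).

    r : sequence of realized equity premia R_1..R_T.
    Element t-1 of the result is the forecast of R_{t+1},
    i.e. the mean of R_1..R_t.
    """
    r = np.asarray(r, dtype=float)
    return np.cumsum(r) / np.arange(1, len(r) + 1)

print(ha_forecast([0.01, -0.02, 0.03, 0.00]))  # approx [0.01, -0.005, 0.0067, 0.005]
```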
2.2. OLS model
Based on the PCA and OLS predictive regression framework, Neely et al. (2014) find that, compared with the HA model, combining information from 14 macroeconomic variables and 14 technical variables significantly improves equity premium forecasts. We replicate their study and define the OLS models as follows:
(2) $$R_{t+1}=\alpha_i+\beta_i\ x_{i,t}+\varepsilon_{i,t+1}$$
where R_{t+1} is the equity premium at month t+1 and x_{i,t} is predictor i at month t. Based on data through month t, we obtain the OLS estimates $\hat{\alpha}_i$ and $\hat{\beta}_i$. Then the out-of-sample forecast $\hat{R}_{t+1}$ is
(3) $$\hat{R}_{t+1}=\hat{\alpha}_i+\hat{\beta}_i\,x_{i,t}$$
In particular, we denote the OLS regression on principal components extracted from the 28 variables studied by Neely et al. (2014) as the "OLS+28" model.
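The recursive out-of-sample scheme behind Eqs. (2)-(3) can be sketched as follows. This is a single-predictor illustration with hypothetical names, not the authors' code (their OLS+28 model regresses on principal components of the 28 variables):

```python
import numpy as np

def recursive_ols_forecasts(x, r, min_obs=180):
    """Recursive out-of-sample OLS forecasts in the spirit of Eqs. (2)-(3).

    At each month t >= min_obs, regress r[1..t] on x[0..t-1] and
    forecast r[t+1] from x[t]. min_obs = 180 mimics the paper's
    15-year (180-month) initial estimation window.
    """
    x, r = np.asarray(x, float), np.asarray(r, float)
    preds = []
    for t in range(min_obs, len(r) - 1):
        X = np.column_stack([np.ones(t), x[:t]])      # lagged predictor + constant
        beta, *_ = np.linalg.lstsq(X, r[1:t + 1], rcond=None)
        preds.append(beta[0] + beta[1] * x[t])        # forecast of r[t+1]
    return np.array(preds)
```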
2.3. DNN model
Our DNN models have the following general equations:
(4) $$x_n^{(1)}=\mathrm{ReLU}[\mathrm{BN}(x^{(0)})\,\theta_n^{(0)}]$$
(5) $$x_n^{(l)}=\mathrm{ReLU}[\mathrm{BN}(x^{(l-1)})\,\theta_n^{(l-1)}]$$
(6) $$R̂_{t+1}=x^{(N^{(l)}-1)}θ^{(N^{(l)}-1)}$$
where $N^{(l)}$ denotes the number of neurons in layer $l$. We define the output of neuron n in layer l as $x_n^{(l)}$ and the vector of outputs for this layer (augmented to include a constant, $x_0^{(l)}$) as $x^{(l)}$. The number of units in the input layer equals the number of input variables, so $x^{(0)}=(1, x_1, \dots, x_M)$, where $x_m$ is the m-th input variable. Let $\theta^{(l)}$ denote the weight and bias parameters of layer $l$; $\hat{R}_{t+1}$ is the forecast of the log equity premium at month t+1. The rectified linear unit (ReLU) is the most popular activation function (Nair and Hinton, 2010), and we use it at all nodes. Batch normalization (BN) is a simple regularization technique for controlling the variability of variables across different regions of the network and across different datasets (Ioffe and Szegedy, 2015). Equation (4) states the relationship between the input variables in the input layer and the output vector of the first hidden layer. Equation (5) gives the recursive output formula for each neuron in layer l. Equation (6) gives the final forecast. For comparison with the HA and OLS+28 models, we first apply the same 28 variables as inputs to the OLS+28 model and the DNN+28 model. Then, to examine whether DNN models can extract information from additional predictors to improve forecast performance, we add 14 further variables selected from the existing finance literature and obtain the DNN+42 models.
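The forward pass of Eqs. (4)-(6) can be sketched in NumPy as below. This is an illustrative inference-style pass only: BN is applied without learned scale/shift parameters, the bias augmentation is folded into the weight matrices, and BN's placement before the weight multiplication is one reading of Eqs. (4)-(5). The layer widths follow the paper's DNN+28 choice; the initialization scale is our assumption.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def batch_norm(z, eps=1e-5):
    # Normalize each feature over the batch (no learned scale/shift here).
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def dnn_forward(x, weights):
    """Forward pass of Eqs. (4)-(6): hidden layers apply
    ReLU(BN(x) @ theta); the output layer is linear (Eq. 6)."""
    a = x
    for theta in weights[:-1]:
        a = relu(batch_norm(a) @ theta)   # Eqs. (4)-(5)
    return a @ weights[-1]                # Eq. (6): linear forecast

rng = np.random.default_rng(0)
sizes = [28, 200, 200, 200, 128, 1]       # DNN+28 layer widths from the paper
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(32, 28))             # a batch of 32 months, 28 predictors
print(dnn_forward(x, weights).shape)      # (32, 1)
```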
At present, there is no uniform approach to determining the best hyperparameters, such as the number of layers and neurons, for a DNN on a given problem. Since Gu et al. (2018) suggest that shallow networks outperform relatively deeper ones, we start our search with three or four hidden layers. To handle this nonlinear and nonconvex optimization problem, we use the adaptive moments method (Adam; Kingma and Ba, 2014) to train our DNN models and a grid search to select the best one. Finally, the DNN+28 models use 200, 200, 200, and 128 neurons in four hidden layers and 0, 0.5, and 20 as the values of the Adam weight decay, dropout probability, and number of epochs, respectively. For the DNN+42 models, these values are 600, 300, and 300 neurons in three hidden layers and 0, 0.5, and 10, respectively. As a robustness check, we discuss the effect of these key parameters on forecasting performance.
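A grid search over these tuning parameters could look like the following sketch; `train_and_validate` is a hypothetical placeholder for a routine that trains a model with the given configuration and returns its validation loss, and the candidate values in the grid are our assumption, echoing the parameter types the paper tunes:

```python
from itertools import product

# Hypothetical search grid; the candidate values are illustrative only.
GRID = {
    "hidden": [(200, 200, 200, 128), (600, 300, 300)],
    "dropout": [0.3, 0.5],
    "epochs": [10, 20],
}

def grid_search(train_and_validate, grid=GRID):
    """Return the configuration with the lowest validation loss.

    train_and_validate is a user-supplied callable accepting
    hidden=..., dropout=..., epochs=... and returning a loss.
    """
    best_loss, best_cfg = float("inf"), None
    for hidden, dropout, epochs in product(grid["hidden"],
                                           grid["dropout"],
                                           grid["epochs"]):
        loss = train_and_validate(hidden=hidden, dropout=dropout,
                                  epochs=epochs)
        if loss < best_loss:
            best_loss, best_cfg = loss, dict(hidden=hidden,
                                             dropout=dropout,
                                             epochs=epochs)
    return best_cfg
```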
DNN models tend to overfit when parameters are tuned to achieve satisfactory results. We apply four methods to prevent overfitting. First, we shrink the weight parameters of the DNN via an L2 penalty, which controls the weight of the regularization term in the loss function. Second, we apply the dropout technique to prevent overfitting and co-adaptation of neurons, setting the output of any neuron to zero with probability p. Models with dropout can be interpreted as an ensemble of models with different numbers of neurons in each layer but with shared weights, which enhances generalization (Srivastava et al., 2014). Third, we adopt early stopping to determine the best training epoch: we stop training once model performance stops improving on the test set. Finally, we use batch normalization, which normalizes the input of each layer to keep it stable, thereby speeding up training and improving generalization.
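The early-stopping rule described above can be sketched as a simple patience loop; the `patience` value is our assumption, as the paper does not report one:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch with the best held-out loss, scanning until the
    loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break   # stop training: no improvement for `patience` epochs
    return best_epoch

print(early_stopping([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]))  # 2
```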
2.4. Forecast Evaluation Measures
Following Neely et al. (2014) and Welch and Goyal (2008), we employ two kinds of forecast evaluation measures. The first kind comprises the R^{2}_{OS} and MSFE-adjusted statistics: R^{2}_{OS} measures forecasting accuracy relative to the benchmark HA model, and a monthly R^{2}_{OS} of 0.5% is economically significant (Campbell and Thompson, 2008); the MSFE-adjusted statistic measures statistical significance (Clark and West, 2007). The second kind is asset allocation performance, measured by six metrics: (1) the certainty equivalent return gain [CER gain, △(ann%)], (2) the CER gain in expansions [△(ann%), EXP], (3) the CER gain in recessions [△(ann%), REC], (4) the Sharpe ratio, (5) the relative average turnover, and (6) the CER gain with 50 bps per transaction [△(ann%), cost = 50 bps].
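The two statistical measures can be computed as follows; this is a standard implementation of the Campbell-Thompson R^{2}_{OS} and the Clark-West MSFE-adjusted t-statistic, not code from the paper:

```python
import numpy as np

def r2_os(actual, model_fc, bench_fc):
    """Campbell-Thompson out-of-sample R^2 relative to the benchmark."""
    a, m, b = (np.asarray(v, float) for v in (actual, model_fc, bench_fc))
    return 1.0 - ((a - m) ** 2).sum() / ((a - b) ** 2).sum()

def msfe_adjusted(actual, model_fc, bench_fc):
    """Clark-West (2007) MSFE-adjusted statistic: the t-statistic of the
    adjusted loss differential f_t regressed on a constant."""
    a, m, b = (np.asarray(v, float) for v in (actual, model_fc, bench_fc))
    f = (a - b) ** 2 - ((a - m) ** 2 - (b - m) ** 2)
    return f.mean() / (f.std(ddof=1) / np.sqrt(len(f)))
```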
2.5. Data
The dataset covers the monthly period from 1950:12 to 2016:12, based on data availability. The equity premium R_{t} is computed as the difference between the log return on the S&P 500 (including dividends) and the log return on a risk-free bill. As mentioned above, to compare the forecasting performance of the considered models, we select 42 predictors. These consist of three groups: 14 macroeconomic variables from Welch and Goyal (2008), 14 technical variables from Neely et al. (2014), and 14 additional variables from the existing finance literature, including investor sentiment changes (Baker and Wurgler, 2006), a financial stress index (Cardarelli et al., 2011), the 52-week-high ratio (George & Hwang, 2004), etc.
Table 1 reports the summary statistics for the log equity premium (1950:12-2016:12), macroeconomic variables (1950:12-2016:12), technical variables (1950:12-2016:12), and additional variables (1965:08-2016:12). The average monthly equity premium (0.004) divided by its standard deviation (0.043) produces a monthly Sharpe ratio value of 0.088. Most of the macroeconomic variables and additional variables are strongly auto-correlated.
3. Empirical results
Similar to Neely et al. (2014), the models are estimated recursively using expanding windows with an initial length of 15 years. We divide the out-of-sample period into three panels: Panel A (1966:01-2011:12), Panel B (1980:09-2010:12), and Panel C (2011:01-2016:12). For each panel, we report results for the whole period as well as for NBER-dated business-cycle expansions and recessions.
Table 1. Summary statistics
Variable | Mean | Median | Std | Min | Max | Auto-cor | Skewness | Kurtosis
Panel A: Log equity premium, December 1950 to December 2016 | ||||||||
R | 0.004 | 0.008 | 0.043 | -0.248 | 0.149 | 0.049 | -0.669 | 2.535 |
Panel B: Macroeconomic variables, December 1950 to December 2016 | ||||||||
DP | -3.602 | -3.531 | 0.412 | -4.524 | -2.753 | 0.994 | -0.134 | -0.872 |
DY | -3.597 | -3.525 | 0.412 | -4.531 | -2.751 | 0.994 | -0.139 | -0.848 |
EP | -2.831 | -2.860 | 0.449 | -4.836 | -1.899 | 0.989 | -0.723 | 2.648 |
DE | -0.771 | -0.815 | 0.320 | -1.244 | 1.379 | 0.986 | 2.961 | 15.854 |
RVOL | 0.145 | 0.135 | 0.051 | 0.055 | 0.316 | 0.963 | 0.799 | 0.549 |
BM | 0.498 | 0.414 | 0.270 | 0.121 | 1.207 | 0.994 | 0.761 | -0.465 |
NTIS | -0.010 | -0.013 | 0.020 | -0.051 | 0.058 | 0.979 | 0.650 | 0.265 |
TBL | -4.866 | -4.970 | 3.275 | -16.300 | -0.010 | 0.990 | -0.527 | 0.596 |
LTY | -6.772 | -6.460 | 2.683 | -14.820 | -1.750 | 0.993 | -0.589 | 0.132 |
LTR | 0.639 | 0.510 | 3.054 | -11.240 | 15.230 | 0.037 | 0.380 | 2.275 |
TMS | 1.905 | 2.060 | 1.507 | -3.650 | 4.550 | 0.955 | -0.464 | -0.172 |
DFY | 1.062 | 0.940 | 0.448 | 0.320 | 3.380 | 0.964 | 1.754 | 4.229 |
DFR | 0.011 | 0.060 | 1.498 | -9.750 | 7.370 | -0.064 | -0.348 | 6.146 |
INFL | -0.330 | -0.305 | 0.359 | -1.792 | 1.915 | 0.619 | 0.160 | 3.458 |
Panel C: Technical variables, December 1950 to December 2016 | ||||||||
MA(1,9) | 0.677 | 1 | 0.468 | 0 | 1 | 0.703 | -0.761 | -1.425 |
MA(1,12) | 0.708 | 1 | 0.455 | 0 | 1 | 0.780 | -0.919 | -1.160 |
MA(2,9) | 0.684 | 1 | 0.465 | 0 | 1 | 0.748 | -0.793 | -1.375 |
MA(2,12) | 0.705 | 1 | 0.456 | 0 | 1 | 0.821 | -0.901 | -1.191 |
MA(3,9) | 0.686 | 1 | 0.465 | 0 | 1 | 0.785 | -0.801 | -1.362 |
MA(3,12) | 0.703 | 1 | 0.457 | 0 | 1 | 0.817 | -0.893 | -1.207 |
MOM(9) | 0.703 | 1 | 0.457 | 0 | 1 | 0.767 | -0.893 | -1.207 |
MOM(12) | 0.728 | 1 | 0.445 | 0 | 1 | 0.804 | -1.026 | -0.951 |
VOL(1,9) | 0.666 | 1 | 0.472 | 0 | 1 | 0.609 | -0.706 | -1.506 |
VOL(1,12) | 0.687 | 1 | 0.464 | 0 | 1 | 0.709 | -0.809 | -1.349 |
VOL(2,9) | 0.660 | 1 | 0.474 | 0 | 1 | 0.761 | -0.675 | -1.549 |
VOL(2,12) | 0.690 | 1 | 0.463 | 0 | 1 | 0.825 | -0.826 | -1.322 |
VOL(3,9) | 0.676 | 1 | 0.468 | 0 | 1 | 0.770 | -0.753 | -1.437 |
VOL(3,12) | 0.682 | 1 | 0.466 | 0 | 1 | 0.835 | -0.785 | -1.388 |
Panel D: Additional variables, August 1965 to December 2016 | ||||||||
PDND | -4.658 | -6.194 | 13.58 | -50.23 | 31.632 | 0.970 | 0.147 | 0.147 |
RIPO | 16.808 | 12.700 | 19.44 | -28.80 | 119.10 | 0.648 | 2.112 | 6.403 |
NIPO | 25.916 | 19.000 | 23.23 | - | 122.00 | 0.862 | 1.203 | 1.079 |
CEFD | 8.674 | 9.220 | 7.343 | -10.91 | 25.28 | 0.962 | -0.124 | -0.327 |
S | 0.172 | 0.151 | 0.086 | 0.045 | 0.430 | 0.994 | 0.946 | 0.348 |
ΔSENT | 0.001 | 0.032 | 0.942 | -3.616 | 5.416 | 0.086 | 0.289 | 2.882 |
FS | 100.77 | 100.74 | 0.894 | 98.359 | 105.89 | 0.857 | 0.621 | 2.229 |
WH52_Ratio | 0.936 | 0.965 | 0.083 | 0.51 | 1.04 | 0.891 | -1.858 | 3.915 |
WH52_Abs | 0.154 | 0.000 | 0.361 | 0.00 | 1.00 | 0.079 | 1.922 | 1.700 |
DV | 0.010 | 0.009 | 0.003 | 0.01 | 0.02 | 0.997 | 0.649 | -0.287 |
WV | 0.009 | 0.009 | 0.002 | 0.00 | 0.01 | 0.998 | 0.128 | -0.778 |
AV | 0.009 | 0.009 | 0.003 | 0.01 | 0.02 | 0.992 | 0.592 | -0.595 |
VAR005 | 0.060 | 0.058 | 0.015 | 0.03 | 0.08 | 0.980 | 0.024 | -1.063 |
VAR001 | 0.078 | 0.080 | 0.020 | 0.04 | 0.11 | 0.981 | -0.191 | -1.054 |
3.1. In-sample test results
Table 2 reports the in-sample results of the HA, OLS+28, DNN+28, and DNN+42 models for the three panels. The results in Panel A show that the OLS+28 model beats the HA model in terms of MSFE and R^{2}, consistent with Neely et al. (2014). Overall, the in-sample results of the DNN models outperform the HA and OLS+28 models in all three panels, regardless of business-cycle expansions or recessions.
Table 2. In-sample test results
Model | MSFE | R^{2} (%) | R^{2} EXP (%) | R^{2} REC (%)
Panel A: January 1966 to December 2011 | ||||
HA | 20.23 | |||
OLS+28 | 15.15 | 0.05 | 0.42 | 0.52 |
DNN+28 | 15.47 | 3.03 | 1.08 | 5.48 |
Panel B: September 1980 to December 2010 | ||||
HA | 20.54 | |||
OLS+28 | 16.24 | 0.04 | 0.29 | 0.41 |
DNN+28 | 15.47 | 3.03 | 1.08 | 5.48 |
DNN+42 | 18.56 | 3.72 | 0.50 | 6.96 |
Panel C: January 2011 to December 2016 | ||||
HA | 10.67 | |||
OLS+28 | 17.49 | 1.81 | 1.81 | - |
DNN+28 | 17.37 | 2.62 | 2.62 | - |
DNN+42 | 18.59 | 3.81 | 3.81 | - |
3.2. Out-of-Sample forecasting results
Table 3 provides the out-of-sample forecasting results. From Panel A of Table 3, in terms of R^{2}_{OS} and MSFE_{OS}, the OLS+28 model outperforms the HA model from 1966:01 to 2011:12, results almost identical to those of Neely et al. (2014). However, Panel B shows that the OLS+28 model performs worse than the HA model from 1980:09 onward. This means that the OLS+28 model outperforms the HA model only in the first 15 years. Moreover, the OLS+28 model obtains significantly large positive R^{2}_{OS} values during recessions (11.37% and 10.64% in Panels A and B, respectively) but disappointingly negative R^{2}_{OS} values during expansions (-2.63% and -4.14% in Panels A and B, respectively). This suggests that the OLS+28 model's strong full-sample performance is largely driven by high R^{2}_{OS} values during recessions. Panel C further shows that, surprisingly, the OLS+28 model displays no out-of-sample predictive ability in terms of R^{2}_{OS} (-5.02%) from 2011:01 to 2016:12, a period not examined by Neely et al. (2014). Overall, the OLS+28 model's predictive ability is not robust.
Turning to our proposed DNN models, the results in Table 3 show that both the DNN+28 and DNN+42 models strongly beat the simple HA benchmark and the OLS+28 model in terms of MSFE and R^{2}_{OS}. The out-of-sample MSFEs of the DNN models are significantly lower than those of the HA and OLS+28 models at conventional confidence levels. It is worth pointing out that the R^{2}_{OS} statistics of the DNN models overwhelmingly beat those of the OLS+28 model and are positive in every panel. These results indicate that the DNN models outperform the HA model in both expansions and recessions and are robust.
Moreover, the DNN+42 model performs better overall than the DNN+28 model. In particular, the DNN+42 model has an R^{2}_{OS} of 3.37% in Panel B of Table 3, which clearly exceeds the 1.49% of the DNN+28 model. The out-of-sample MSFEs of the DNN+42 model are much lower than those of the HA model at the 1% confidence level. Thus, the results suggest that the forecasting performance of DNN models is enhanced by incorporating the 14 additional variables.
Table 3. Out-of-sample forecasting results
Model | MSFE_{OS} | R^{2}_{OS} (%) | MSFE-adjusted | R^{2}_{OS} EXP (%) | R^{2}_{OS} REC (%)
Panel A: January 1966 to December 2011 | |||||
HA | 20.23 | ||||
OLS+28 | 19.83 | 1.95 | 3.38*** | -2.63 | 11.37 |
DNN+28 | 19.67 | 2.75 | 3.60*** | 1.02 | 6.31 |
Panel B: September 1980 to December 2010 | |||||
HA | 20.54 | ||||
OLS+28 | 20.55 | -0.07 | 2.03** | -4.14 | 10.64 |
DNN+28 | 20.23 | 1.49 | 2.23** | 0.48 | 4.15 |
DNN+42 | 19.85 | 3.37 | 2.58*** | 1.06 | 9.41 |
Panel C: January 2011 to December 2016 | |||||
HA | 10.67 | ||||
OLS+28 | 11.20 | -5.02 | -0.53* | -5.02 | - |
DNN+28 | 10.31 | 3.35 | 2.06** | 3.35 | - |
DNN+42 | 10.30 | 3.42 | 1.85** | 3.42 | - |
3.3. Asset allocation results
Table 4 reports the portfolio performance for asset allocation over 1966:01-2016:12. Consistent with the R^{2}_{OS} results in Table 3, the OLS+28 model does not deliver uniformly robust performance in terms of △(ann%), △(ann%) EXP, and △(ann%) REC in Table 4.
Turning to the performance of the DNN models, Table 4 shows that their CER gains are positive in both recessions and expansions. Moreover, although their turnover is relatively high compared with the HA and OLS+28 models, the CER gains net of a proportional transaction cost of 50 basis points per transaction remain positive. From the asset allocation perspective, the DNN+28 model also performs well. Table 4 consistently confirms that the DNN+42 model outperforms the DNN+28 model in terms of CER gain and Sharpe ratio. The DNN+42 model generates a monthly out-of-sample R^{2} of 3.42% and an annual utility gain of 2.99% for a mean-variance investor from 2011:01 to 2016:12. The asset allocation analysis demonstrates the substantial economic value of employing DNN models for equity premium forecasting.
Table 4. Asset allocation results
Model | △(ann%) | △(ann%), EXP | △(ann%), REC | Sharpe ratio | Relative average turnover | △(ann%), cost = 50 bps
Panel A: January 1966 to December 2011 | ||||||
HA(CER) | 4.87 | 9.33 | -17.52 | 0.06 | 2.66% | 4.70 |
OLS+28 | 5.07 | 0.05 | 30.33 | 0.16 | 6.43 | 4.20 |
DNN+28 | 4.40 | 1.46 | 18.99 | 0.14 | 13.64 | 2.37 |
Panel B: September 1980 to December 2010 | ||||||
HA(CER) | 7.12 | 11.54 | -17.61 | 0.10 | 2.63% | 6.95 |
OLS+28 | 2.77 | -1.57 | 26.96 | 0.16 | 5.18 | 2.09 |
DNN+28 | 2.49 | 1.13 | 9.90 | 0.15 | 14.36 | 0.37 |
DNN+42 | 4.48 | 0.32 | 27.65 | 0.20 | 19.85 | 1.48 |
Panel C: January 2011 to December 2016 | ||||||
HA(CER) | 8.35 | 8.35 | - | 0.26 | 2.31% | 8.21 |
OLS+28 | -4.56 | -4.56 | - | 0.16 | 12.50 | -6.19 |
DNN+28 | 2.88 | 2.88 | - | 0.31 | 7.78 | 1.95 |
DNN+42 | 2.99 | 2.99 | - | 0.33 | 16.52 | 0.84 |
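The mean-variance exercise behind the CER figures in Table 4 can be sketched as below, where the investor sets the monthly equity weight to the forecast premium divided by γ times a variance forecast, a common setup in this literature. The weight bounds and γ = 5 are our assumptions (the paper's robustness checks use risk aversion coefficients of 4, 5, and 6), not values confirmed for the main exercise:

```python
import numpy as np

def cer(premium_fc, var_fc, realized_premium, rf,
        gamma=5.0, w_bounds=(-0.5, 1.5)):
    """Certainty equivalent return for a mean-variance investor.

    Each month the equity weight is w_t = forecast / (gamma * variance
    forecast), clipped to w_bounds; CER = mean - 0.5*gamma*variance of
    the resulting portfolio return. The CER gain is this quantity for
    the model forecasts minus the same quantity for the HA forecasts.
    """
    w = np.clip(np.asarray(premium_fc) / (gamma * np.asarray(var_fc)),
                *w_bounds)
    rp = np.asarray(rf) + w * np.asarray(realized_premium)  # portfolio return
    return rp.mean() - 0.5 * gamma * rp.var()
```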
3.4. Robustness checks
To further validate our results, we conduct the following robustness checks. First, the effects of the number of epochs, the dropout probability, and the weight decay of the DNN models on forecasting performance are displayed in Figure 1; performance remains good for parameter values near the optimum. Second, we report year-by-year out-of-sample forecasting results for our models (Table A11 in the Online Appendix). Finally, we check the asset allocation results with risk aversion coefficients equal to 4, 5, and 6 (Tables A4-A10 in the Online Appendix). Overall, these robustness checks confirm that DNN models indeed work better than the HA and OLS models for forecasting the equity premium.
Figure 1. Effects of the Number of DNN Models' Epochs, Dropout Probability and Weight Decay on R^{2}_{OS} and CER Gains in Panel C
4. Conclusion
This study compares the predictive ability of deep neural network models with that of ordinary least squares and historical average models. We find that the DNN models work best, robustly and significantly outperforming both the OLS and HA models in in-sample and out-of-sample tests as well as in asset allocation exercises. Moreover, the forecasting performance of the DNN is enhanced by adding 14 further variables selected from the finance literature, which indicates that the DNN comprehensively incorporates the predictive information contained in these variables. One possible explanation for this excellent performance is that the nonlinear DNN automatically extracts high-dimensional features from the data and discovers distinct forecasting patterns. Our study has practical significance for investors' portfolio construction and risk management.
Supplementary Materials: Online Appendix is available from the authors.
Author Contributions: Conceptualization, Xianzheng Zhou and Huaigang Long; methodology, Xianzheng Zhou; software, Xianzheng Zhou; validation, Xianzheng Zhou, Huaigang Long, and Hui Zhou; formal analysis, Xianzheng Zhou; investigation, Xianzheng Zhou; resources, Hui Zhou; data curation, Hui Zhou; writing—original draft preparation, Xianzheng Zhou; writing—review and editing, Huaigang Long; visualization, Hui Zhou; supervision, Huaigang Long; project administration, Huaigang Long; funding acquisition, Huaigang Long. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The processed data from this study are available upon request.
Conflicts of Interest: The authors declare no conflict of interest.