Human capital in asset pricing: A machine learning perspective on the six-factor model for Pakistan's equity market

1. Introduction

Understanding the mechanics of asset pricing has been a central focus of the economics and finance literature for decades. Fama and French (1992, 2015); Markowitz (1952); and Sharpe (1964) all argue that investment decisions follow two key principles: efficient resource allocation and consideration of the risk-return trade-off. Pricing assets is therefore a key concern in capital allocation decisions for investors, whether firms or individuals. Additionally, Fama (1970) links the level of market efficiency to the extent of information reflected in asset prices. Asset prices in a weak, less transparent market will therefore not reflect all information as promptly as they do in developed markets (Fan et al., 2011).

However, market efficiency is not the only concern in the asset-pricing problem; one must also understand the factors that impact asset prices. For example, Sharpe (1964) argues for a single (market) factor that explains prices of individual stocks (or more precisely, stock returns). Conversely, Khan et al. (2022) argue for a six-factor model that adds human capital to the Fama and French five-factor model (or simply, FF5). Similarly, with the advancement in technology, numerous other factors have been added to asset pricing models, including management and investor sentiment and sustainability practices (or Environmental, Social and Governance (ESG) score) (Maiti, 2021; Sakariyahu et al., 2024).

Additionally, advances in technology not only complicate the inclusion of factors in asset pricing models but also improve data availability, thereby enhancing the efficiency of estimation approaches used to address the problem of asset pricing. Accurate asset pricing estimation is just as crucial for investors as identifying relevant premia in the market or assessing market efficiency and data quality. The choice of estimation approach significantly influences the insights derived from data analysis. Literature now shows an abundance of competing estimation approaches, each with strengths and weaknesses, making them context- and market-specific. Recently, the integration of Artificial Intelligence (AI) into data analytics has led to the development of advanced approaches, including numerous variants of deep learning (DL) (Chen et al., 2024).

As such, we highlight three issues in the evergreen asset pricing problem in the modern business environment. Firstly, the nature of the market and its institutional setting play a role in establishing its efficiency. Fan et al. (2011); Siddiqui, Khan, et al. (2024); and Siddiqui, Sohail, et al. (2024) argue that ‘underdeveloped’ markets (like frontier markets) are weak, less transparent, and inefficient. Secondly, extending conventional asset pricing models, such as the FF5 model, is important to improve their explanatory power (Khan et al., 2022, 2023; Thalassinos et al., 2023). Finally, selecting the estimation approach for predicting asset returns is also important, as different techniques suit different contexts and markets (Barua et al., 2024; Chen et al., 2024).

To address these gaps in the literature, we compare the predictive power of three estimation techniques, namely (1) the Ordinary Least Squares (OLS) technique; (2) Autoregressive Integrated Moving Average with Exogenous variables (ARIMAX), estimated by Maximum Likelihood Estimation (MLE); and (3) the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN), an application of Deep Learning (DL). We test their predictive power in a frontier (underdeveloped) market, Pakistan, with a specific focus on extending the FF5 model to include the sixth factor, human capital.

Our findings support the inclusion of human capital in FF5 in Pakistan, as firms with high human capital investment exhibit a positive return premium. Similarly, our findings provide evidence of a negative premium for firms with low investment in human capital. In terms of predictive power, ARIMAX dominates the other estimation techniques. For LSTM-RNN specifically, we highlight that the inherent complexity of DL approaches pays off only when large, high-quality datasets are available. Underdeveloped markets like Pakistan lack such large, diverse datasets, which limits the capabilities of highly sophisticated DL models like LSTM-RNNs. In such markets, estimation techniques such as ARIMAX dominate in terms of predictive power because ARIMAX captures both time-series dynamics and the impact of exogenous variables (the six factors).

The rest of the paper is arranged as follows: Section 2 provides a detailed literature review, followed by Section 3, which explains the overall research methodology. Section 4 presents the study's findings, followed by Section 5, which provides the discussion. Finally, Section 6 concludes the study.

2. Literature Review

2.1. Theoretical Perspective

For decades, scholars have examined the fundamental principles of asset pricing in financial markets. Conventional wisdom treats risk as the key determinant of returns; for example, the Modern Portfolio Theory (MPT) of Markowitz (1952) argues that investors make investment decisions by maximizing returns and minimizing risk. The MPT later shaped the Capital Asset Market Theory of Sharpe (1964) for valuing individual securities, from which the Capital Asset Pricing Model (CAPM) was developed. This theoretical foundation addresses the asset-pricing issue by assuming that information about risks and returns is available in the market and accessible to investors, a phenomenon referred to as market efficiency (Fama, 1970).

Taking the single-factor model, CAPM, Fama and French (1992) developed a three-factor model (FF3), adding size and value factors to the market factor already identified in the CAPM. Numerous other factors have been identified in literature, following the three-factor model, as additions to the asset pricing models. For example, Carhart (1997) and Jegadeesh and Titman (1993) identified the momentum premium as an important factor in explaining asset prices, and later, Fama and French (2015) added two more factors to their model: profitability and investment (FF5). Similarly, Maiti (2021) added the premium from ESG as the sixth factor to the FF5. More recently, Khan et al. (2022, 2023) also added a sixth factor to the asset-pricing model: human capital. Conversely, Hendershott et al. (2020), not focusing on extending the factors in asset pricing models, examined whether the timing of the day affected asset prices. Their findings provide empirical evidence that asset prices respond differently to factor premia, for example, when markets are open (day) versus closed (night).

Referring specifically to human capital, the work of Khan et al. (2022, 2023); Qin (2002); Roy and Shijin (2018); and Yuan (2012) provides empirical support for the inclusion of this factor in asset pricing models. From a theoretical perspective, the Resource-Based Theory, credited to the work of Penrose (2009), justifies the inclusion of human capital as a premium for stock returns. This theory suggests that firms use their resources and capabilities to establish and maintain a competitive advantage. Employees of the firm can also be seen as a form of capital or a resource that is used for operational efficiency and competitive advantage, as highlighted by Kryscynski et al. (2021). Although Khan et al. (2022, 2023) empirically test this for the Pakistani market, these studies use traditional estimation approaches.

Unfortunately, literature on asset pricing fails to reach consensus on the factors that explain asset prices. Additionally, the enigma of asset pricing is further amplified with technological advancements; the potential to incorporate premia from factors such as textual insights and investor sentiment has become possible. The data-driven insights from these ‘new’ factor premia provide evidence that asset pricing is much more complex than conventionally perceived. For example, Fabozzi and Nazemi (2023); and Sakariyahu et al. (2024) examine various sentiment-based factor premia for inclusion in asset pricing models.

2.2. Machine Learning (ML) and Asset Pricing

Although technology complicates asset pricing on the one hand, on the other it provides advanced tools and techniques to estimate asset pricing models with greater accuracy. For example, Gan et al. (2020); Giglio et al. (2022); and Khoa and Huynh (2021) identify the prospects of machine learning (ML) for the field of finance, highlighting that these data-driven, advanced analytics techniques can be applied to resolve asset pricing problems scholars have observed in the past. However, Brunnermeier et al. (2021) explain that the application of big data analytics and ML comes with its own challenges, which must be understood and accounted for in the modeling to ensure effective outcomes. Nonetheless, Drobetz and Otto (2021); and Gu et al. (2020) show empirically that asset-pricing predictions from ML techniques in developed markets have the potential to yield significant economic gains for investors. Additionally, Gu et al. (2020) identify the ability of ML models to allow for non-linear relationships as the fundamental reason such predictions outperform traditional, simpler estimation techniques.

More recently, Chen et al. (2024); and Khoa and Huynh (2022) apply DL techniques to predict asset prices using factor models for developed markets. DL is a subset of ML, an extension of traditional linear regression, in which numerous neural nodes (each representing a linear regression) work in a structured network with many layers between the standard input and output layers (Dong et al., 2021). Furthermore, Chen et al. (2024) show that DL-NN (deep learning neural networks) outperform other estimation methods when predicting US stock returns. These findings are corroborated by Khoa and Huynh (2022), who show that a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) outperforms in predicting stock returns for FF5.

Although the literature broadly agrees on the superior predictive power of DL-NN techniques, the majority of the studies making such claims focus on developed markets (see, for example: Drobetz & Otto, 2021; Giglio et al., 2022; Gu et al., 2020; Liu & Xu, 2021). Additionally, as highlighted by Chen et al. (2024), numerous techniques within the broad umbrella of ML can be used for predictive analytics. Due to these identified challenges, the application of ML to asset pricing remains unclear, especially in developing markets, which are often categorized as weaker and less efficient (Fan et al., 2011; Siddiqui, Khan, et al., 2024).

2.3. Summary of the Gap

To summarize, even with advanced analytical techniques such as ML and DL-NN, the literature has been unable to solve the enigma of asset pricing. Furthermore, attempts to apply advanced DL applications in developing markets (or low- or medium-income countries per the criteria set by the World Bank) are relatively scarce. The available literature is dominated by studies covering developed markets (for example, the US and Europe). Although the work of Khoa and Huynh (2022) focuses on Vietnam, such studies are few and far between. Therefore, the current literature shows a significant gap in the application of ML techniques to asset pricing in underdeveloped markets.

3. Data and Methods

3.1. Research Design, Data Collection, and Sample Selection

To address the objective of the study, we select all the non-financial firms listed on the Karachi Stock Exchange KSE-100 index, as it is a popular proxy used to represent the market of Pakistan (Siddiqui et al., 2023). To compute the return, adjusted closing daily prices are used from the start of July 2018 to the end of June 2023. Returns of the KSE-100 index are used to represent market returns, and annualized 3-month treasury bill rates are used to represent the risk-free rate.

Data for all variables is collected from Refinitiv DataStream. The final sample comprises 72 non-financial firms (21 were excluded from the sample as they were financial firms, and a further seven were excluded due to data inconsistency, as an approach to address potential survivorship bias, following the approach of Thalassinos et al. (2025)).

3.2. Empirical Methods Applied

3.2.1. Six-Factor Construction

To construct the factors for the portfolios, we follow the approach of Fama and French (2015), with one modification. Following the methodology proposed by Khan et al. (2022), we add a sixth factor to the popular FF5: human capital. To be more specific, these studies use an approach of Fama and MacBeth (1973), which is essentially a two-stage regression technique to compute risk premia for the factors. First, we estimate a cross-sectional regression for each time point. Then, we average out the coefficients across all time points to compute the risk premium for each of the six factors. To summarize, the six-factor model can be empirically presented as follows (Khan et al., 2022):

Stage 1 Regression:

(1) $$R_i=\alpha_0+\alpha_1{R(p)}_i+\alpha_2{\rm SMB}_i+\alpha_3{\rm HML}_i+\alpha_4{\rm RMW}_i+\alpha_5{\rm CMA}_i+\alpha_6{\rm HRI}_i+\varepsilon_i,$$

Stage 2 Regression:

(2) $$R_t=\beta_0+\beta_1{R(p)}_t+\beta_2{\rm SMB}_t+\beta_3{\rm HML}_t+\beta_4{\rm RMW}_t+\beta_5{\rm CMA}_t+\beta_6{\rm HRI}_t+\mu_t,$$

where ‘i’ represents the cross-sectional unit, ‘t’ is time, ‘ε’ and ‘μ’ are the respective error terms, ‘R’ represents excess portfolio returns (calculated by subtracting the risk-free interest rate from the return of the stock), and R(p), SMB, HML, RMW, CMA, and HRI are the six factors as explained in Table 1.

| Factor | Symbol | Definition and Measurement |
| --- | --- | --- |
| Market | R(p) | Excess market return, referred to as the market risk premium (or simply market premium), calculated as the average return of the market, represented by the KSE-100 Index, LESS the risk-free return, represented by the weighted average yield of the 3-month treasury bills. |
| Size | SMB (small minus big) | Return of the portfolio of small stocks LESS return of the portfolio of big stocks, where size is measured through the market capitalization of the firm. |
| Value | HML (high minus low) | Return of the high book-to-market stock portfolio LESS return of the low book-to-market stock portfolio, where the book-to-market ratio is computed by dividing the book value of total assets of the firm by the market value of the total shares of the firm. |
| Profitability | RMW (robust minus weak) | Return of the robust profitability portfolio LESS return of the weak profitability portfolio, where profitability is measured by earnings before interest and tax (EBIT). |
| Investment | CMA (conservative minus aggressive) | Return of the conservative investment stocks portfolio LESS return of the aggressive investment stocks portfolio, where investment is measured by growth in total assets. |
| Human Capital | HRI (human resource growth) | Return of the high human capital investment portfolio LESS return of the low human capital investment portfolio, where investment in human capital is represented by the growth in the payroll expense charged in the statement of profit or loss. |
Table 1. Definitions and Measurements of the Variables. Note. This table presents the definitions and measurements for each of the six factors used in the study. Source. Compiled by the authors, sourced from the work of Khan et al. (2022).

Further, we sort the portfolios using a 2x3 approach and construct a set of 24 portfolios (for portfolio construction, see Appendix 2A and 2B) to calculate the risk premia for the six factors (Fama & French, 2015; Khan et al., 2022, 2023).
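As an illustration of the 2x3 sorting logic, the following is a minimal sketch rather than the exact procedure of the study. It assumes the conventional breakpoints of Fama and French (2015), namely the median for size and the 30th and 70th percentiles for the second characteristic, and it uses equal weights within each group. The field names (`size`, `hr`), the helper names, and the toy data are hypothetical.

```python
from statistics import mean, median

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def two_by_three_sort(firms, key):
    """Assign firms to six groups: size (S/B) crossed with `key` (L/M/H).

    Assumed breakpoints: median size, 30th and 70th percentiles of `key`.
    """
    size_cut = median(f["size"] for f in firms)
    low_cut = percentile([f[key] for f in firms], 30)
    high_cut = percentile([f[key] for f in firms], 70)
    groups = {s + c: [] for s in "SB" for c in "LMH"}
    for f in firms:
        s = "S" if f["size"] <= size_cut else "B"
        c = "L" if f[key] <= low_cut else ("H" if f[key] > high_cut else "M")
        groups[s + c].append(f["name"])
    return groups

def long_short(groups, returns, long_side, short_side):
    """Equal-weighted return of the long groups minus that of the short groups."""
    def avg(labels):
        return mean(mean(returns[n] for n in groups[g]) for g in labels if groups[g])
    return avg(long_side) - avg(short_side)

# Hypothetical firms: market capitalization and payroll growth ("hr").
firms = [
    {"name": "A", "size": 1, "hr": 0.01}, {"name": "B", "size": 2, "hr": 0.05},
    {"name": "C", "size": 3, "hr": 0.09}, {"name": "D", "size": 4, "hr": 0.02},
    {"name": "E", "size": 5, "hr": 0.06}, {"name": "F", "size": 6, "hr": 0.10},
]
daily_returns = {"A": 0.01, "B": 0.02, "C": 0.03, "D": 0.00, "E": 0.01, "F": 0.04}

groups = two_by_three_sort(firms, key="hr")
hri = long_short(groups, daily_returns, long_side=["SH", "BH"], short_side=["SL", "BL"])
print(groups)
print("HRI for the day:", hri)
```

Repeating the same sort with book-to-market, profitability, or asset growth in place of `hr` would give the groups needed for the other long-short factors.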

Finally, there are some further limitations to using Fama and MacBeth (1973) two-step regression that need to be acknowledged. That is, this approach may be affected by cross-sectional dependence, small-sample bias in underdeveloped markets, and the assumption of constant factor loadings, which can be restrictive in highly volatile periods. Nonetheless, the method remains a robust baseline method for examining factor significance across portfolios (Khan et al., 2023; Ullah et al., 2025). To complement these results, this study additionally employs ARIMAX and LSTM-RNN models.
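The two-stage procedure described in this subsection can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy is available and that the factor exposures of each portfolio are already given for every period; the function name, array shapes, and simulated premia are assumptions for illustration only and do not reproduce the study's estimates.

```python
import numpy as np

def fama_macbeth(returns, exposures):
    """Average per-period cross-sectional OLS coefficients.

    returns:   shape (T, N), excess returns of N portfolios over T periods
    exposures: shape (T, N, K), K factor exposures per portfolio and period
    Returns (premia, t_stats), each of length K + 1 with the intercept first.
    """
    T, N, K = exposures.shape
    coefs = np.empty((T, K + 1))
    for t in range(T):
        # One cross-sectional regression per time point, with an intercept.
        X = np.column_stack([np.ones(N), exposures[t]])
        coefs[t] = np.linalg.lstsq(X, returns[t], rcond=None)[0]
    premia = coefs.mean(axis=0)                       # time-series average
    std_err = coefs.std(axis=0, ddof=1) / np.sqrt(T)  # Fama-MacBeth standard error
    return premia, premia / std_err

# Synthetic example: 24 portfolios, 6 factors, 200 periods.
rng = np.random.default_rng(0)
T, N, K = 200, 24, 6
exposures = rng.normal(size=(T, N, K))
true_premia = np.array([0.01, 0.5, -0.3, 0.2, 0.1, -0.1, 0.4])
returns = true_premia[0] + exposures @ true_premia[1:] + rng.normal(scale=0.01, size=(T, N))

premia, t_stats = fama_macbeth(returns, exposures)
print("estimated premia:", np.round(premia, 3))
```

The standard errors here ignore cross-sectional dependence and serial correlation, which is one of the limitations acknowledged above.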

3.2.2. Forecasting Methods

We use a supervised ML technique to forecast the excess returns of stocks, following three approaches: (1) OLS; (2) ARIMAX that applies MLE; and (3) LSTM-RNN, an application of DL.

At this stage, it is important to clarify that the key purpose of this study is forecasting-oriented. More specifically, while the six-factor framework offers a theoretical foundation for variable inclusion, the objective is not to establish causal relationships (as literature already performs such examination, for example, Khan et al. (2023)), but to compare the predictive performance of the three estimation techniques highlighted under the contextual conditions of a frontier market.

Supervised ML techniques have recently become popular for predictive analytics; these techniques train models (e.g., OLS, ARIMAX, and LSTM) on a labelled dataset. Once a model is trained, we test its forecasting power by comparing forecasts with actual results. The literature uses different evaluation metrics to compare the forecasting power of ML models, for example, Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). As highlighted by Hodson (2022), no measure is inherently superior to the others, and different evaluation metrics suit different error distributions. Therefore, we compare the forecasting power using the following four measures:

(3) $$RMSE=\sqrt{\frac{\sum_{i=1}^{N}{(R_i-{\hat{R}}_i)}^2}{N}},$$

(4) $$MSE=\frac{1}{N}\sum_{i=1}^{N}{(R_i-{\hat{R}}_i)}^2,$$

(5) $$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|R_i-{\hat{R}}_i\right|,$$

(6) $$R^2=1-\frac{\sum_{i=1}^{N}{(R_i-{\hat{R}}_i)}^2}{\sum_{i=1}^{N}{(R_i-\bar{R})}^2},$$

where N represents the number of observations, R is the portfolio’s realized excess return, R̂ is the portfolio’s forecasted return, and R̄ is the mean realized return.
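Equations 3 to 6 translate directly into a few lines of code. The sketch below uses only the Python standard library; the function name and the toy numbers are illustrative and are not taken from the study's data.

```python
import math

def forecast_metrics(actual, predicted):
    """Compute RMSE, MSE, MAE, and R-squared as defined in Equations 3 to 6."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)                  # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "R2": 1 - sse / sst}

# Toy realized and forecasted excess returns (hypothetical values).
realized = [1.0, 2.0, 3.0, 4.0]
forecast = [1.5, 2.0, 2.5, 4.0]
print(forecast_metrics(realized, forecast))
```

Because RMSE is the square root of MSE, the two always rank models identically; MAE and R-squared can rank them differently when the error distribution has heavy tails.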

(1) OLS Forecasts

In the first step, we perform the OLS regression to establish a linear relationship between the excess returns of the portfolios and the explanatory variables (the six factors). We use the model shown in Equation 2 to estimate the coefficients for each factor. We train the OLS model on 80% of the dataset, assuming a linear relationship between the variables. Using the trained coefficients, we then forecast the excess portfolio returns. The actual excess returns are then compared with the forecasted returns following Equations 3 to 6 to compute the four highlighted measures for OLS.
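A minimal sketch of this train-then-forecast step is given below. It assumes NumPy is available and that the split is chronological (the first 80% of observations for training, the remainder for testing), which the text does not state explicitly; the function name and the simulated factor data are hypothetical.

```python
import numpy as np

def ols_split_forecast(factors, excess_returns, train_frac=0.8):
    """Fit OLS on the first `train_frac` of the sample and forecast the rest."""
    factors = np.asarray(factors, dtype=float)
    excess_returns = np.asarray(excess_returns, dtype=float)
    n_train = int(len(excess_returns) * train_frac)
    # Design matrix with an intercept column followed by the six factors.
    design = np.column_stack([np.ones(len(excess_returns)), factors])
    beta = np.linalg.lstsq(design[:n_train], excess_returns[:n_train], rcond=None)[0]
    forecasts = design[n_train:] @ beta
    return beta, forecasts, excess_returns[n_train:]

# Simulated daily factor values and portfolio excess returns (assumed data).
rng = np.random.default_rng(42)
factors = rng.normal(scale=0.01, size=(250, 6))
true_beta = np.array([0.001, 1.0, 0.3, 0.2, -0.4, -0.1, 0.15])
excess_returns = true_beta[0] + factors @ true_beta[1:] + rng.normal(scale=0.002, size=250)

beta, forecasts, actual = ols_split_forecast(factors, excess_returns)
print("estimated coefficients:", np.round(beta, 3))
print("test RMSE:", float(np.sqrt(np.mean((actual - forecasts) ** 2))))
```

A chronological split avoids training on observations that postdate the test period, which matters for time-series data such as daily returns.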

(2) ARIMAX Forecasts

In the second step, we attempt to capture time-series dependencies when forecasting the portfolio excess returns. ARIMAX estimation considers both auto-regressive components (historical values of the portfolio returns) and external predictors (the six factors) (Ifeanyichukwu Ugoh et al., 2021). Equation 2 is modified to account for these changes with backshift operators as follows:

(7) $$\phi(B)\,R_t=\gamma X_t+\theta(B)\,\epsilon_t,$$

where B is the backshift operator, Xₜ is the matrix of exogenous variables (the six factors), γ is the vector of their coefficients, φ(B)Rₜ captures the autoregressive component, θ(B)εₜ represents the moving-average component, and εₜ is the error term.

For this modeling, we need to define the autoregressive order (p), the differencing order (d), and the moving-average order (q), denoted as (p, d, q). To determine the best p, d, and q values for the study, we automate model selection using the Akaike Information Criterion (AIC), running ARIMAX estimations via a grid search implemented with a loop. The training model uses a stationary version of the time series and the best (p, d, q) for each portfolio to select optimal parameters for the ARIMA components (details on these optimal parameters for each portfolio are available in Appendix 1). The forecasting in this modeling considers both the time-series components of returns and the influence of the six external factors. Finally, following Equations 3 to 6, we compute the four comparison measures for the ARIMAX predictions.
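The AIC-based grid search can be sketched generically as below. The function `select_order`, the grid bounds, and the toy AIC surface are assumptions for illustration; in practice the `fit_aic` callable would wrap an actual ARIMAX fit for one portfolio (for example, a state-space ARIMAX routine from a statistics library) and return that model's AIC.

```python
from itertools import product

def select_order(fit_aic, max_p=3, max_d=1, max_q=3):
    """Return the (p, d, q) with the lowest AIC on the grid, and that AIC.

    fit_aic: callable taking an order tuple (p, d, q) and returning the AIC
             of the fitted model; it may raise ValueError or ArithmeticError
             when a fit fails, in which case the order is skipped.
    """
    best_order, best_aic = None, float("inf")
    for order in product(range(max_p + 1), range(max_d + 1), range(max_q + 1)):
        try:
            aic = fit_aic(order)
        except (ValueError, ArithmeticError):
            continue  # skip orders whose estimation fails
        if aic < best_aic:
            best_order, best_aic = order, aic
    return best_order, best_aic

def toy_aic(order):
    """Stand-in AIC surface with its minimum at (2, 0, 1); not a real fit."""
    p, d, q = order
    if (p, q) == (0, 0):
        raise ValueError("degenerate model")
    return 100.0 + (p - 2) ** 2 + d ** 2 + (q - 1) ** 2

print(select_order(toy_aic))
```

Since AIC penalizes each additional parameter, the search favors the smallest order that fits the series adequately, which is relevant for the short samples typical of frontier markets.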

(3) LSTM Forecasts

Finally, to capture both nonlinear relationships and long-term dependencies among the variables, we apply an LSTM-RNN. RNN is a DL technique that allows the information to persist across time. LSTM is a type of RNN that is capable of retaining information over long sequences; as such, it is considered an effective technique for sequential predictions (Nakagawa et al., 2019).

For training our model, we use multiple layers: one LSTM layer with 20 units (neurons), a 20% dropout rate to avoid overfitting, and one fully connected dense layer with one output unit to provide a forecasted value for the next time stamp.

Additionally, the study does not perform formal hyperparameter optimization (e.g., grid search, random search, or Bayesian optimization). This decision is deliberate and consistent with the study’s comparative-forecasting objective and the data limitations of a frontier market context. Formal optimization requires extensive computational resources and large, high-frequency datasets to ensure stable convergence. Moreover, such procedures risk overfitting and yield unstable results in small or noisy datasets like those in Pakistan. Instead, we apply a manually selected LSTM configuration guided by prior literature (Khoa & Huynh, 2021; Nakagawa et al., 2019).

The model is configured with 20 neurons, a 20% dropout rate, a 30-day lookback window, and a learning rate of 0.001 using the Adam optimizer. This structure balances simplicity, interpretability, and reproducibility, enabling a fair comparison between LSTM-RNN and traditional econometric models (OLS and ARIMAX).

Furthermore, adapting the approach of Khoa and Huynh (2022), we train our model using the Rolling Window forecasting approach, which ensures a continuously shifting window of recent data, allowing the model to capture current information. This improves the model's adaptability. The lookback window is set to 30 timestamps (since we use daily data, this means the model looks back 30 days to forecast the next value). The LSTM architecture is visually displayed under:

Figure 1. LSTM Architecture of the Study. Note. This figure visually displays the LSTM architecture of the study. The input layer takes a lookback window of 30 time stamps with six features per time stamp. The LSTM layer has 20 neurons with a total of 2,160 parameters (4 x (20 x (20 + 6) + 20)). The dropout layer has a dropout rate of 20%. The dense layer contains a single neuron for single-step prediction, with a total of 21 parameters (20 x 1 + 1).
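The parameter counts in the caption follow from the standard formulas for an LSTM layer and a dense layer. The short sketch below evaluates those expressions; the function names are illustrative, and the `use_bias` switch is an added assumption that shows how the count changes when bias terms are excluded.

```python
def lstm_param_count(units, features, use_bias=True):
    """Trainable parameters of one LSTM layer.

    Each of the four gates has input weights (units x features),
    recurrent weights (units x units), and optionally a bias per unit.
    """
    per_gate = units * (units + features) + (units if use_bias else 0)
    return 4 * per_gate

def dense_param_count(inputs, outputs=1):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

print("LSTM layer:", lstm_param_count(units=20, features=6))
print("LSTM layer without biases:", lstm_param_count(units=20, features=6, use_bias=False))
print("Dense layer:", dense_param_count(inputs=20))
```

Note that the dropout layer and the length of the lookback window add no trainable parameters; the window length only changes how many time steps the same weights are applied to.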

Our initial model is trained for 20 epochs with a batch size of 8, with the Adam optimizer set at a learning rate of 0.001. For rolling-window fine-tuning, we retrain the model on a single data point at a time, with one epoch and a batch size of 1.
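The lookback windowing and the rolling fine-tuning loop can be sketched without any deep learning library. In the sketch below, `make_windows` pairs each target with the 30 preceding rows of factor values, and `walk_forward` forecasts each point before updating the model on it. The `RunningMean` class is only a stand-in for the trained LSTM so that the example runs on its own; its `update` method marks the place where a one-epoch, batch-size-1 fine-tuning step would occur. All names and the toy data are hypothetical, and the exact alignment of windows and targets in the study is an assumption.

```python
def make_windows(features, targets, lookback=30):
    """Pair each target with the `lookback` feature rows that precede it."""
    samples = []
    for t in range(lookback, len(targets)):
        samples.append((features[t - lookback:t], targets[t]))
    return samples

def walk_forward(model, samples):
    """Forecast each sample, then fine-tune the model on it before moving on."""
    forecasts = []
    for window, target in samples:
        forecasts.append(model.predict(window))  # forecast first: target is unseen
        model.update(window, target)             # then a single-sample update
    return forecasts

class RunningMean:
    """Stand-in model (NOT an LSTM): predicts the mean of targets seen so far."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, window):
        return self.total / self.count if self.count else 0.0

    def update(self, window, target):
        self.total += target
        self.count += 1

# Toy data: 40 days, six factor values per day (hypothetical numbers).
features = [[float(t)] * 6 for t in range(40)]
targets = [0.1 * t for t in range(40)]

samples = make_windows(features, targets)
forecasts = walk_forward(RunningMean(), samples)
print(len(samples), "windows of", len(samples[0][0]), "days each")
```

Forecasting before updating keeps every prediction strictly out of sample with respect to its own target.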

Finally, to validate the stability of model performance, robustness checks are conducted using out-of-sample (OOS) forecasting tests for three representative portfolios with the highest mean return. These tests assess whether the comparative performance of models holds out of sample. In OOS tests, we divide the data into three periods (training period, validation period, and out-of-sample period). The OOS evaluation employs three complementary metrics (RMSE, MAE, and R-squared) to ensure a comprehensive comparison of predictive accuracy (Ullah et al., 2025). Additionally, to statistically evaluate forecast superiority between models, the Diebold-Mariano (1995) (DM) test is applied. This test compares forecast errors across competing models to determine whether the difference in predictive accuracy is statistically significant. This test is performed pairwise between OLS, ARIMAX, and LSTM-RNN models for the selected portfolios using one-step-ahead forecast errors over the test period. This multistep evaluation structure is used to confirm the robustness and consistency of the findings across different temporal partitions and model types.
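For reference, the pairwise Diebold-Mariano comparison can be sketched as follows. The sketch assumes a squared-error loss, one-step-ahead forecasts (so no autocorrelation correction is applied to the variance), and a normal approximation for the p-value; these are assumptions of the illustration, and the error series shown are hypothetical.

```python
import math

def diebold_mariano(errors_a, errors_b):
    """DM statistic and two-sided p-value for equal predictive accuracy.

    A positive statistic means model A has the larger squared-error loss.
    """
    # Loss differential under squared-error loss.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value

# Hypothetical one-step-ahead forecast errors from two competing models.
errors_model_a = [1.0, -2.0, 1.5, -1.0, 2.0, -1.5]
errors_model_b = [0.5, -0.4, 0.6, -0.5, 0.3, -0.6]
stat, p_value = diebold_mariano(errors_model_a, errors_model_b)
print("DM statistic:", round(stat, 3), "p-value:", round(p_value, 5))
```

With three models, the test is run three times (OLS vs. ARIMAX, OLS vs. LSTM-RNN, and ARIMAX vs. LSTM-RNN) for each selected portfolio.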

We summarize the overall research approach in the following figure:

Figure 2. Summary of Methodology. Note. This figure visually summarizes the overall methodology used in the study to test the forecasting accuracy of three estimation techniques for the six-factor asset pricing model in Pakistan, using a supervised ML approach.

4. Results and Discussion

4.1. Descriptive Statistics

Table 2 presents a summary of the descriptive statistics for the study. The table first presents the selected descriptive measures for the 24 portfolios, followed by those for the six factors. Ranking the top 5 portfolios by mean return, we have P-7 (BN), P-14 (SC), P-15 (SH), P-12 (BW), and P-16 (SHhr), with P-19 (SN) showing the lowest mean return. Additionally, all portfolios exhibit negative mean returns, as the sample period includes the COVID-19 recession. In terms of dispersion, portfolios P-24 (SW), P-19 (SN), and P-12 (BW) show the three highest standard deviations in their returns. The lowest dispersion (measured by standard deviation) is shown by P-11 (BR). This shows that P-24, P-19, and P-12 are the riskiest portfolios in the Pakistani market.

Among the six factors, HML shows the highest mean for Pakistani non-financial firms in the sample, and SMB shows the lowest. Considering specifically the sixth factor added to the FF5, human capital (represented by HRI) shows the second-lowest mean value among the factors included in the study. In terms of dispersion, market return shows the highest value for standard deviation, and SMB shows the lowest. Again, focusing on the sixth factor added to the FF5, HRI shows the third-highest dispersion in the factors.

Finally, the results of the stationarity test (the Augmented Dickey–Fuller (ADF) Test) are also shown in Table 2 (last column). All portfolios and factors are stationary at the level, except for the market return. As all the other factors are stationary at the level, we proceed with the estimations.

| Portfolio/Factor | Mean | Variance | Standard deviation | Minimum | Maximum | ADF |
| --- | --- | --- | --- | --- | --- | --- |
| P-1 | -0.03136 | 0.00028 | 0.017 | -0.132 | 0.047 | -3.517 *** |
| P-2 | -0.03068 | 0.00028 | 0.017 | -0.159 | 0.019 | -3.050 *** |
| P-3 | -0.03092 | 0.00035 | 0.019 | -0.153 | 0.040 | -4.004 *** |
| P-4 | -0.03124 | 0.00027 | 0.016 | -0.113 | 0.022 | -3.432 *** |
| P-5 | -0.03086 | 0.00023 | 0.015 | -0.169 | 0.014 | -3.212 *** |
| P-6 | -0.03104 | 0.00030 | 0.017 | -0.180 | 0.049 | -3.162 *** |
| P-7 | -0.03090 | 0.00027 | 0.016 | -0.106 | 0.020 | -3.425 *** |
| P-8 | -0.03067 | 0.00026 | 0.016 | -0.138 | 0.013 | -3.407 *** |
| P-9 | -0.03128 | 0.00028 | 0.017 | -0.129 | 0.023 | -3.673 *** |
| P-10 | -0.03099 | 0.00027 | 0.016 | -0.147 | 0.013 | -3.140 *** |
| P-11 | -0.03125 | 0.00021 | 0.014 | -0.107 | 0.015 | -4.413 *** |
| P-12 | -0.03075 | 0.00038 | 0.020 | -0.211 | 0.059 | -3.787 *** |
| P-13 | -0.03128 | 0.00033 | 0.018 | -0.117 | 0.031 | -7.019 *** |
| P-14 | -0.03044 | 0.00032 | 0.018 | -0.115 | 0.023 | -5.954 *** |
| P-15 | -0.03079 | 0.00029 | 0.017 | -0.098 | 0.019 | -5.577 *** |
| P-16 | -0.03073 | 0.00032 | 0.018 | -0.138 | 0.020 | -5.148 *** |
| P-17 | -0.03095 | 0.00029 | 0.017 | -0.124 | 0.028 | -3.479 *** |
| P-18 | -0.03079 | 0.00032 | 0.018 | -0.127 | 0.036 | -4.954 *** |
| P-19 | -0.03076 | 0.00038 | 0.020 | -0.138 | 0.028 | -7.719 *** |
| P-20 | -0.03083 | 0.00034 | 0.018 | -0.110 | 0.040 | -8.249 *** |
| P-21 | -0.03088 | 0.00029 | 0.017 | -0.100 | 0.030 | -4.222 *** |
| P-22 | -0.03085 | 0.00028 | 0.017 | -0.111 | 0.024 | -5.023 *** |
| P-23 | -0.03080 | 0.00028 | 0.017 | -0.121 | 0.027 | -5.222 *** |
| P-24 | -0.03084 | 0.00043 | 0.021 | -0.115 | 0.043 | -4.970 *** |
| R(p) | -0.03027 | 0.00023 | 0.015 | -0.126 | 0.015 | -2.140 |
| SMB | -0.00017 | 0.00004 | 0.006 | -0.040 | 0.037 | -33.130 *** |
| HML | 0.00036 | 0.00008 | 0.009 | -0.053 | 0.069 | -13.107 *** |
| RMW | 0.00014 | 0.00011 | 0.010 | -0.086 | 0.093 | -24.110 *** |
| CMA | 0.00013 | 0.00007 | 0.008 | -0.098 | 0.048 | -13.671 *** |
| HRI | 0.00012 | 0.00008 | 0.009 | -0.037 | 0.111 | -35.952 *** |
Table 2. Summary of the Descriptive Statistics. Note. This table shows the summary of the descriptive statistics of the variables of the study and the results of their stationarity test. The table is divided into two parts; the first part shows the descriptive statistics of the 24 portfolios (for a complete description, see Appendix 2A and 2B), and the second part shows the descriptive statistics of the six factors. For the stationarity test (ADF), (***) shows significance at 1%.

4.2. Correlation Matrix

Table 3 shows the correlation among the six factors. None of the variables show a correlation exceeding ±80%, providing evidence of the absence of multicollinearity (Siddiqui, Khan, et al., 2024). The strongest positive correlation is observed between SMB and HML, followed by market return and HML. Similarly, the strongest negative correlation is observed between HML and RMW. Regarding the human capital factor added to FF5, HRI shows a significant correlation with only two other factors: a weak negative correlation with HML (value) and a weak positive correlation with CMA (investment). Another interesting finding from the correlation matrix is that all five factors identified by Fama and French (2015) show significant correlations with one another, whether positive or negative, whereas the sixth factor, as proposed by Khan et al. (2022), does not. Although the findings of the correlation matrix are not conclusive, they suggest a stronger integration and association among the five factors of Fama and French (2015). Additionally, the market factor, arguably the longest-established factor given its role in the single-factor CAPM of Sharpe (1964), shows a positive association with the SMB (size) and HML (value) factors, and a negative association with RMW (profitability) and CMA (investment).

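| Factors | R(p) | SMB | HML | RMW | CMA | HRI |
| --- | --- | --- | --- | --- | --- | --- |
| R(p) | 1 | | | | | |
| SMB | 0.1406 | 1 | | | | |
| HML | 0.3046 | 0.3219 | 1 | | | |
| RMW | -0.3034 | -0.2924 | -0.5395 | 1 | | |
| CMA | -0.0874 | -0.0694 | -0.2044 | 0.0698 | 1 | |
| HRI | | | -0.1087 | | 0.1099 | 1 |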
Table 3. Correlation Matrix of the Factors. Note. This table shows the correlation between the factors. All correlations in the table are significant at the 5% level or lower. The strength of the correlations is color-coded, with red indicating a positive correlation and blue a negative one (the darker the color, the stronger the correlation).

4.3. Graphical Analysis of Factors and Returns

To complement the descriptive statistics and ensure data integrity prior to modeling, the constructed factor series and portfolio excess returns are visually represented. Such graphical analysis allows detection of potential anomalies or sudden jumps that may introduce bias in estimations. Figure 3(a) presents time-series plots for the six constructed factors (R(p), SMB, HML, RMW, CMA, and HRI), and Figure 3(b) presents the portfolio returns. All series show smooth fluctuations overall. However, heightened volatility across all factors is observed between March 2020 and May 2020, which we link to the COVID-19 pandemic. This period of volatility is also evident in other emerging and frontier markets, as highlighted by Khan et al. (2023). The returns and factors show stability post the COVID-19 period, but factors and portfolio excess returns again show high volatility after March 2023. This period coincides with Pakistan’s severe macroeconomic stress following the mid-2023 International Monetary Fund (IMF) standby arrangement, as identified by Mufti (2024).

Figure 3. Time-Series Plots for Portfolio Excess Returns and Factors. Note. Figure (a) shows the plot for the six factors constructed in the study, and Figure (b) shows the plot for excess portfolio returns.

To summarize, the factor and return series exhibit two distinct volatility clusters: first during the COVID-19 shock and second during the macroeconomic instability of early 2023, confirming the sensitivity of factors and returns to significant market disruptions.

4.4. Forecasting Results

After performing the descriptive analysis and computing the correlations between the variables, we proceed to estimate the six-factor asset pricing model. We use OLS, ARIMAX, and LSTM-RNN to estimate and predict portfolio returns and compare their predictive power. As explained earlier, to ensure a comprehensive evaluation of model performance, four statistical measures are employed (RMSE, MSE, MAE, and R-squared). Figure 4 shows box plots comparing the predictive power of the three estimation techniques. The results across all four measures consistently indicate the superiority of the ARIMAX model, which has the lowest error values and the highest explanatory power. Studies such as Jakubowski et al. (2023) corroborate these findings; the authors identify the inclusion of exogenous factors as the primary reason for the superior performance of ARIMAX models. The OLS model performs close to ARIMAX, reflecting its effectiveness in capturing linear relationships. In contrast, the LSTM-RNN model records comparatively higher error metrics and lower R-squared values, suggesting relatively limited predictive power in Pakistan’s context.

Figure 4. Box Plots for Comparative Measures. Note. This figure compares the forecasting performance of the three estimation techniques. Plot (a) shows the comparison using RMSE, Plot (b) shows the comparison using MSE, Plot (c) shows the comparison using MAE, and Plot (d) shows the comparison using R-squared.
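
The four comparative measures can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MSE, RMSE, MAE, and R-squared; the function name `forecast_metrics` and the sample values are hypothetical and are not taken from the study's data or scripts.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, and R-squared for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant actual series")
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical returns and forecasts, for illustration only
metrics = forecast_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Lower MSE, RMSE, and MAE and a higher R-squared indicate better predictive performance, which is the ordering used when comparing the three techniques.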

Further, Tables 4 to 7 report tests of whether the predictive power of the three estimation techniques (OLS, ARIMAX, and LSTM-RNN) differs significantly. We use one-way ANOVA on the error measures and R-squared values across the three competing techniques. The F-statistics are significant at the 1% level in all four tables, indicating that the mean values of the forecasting measures differ significantly across the models. These results statistically reinforce the earlier findings that ARIMAX outperforms both OLS and LSTM-RNN in predictive accuracy.

Groups Count Sum Average Variance
OLS 24 0.247933 0.010331 0.0000044
ARIMAX 24 0.2369613 0.009873 0.0000044
LSTM 24 0.333461 0.013894 0.0000027
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 0.000232606 2 0.000116 30.51018 0.0000 3.129644
Within Groups 0.000263023 69 3.81E-06
Total 0.000495629 71
Table 4. ANOVA Summary for RMSE. Note. This table shows the results of the single-factor ANOVA performed on the RMSE terms across all portfolios. The results show that the F-statistic is significant at the 1% level.
Groups Count Sum Average Variance
OLS 24 0.002661 0.000111 0.0000000019
ARIMAX 24 0.002448 0.000102 0.0000000014
LSTM 24 0.004695 0.000196 0.0000000022
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 1.28E-07 2 6.41E-08 34.60513 0.0000 3.129644
Within Groups 1.28E-07 69 1.85E-09
Total 2.56E-07 71
Table 5. ANOVA Summary for MSE. Note. This table shows the results of the single-factor ANOVA performed on the MSE terms across all portfolios. The results show that the F-statistic is significant at the 1% level.
Groups Count Sum Average Variance
OLS 24 0.156606 0.006525 0.000000468
ARIMAX 24 0.152044 0.006335 0.000000687
LSTM 24 0.243885 0.010162 0.000001473
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 0.000223 2 0.000112 127.385 0.00000 3.129644
Within Groups 6.05E-05 69 8.76E-07
Total 0.000284 71
Table 6. ANOVA Summary for MAE. Note. This table shows the results of the single-factor ANOVA performed on the MAE terms across all portfolios. The results show that the F-statistic is significant at the 1% level.
Groups Count Sum Average Variance
OLS 24 11.4162 0.475675 0.037835
ARIMAX 24 12.2351 0.509796 0.033899
LSTM 24 8.52554 0.355231 0.002803
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 0.316491 2 0.158245 6.369133 0.002895 3.129644
Within Groups 1.714351 69 0.024846
Total 2.030842 71
Table 7. ANOVA Summary for R-Squared. Note. This table shows the results of the single-factor ANOVA on the R-squared terms across all portfolios. The results show that the F-statistic is significant at the 1% level.
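
The single-factor ANOVA in Tables 4 to 7 can be approximately reproduced from the reported group summaries alone. The sketch below is illustrative: the function name `anova_from_summary` is hypothetical, and because the variances in Table 4 are rounded, the resulting F-statistic is close to, but not exactly equal to, the reported value.

```python
def anova_from_summary(groups):
    """One-way ANOVA F-statistic from (count, mean, variance) per group."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares, recovered from the sample variances
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (count, average, variance) rows copied from Table 4 (RMSE)
table4 = [
    (24, 0.010331, 0.0000044),  # OLS
    (24, 0.009873, 0.0000044),  # ARIMAX
    (24, 0.013894, 0.0000027),  # LSTM
]
f_stat, df_b, df_w = anova_from_summary(table4)
```

The degrees of freedom (2 and 69) match the tables, and the computed F-statistic lies far above the reported critical value of 3.129644.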

In Table 8, we present the coefficient values for the 24 portfolios and their respective significance levels for the ARIMAX estimation, which dominates the other two estimation techniques for these portfolios. An important aspect to note from Table 8 is that the market (R(p)) and size (SMB) factors remain significant across all 24 portfolios, indicating their strong relevance in asset pricing. This is followed first by profitability (RMW), which turns insignificant for just one portfolio, and then by investment (CMA), which turns insignificant for three portfolios. Human capital (HRI) becomes insignificant for five portfolios, and the lowest-ranked factor, value (HML), becomes insignificant for eight portfolios. Overall, these findings are consistent with the earlier findings of Khan et al. (2022, 2023) and support incorporating the human capital factor (premium) in asset-pricing models.

Portfolio R(p) SMB HML RMW CMA HRI R2
P-1 0.75554 -0.24391 -0.07844 -0.27629 -0.33996 -0.09944 0.72078
(p-value) 0.00000 0.00000 0.00840 0.00000 0.00000 0.00000
P-2 0.74189 -0.23438 -0.11063 -0.26719 0.6881 -0.15778 0.50732
(p-value) 0.00000 0.00000 0.00007 0.00000 0.00000 0.00000
P-3 0.99989 -0.19043 0.24532 -0.33926 0.26473 -0.05754 0.75335
(p-value) 0.00000 0.00000 0.00000 0.00000 0.00000 0.00350
P-4 0.81602 -0.17022 -0.04077 -0.24347 0.13912 0.34705 0.37086
(p-value) 0.00000 0.00000 0.15458 0.00000 0.00000 0.00000
P-5 0.64332 -0.23563 -0.58187 -0.13267 0.0836 0.02439 0.53585
(p-value) 0.00000 0.00000 0.00000 0.00000 0.00244 0.35605
P-6 0.76702 -0.23219 0.02071 -0.2479 0.16513 -0.50839 0.67111
(p-value) 0.00000 0.00000 0.54284 0.00000 0.00000 0.00000
P-7 0.81487 -0.14597 0.04132 -0.26823 0.05115 -0.06038 0.61848
(p-value) 0.00000 0.00012 0.19961 0.00000 0.09009 0.01634
P-8 0.77522 -0.15584 -0.14747 -0.31709 0.09335 -0.00501 0.39055
(p-value) 0.00000 0.00010 0.00001 0.00000 0.00293 0.86415
P-9 0.82201 -0.12903 -0.01258 -0.2518 -0.00624 0.12098 0.22389
(p-value) 0.00000 0.00091 0.67928 0.00000 0.80622 0.00000
P-10 0.81944 -0.23073 -0.03 -0.16886 0.18183 -0.13285 0.40223
(p-value) 0.00000 0.00000 0.37676 0.00000 0.00000 0.00000
P-11 0.72848 -0.18469 -0.11372 0.14482 0.05863 0.03699 0.54171
(p-value) 0.00000 0.00000 0.00020 0.00000 0.05614 0.17651
P-12 0.81178 -0.18139 -0.00165 -0.75269 0.11651 -0.00505 0.75514
(p-value) 0.00000 0.00000 0.95727 0.00000 0.00001 0.84280
P-13 0.79752 0.86908 -0.06863 -0.25290 -0.36028 -0.05879 0.67922
(p-value) 0.00000 0.00000 0.02144 0.00000 0.00000 0.00056
P-14 0.79636 0.79226 -0.03487 -0.27645 0.56671 0.00809 0.42328
(p-value) 0.00000 0.00000 0.25399 0.00000 0.00000 0.64657
P-15 0.59826 0.75153 0.53584 -0.05290 0.09808 -0.08406 0.25930
(p-value) 0.00000 0.00000 0.00000 0.02569 0.00011 0.00035
P-16 0.72764 0.79987 -0.06086 -0.22974 0.14512 0.55423 0.20295
(p-value) 0.00000 0.00000 0.07395 0.00000 0.00000 0.00000
P-17 0.85823 0.82121 -0.58513 -0.27258 0.27981 -0.16507 0.39932
(p-value) 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
P-18 0.75665 0.87398 -0.08195 -0.22195 0.14393 -0.60558 0.67852
(p-value) 0.00000 0.00000 0.00459 0.00000 0.00000 0.00000
P-19 1.00462 0.80416 -0.15317 -0.42180 0.00589 0.11321 0.59840
(p-value) 0.00000 0.00000 0.00007 0.00000 0.84808 0.00010
P-20 0.86682 0.77694 -0.08604 -0.32882 0.10994 -0.05855 0.33326
(p-value) 0.00000 0.00000 0.01231 0.00000 0.00070 0.02870
P-21 0.73083 0.75997 -0.05235 -0.24265 0.11674 -0.06103 0.54273
(p-value) 0.00000 0.00000 0.12255 0.00000 0.00088 0.04537
P-22 0.64822 0.69222 0.06659 -0.20989 0.20238 -0.27206 0.27308
(p-value) 0.00000 0.00000 0.03367 0.00000 0.00000 0.00000
P-23 0.99769 0.85288 -0.13346 0.27297 0.11733 0.07363 0.85513
(p-value) 0.00000 0.00000 0.00004 0.00000 0.00000 0.00302
P-24 0.81397 0.85858 -0.18846 -0.86452 0.05666 0.10172 0.49865
(p-value) 0.00000 0.00000 0.00000 0.00000 0.03240 0.00002
Average 0.79551 0.30493 -0.06885 -0.25924 0.12409 -0.03964
Max 1.00462 0.87398 0.53584 0.27297 0.6881 0.55423
Min 0.59826 -0.24391 -0.58513 -0.86452 -0.36028 -0.60558
Range 0.40636 1.11789 1.12097 1.13749 1.04838 1.15981
Table 8. Coefficient Values for the Six Factors from ARIMAX Estimation. Note. This table shows the coefficients and their respective p-values for the six factors across all 24 portfolios.

4.5. Robustness Check

To ensure the stability and reliability of the baseline findings, additional robustness checks are performed using OOS forecasting across three representative portfolios (P-7 (BN), P-14 (SC), and P-15 (SH)), which demonstrate the highest mean returns. The evaluation employs three performance measures (RMSE, MAE, and R-squared) to compare the predictive accuracy of the three estimation techniques (OLS, ARIMAX, and LSTM-RNN). The comparative results are illustrated in Figure 5, which shows that across all three metrics and all selected portfolios, the ARIMAX model consistently performs best (achieving the lowest values for RMSE and MAE and the highest values for R-squared). In contrast, the LSTM-RNN model remains the weakest performer across all measures. Additionally, to statistically assess differences in forecast accuracy, pairwise DM tests are conducted between the three models. These results are summarized in Table 9, which reveals that the difference between ARIMAX and OLS is weakly significant (at the 10% level) for P-7, statistically insignificant for P-14, and highly significant (at the 1% level) for P-15. In contrast, the DM tests comparing ARIMAX with LSTM-RNN and OLS with LSTM-RNN are all statistically significant, indicating that the LSTM-RNN model has lower predictive capability for the Pakistani market. Overall, the robustness results corroborate the baseline findings: ARIMAX remains the most effective forecasting approach, OLS performs comparably, and LSTM-RNN exhibits the weakest predictive power in Pakistan.

Figure 5. Out-of-Sample Test Results for the Three Representative Portfolios. Note. This figure compares the forecasting performance of the three estimation techniques using OOS tests. Plot (a) shows the comparison using RMSE, Plot (b) shows the comparison using MAE, and Plot (c) shows the comparison using R-squared.

Portfolio P-7 P-14 P-15
OLS vs. ARIMAX DM stat -1.658 1.022 -4.771
(p-value) (0.097) (0.307) (0.000)
ARIMAX vs. LSTM-RNN DM stat -8.716 10.296 -10.68
(p-value) (0.000) (0.000) (0.000)
LSTM-RNN vs. OLS DM stat -8.459 -10.181 -9.895
(p-value) (0.000) (0.000) (0.000)
Table 9. Diebold–Mariano (DM) Pairwise Test Results. Note. This table shows the results of the pairwise DM tests comparing the forecasting errors of the three estimation approaches across the three representative portfolios, where p-values are stated in parentheses.
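
For readers unfamiliar with the pairwise test, the following is a minimal sketch of a DM statistic. It assumes one-step-ahead forecasts, a squared-error loss, and a normal approximation for the p-value; the loss function, forecast horizon, and sign convention behind Table 9 are not stated in the text, so the function `dm_test` and its inputs are hypothetical.

```python
import math


def dm_test(errors_a, errors_b):
    """Diebold-Mariano statistic for one-step forecasts under squared-error loss.

    A negative statistic means model A has the lower average loss.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length error series with n >= 2")
    # Loss differential between the two models at each forecast date
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    gamma0 = sum((x - d_bar) ** 2 for x in d) / n  # variance of the differential
    if gamma0 == 0:
        raise ValueError("loss differential is constant; statistic undefined")
    stat = d_bar / math.sqrt(gamma0 / n)
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical forecast errors, for illustration only
stat, p_value = dm_test([1.0, 2.0, 1.0, 2.0], [0.0, 0.0, 0.0, 0.0])
```

For multi-step forecasts, the variance term would also need autocovariances of the loss differential, which this sketch omits.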

5. Discussion

Table 8 provides useful insights into the Pakistani stock market. The most important point to highlight is the contribution of the CAPM. The market factor identified by Sharpe (1964) is still the most potent factor in explaining asset prices, even in the modern economic environment. Among the six factors assessed in the current study, the market premium shows the highest average and maximum values across the 24 portfolios. Additionally, the coefficient sign of the market factor remains positive for all 24 portfolios, consistent with the Capital Asset Pricing Theory of Sharpe (1964). Numerous recent studies, such as Khan et al. (2022, 2023) and Kumar (2024), support these findings. From the perspective of a frontier market, the Pakistani stock market is less diversified than developed and emerging markets; as a result, investors in this market focus on a select few factors. As most shocks follow the market’s overall performance, the market premium is the most potent predictor of portfolio returns. Additionally, as reported by Din et al. (2022), a low level of foreign institutional ownership in the Pakistani market further amplifies the impact of market movements. Therefore, we identify these two reasons as the key explanations for why the market factor shows the highest average portfolio premium in the Pakistani market.

Human capital (HRI), the sixth factor added to the FF5 in the current study, also yields interesting insights. The results show that this factor, on average, helps explain asset prices in Pakistan, justifying its inclusion in the multi-factor model, consistent with the findings of Khan et al. (2022, 2023). However, a notable insight overlooked in these earlier studies is that human capital shows a positive and significant association with portfolios P-4 and P-16. These two portfolios cover both big and small firms with high human capital investment, meaning that the Pakistani market rewards firms for higher investments in human capital. Additionally, the premium for P-16 is higher than for P-4 (0.55423 and 0.34705, respectively), indicating that small firms earn a higher premium for greater investment in human capital than big firms. Finally, P-6 and P-18, which cover both big and small firms with low human capital investment, both show a negative human capital premium. This suggests that the market penalizes firms with low investment in human capital, whether big or small, by lowering their returns. Interestingly, the penalty for low human capital investment is more severe for small firms than for big firms (the HRI premia for P-18 and P-6 are -0.60558 and -0.50839, respectively). These findings support the Resource-Based Theory, suggesting that even in frontier markets (which are less transparent and efficient than developed or emerging markets (Siddiqui, Khan, et al., 2024)), human capital investments are reflected in asset prices. In the Pakistani market specifically, where technological and infrastructure developments lag behind developed nations, the market sees human capital as a critical driver of innovation and success, as highlighted by Mubarik et al. (2020).

Finally, regarding the superior predictive power of ARIMAX estimations over OLS and LSTM-RNN, we identify the ability of ARIMA-based models to incorporate temporal dependencies efficiently as the key feature that sets them apart. ARIMA-based techniques integrate autoregressive and moving-average components (or shocks) into their estimation. In addition, as reported by Ifeanyichukwu Ugoh et al. (2021) and Jakubowski et al. (2023), ARIMAX also accounts for exogenous factors in the modeling process (here, the six factors), further strengthening the predictive power of this approach. In contrast, OLS assumes independent observations, which is usually unrealistic, as noted by Burton (2021). The LSTM-RNN approach, although it captures temporal dependencies in data, often performs suboptimally on small or weakly labeled datasets, as highlighted by Barua et al. (2024). More specifically, the authors highlight that the complexity inherent in LSTM modeling may not be an advantage for all stock types or markets. We highlight this as the key reason for the model's underperformance when compared with OLS and ARIMAX. The Pakistani market is arguably much simpler than developed markets, and high-frequency data are not abundantly or transparently available to its participants. Therefore, LSTM-RNN estimations for such markets are less effective than in markets that do not face these issues, as reported by Chen et al. (2024) and Nakagawa et al. (2019). Similarly, another important limitation identified in the current study is the inclusion of exogenous variables in the estimation process, which may restrict the model’s flexibility. An unsupervised approach to selecting relevant factors (or variables) for predicting portfolio returns may yield better predictions from these models.

Nonetheless, the comparison of the three estimation approaches (OLS, ARIMAX, and LSTM-RNN) shows a preference for ARIMAX for predicting portfolio returns in a frontier market like Pakistan, which struggles with a relatively less transparent, weak, and underdeveloped market (Fan et al., 2011; Siddiqui, Khan, et al., 2024). In contrast, OLS relies on the restrictive assumption of independent observations, and LSTM-RNN (despite its strengths in capturing long-term dependence) tends to underperform when data are limited, as is often the case in frontier markets (Nakagawa et al., 2019; Siddiqui, Sohail, et al., 2024). Together, these results deepen the understanding of how market complexity and structure jointly determine predictive effectiveness in frontier markets.

6. Conclusion

This study tests whether the inclusion of a sixth factor (human capital) in the conventional FF5 is empirically supported in Pakistan's frontier market. Additionally, we compare the predictive power of three estimation approaches (OLS, ARIMAX (following MLE), and LSTM-RNN (a deep learning approach)) to assess their usefulness in less developed markets that lack access to large, transparent datasets. We show that the predictive power of ARIMAX is superior to that of the other two estimation techniques, at least for the frontier market of Pakistan. We identify the complexity of LSTM models as the reason for the inferior LSTM-RNN predictions compared with ARIMAX, which renders these models less suitable for frontier markets characterized by low transparency and limited data availability. Therefore, we advise considering the nature and context of a market when applying deep learning models, as they are sensitive to hyperparameter tuning and dataset size.

The current study makes several contributions. First, from a theoretical perspective, it integrates Resource-Based Theory with the Efficient Market Hypothesis, suggesting that asset prices reflect information on human capital investment, even in frontier markets. Additionally, we extend the available literature on the human capital-based six-factor asset-pricing model from an ML perspective. From a practical perspective, our findings show a preference for ARIMAX-based predictions for economic decision-making. The study justifies the use of more straightforward yet robust techniques to help investors make informed investment decisions in a frontier market like Pakistan.

The current study has several limitations. First, we focus only on the Pakistani stock market. The applicability of different estimation techniques across broader frontier markets needs to be tested to improve the study's generalizability. Second, the performance of DL models may improve with either unsupervised learning or larger datasets. As such, a thorough comparison of developed, emerging, and frontier markets may yield more profound insights into advanced hyperparameter optimization techniques to improve the usability of DL methods in underdeveloped markets. Additionally, our study constructs portfolios using the approach of Fama and French (1992); however, a sector-specific portfolio analysis may uncover deeper industry-level dynamics in a market, focusing specifically on the impact of the human capital premium on industry-level portfolio returns. Finally, as discussed earlier, the Fama and MacBeth (1973) two-step regression assumes stable factor loadings and independence across portfolios, which may not fully hold in volatile markets; however, the complementary dynamic models used in the current study mitigate these concerns.

Supplementary Materials: Additional materials are available.

Funding: No funding was received for the research.

Data Availability Statement: The data that support the findings of this study are available from Refinitiv. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the author(s) with the permission of Refinitiv.

Conflicts of Interest: The authors declare no conflict of interest.

Author Contributions: Conceptualization, O.S. and N.K.; methodology, O.S., N.K., and A.B.; software, O.S.; validation, N.K. and A.B.; formal analysis, O.S.; data curation, N.K. and O.S.; writing—original draft preparation, O.S.; writing—review and editing, N.K.; visualization, O.S.; supervision, A.B.; project administration, A.B. All authors have read and agreed to the published version of the manuscript.

AI Use Statement: The authors used ChatGPT (OpenAI) for grammar and language refinement. All content was carefully reviewed and verified by the authors. Additionally, the authors used AI-based coding assistants to debug scripts, but all model specifications, analyses, and interpretations were designed and verified by the authors.

Appendices

Portfolio Autoregressive Order (p) Differencing Order (d) Moving Average Order (q)
P-1 3 1 2
P-2 0 1 2
P-3 2 0 4
P-4 2 1 3
P-5 0 1 1
P-6 1 1 3
P-7 3 1 2
P-8 0 1 1
P-9 1 1 2
P-10 0 1 2
P-11 1 1 2
P-12 1 1 3
P-13 1 1 1
P-14 1 1 4
P-15 2 1 3
P-16 2 1 4
P-17 0 1 4
P-18 0 1 1
P-19 3 0 4
P-20 1 1 4
P-21 2 1 3
P-22 1 1 2
P-23 0 0 1
P-24 1 1 2
Table 10. Appendix 1 - Order Details for the ARIMAX Model. Note. This table shows the (p, d, q) orders for ARIMAX estimation across all 24 portfolios. We use a grid search to determine the optimal (p, d, q) orders based on the Akaike Information Criterion (AIC).
Sort: 2x3 sorts on size and book-to-market ratio; size and operating profitability; size and investment; size and human capital premium
Break points: Size: KSE-100 index median; B/M: 30th & 70th percentiles; OP: 30th & 70th percentiles; Inv: 30th & 70th percentiles; HCPrem: 30th & 70th percentiles
Factor constructions:
SMB_B/M = ((SH + SN + SL)/3) - ((BH + BN + BL)/3)
SMB_OP = ((SR + SN + SW)/3) - ((BR + BN + BW)/3)
SMB_Inv = ((SC + SN + SA)/3) - ((BC + BN + BA)/3)
SMB_HCprem = ((SLHCprem + SNHCprem + SHHCprem)/3) - ((BLHCprem + BNHCprem + BHHCprem)/3)
SMB = (SMB_B/M + SMB_OP + SMB_Inv + SMB_HCprem)/4
HML = ((SH + BH)/2) - ((SL + BL)/2)
RMW = ((SR + BR)/2) - ((SW + BW)/2)
CMA = ((SC + BC)/2) - ((SA + BA)/2)
HCPrem = ((SHHCprem + BHHCprem)/2) - ((SLHCprem + BLHCprem)/2)
Table 11. Appendix 2 A - Factor Construction. Note. This table shows the construction of portfolios sorted by size, book-to-market ratio, operating profitability, investment, and human capital. Following the construction methods of Fama and French (2015), we independently sort the stocks into two size groups and into three book-to-market, profitability, investment, and human capital groups. The selected portfolios are labeled with two letters. The size groups are labeled S (small) and B (big), and the book-to-market groups are labeled H (high), N (neutral), and L (low). Similarly, the operating profitability groups are labeled R (robust), N (neutral), and W (weak), and the investment groups are labeled C (conservative), N (neutral), and A (aggressive). For the human capital groups, the labels are LHCprem (low labor income growth), NHCprem (neutral labor income growth), and HHCprem (high labor income growth). Using these methods, we construct a set of 24 portfolios and five risk factors: SMB (small-minus-big), HML (high-minus-low book-to-market ratio), RMW (robust-minus-weak), CMA (conservative-minus-aggressive), and HCpremium (high-minus-low labor income growth rate). Source. Fama and French (2015) and Khan et al. (2022, 2023).
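
The factor definitions in Table 11 can be expressed as a short sketch for a single period. This is an illustration under stated assumptions rather than the study's actual script: the function name `build_factors` is hypothetical, the neutral groups use the suffixed labels from Table 12 (SNop, SNinv, BNop, BNinv) so that every portfolio has a unique key, and the sample returns are made up.

```python
def build_factors(r):
    """Return the five constructed factors from one period's portfolio returns.

    `r` maps each of the 24 portfolio labels to its return for the period.
    """
    def avg(*labels):
        return sum(r[label] for label in labels) / len(labels)

    # Size premium computed separately within each of the four 2x3 sorts
    smb_bm = avg("SH", "SN", "SL") - avg("BH", "BN", "BL")
    smb_op = avg("SR", "SNop", "SW") - avg("BR", "BNop", "BW")
    smb_inv = avg("SC", "SNinv", "SA") - avg("BC", "BNinv", "BA")
    smb_hc = (avg("SLHCprem", "SNHCprem", "SHHCprem")
              - avg("BLHCprem", "BNHCprem", "BHHCprem"))
    return {
        "SMB": (smb_bm + smb_op + smb_inv + smb_hc) / 4,
        "HML": avg("SH", "BH") - avg("SL", "BL"),
        "RMW": avg("SR", "BR") - avg("SW", "BW"),
        "CMA": avg("SC", "BC") - avg("SA", "BA"),
        "HCPrem": avg("SHHCprem", "BHHCprem") - avg("SLHCprem", "BLHCprem"),
    }


LABELS = [
    "SL", "SN", "SH", "BL", "BN", "BH",
    "SW", "SNop", "SR", "BW", "BNop", "BR",
    "SA", "SNinv", "SC", "BA", "BNinv", "BC",
    "SLHCprem", "SNHCprem", "SHHCprem", "BLHCprem", "BNHCprem", "BHHCprem",
]
# Hypothetical returns: 2% for every small portfolio, 1% for every big one
returns = {label: 0.02 if label.startswith("S") else 0.01 for label in LABELS}
factors = build_factors(returns)
```

With these made-up returns, only the size factor is non-zero (about 0.01), because small and big portfolios differ by one percentage point while the high and low groups within each sort earn the same average return.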
2x3 Factors
Portfolios Label Freq. % Cum. %
SL P-1 744 5.01 5.01
SN P-2 494 3.33 8.34
SH P-3 992 6.68 15.02
BL P-4 744 5.01 20.04
BN P-5 991 6.68 26.71
BH P-6 744 5.01 31.72
SW P-7 991 6.68 38.4
SNop P-8 245 1.65 40.05
SR P-9 494 3.33 43.38
BW P-10 1237 8.33 51.71
BNop P-11 246 1.66 53.37
BR P-12 495 3.33 56.7
SA P-13 493 3.32 60.02
SNinv P-14 739 4.98 65
SC P-15 497 3.35 68.35
BA P-16 739 4.98 73.33
BNinv P-17 249 1.68 75.01
BC P-18 497 3.35 78.35
SLHCprem P-19 988 6.66 85.01
SNHCprem P-20 746 5.03 90.04
SHHCprem P-21 493 3.32 93.36
BLHCprem P-22 246 1.66 95.01
BNHCprem P-23 245 1.65 96.67
BHHCprem P-24 495 3.33 100
Table 12. Appendix 2 B - Portfolio Details. Note. The subscript HCprem denotes the Human Capital premium; op denotes operating profit; and inv denotes investment. S denotes small firms, B big firms, L firms with low book-to-market ratios, N firms with neutral book-to-market ratios, H firms with high book-to-market ratios, W firms with weak profitability, R firms with robust profitability, A firms with aggressive investments, C firms with conservative investments, LHCprem firms with a low Human Capital premium, NHCprem firms with a neutral Human Capital premium, and HHCprem firms with a high Human Capital premium. The portfolios SL, SN, SH, BL, BN, and BH are formed based on size and book-to-market ratios. SL (small low) is a portfolio of small firms with low book-to-market ratios. SN (small neutral) is a portfolio of small firms with neutral book-to-market ratios. SH (small high) is a portfolio of small firms with high book-to-market ratios. BL (big low) is a portfolio of large firms with low book-to-market ratios. BN (big neutral) is a portfolio of large firms with neutral book-to-market ratios. BH (big high) is a portfolio of large firms with high book-to-market ratios. The portfolios SW, SNop, SR, BW, BNop, and BR are formed based on size and operating profitability. SW (small weak) is a portfolio of small firms with weak profitability. SNop (small neutral operating profit) is a portfolio of small firms with neutral operating profit. SR (small robust) is a portfolio of small firms with robust profitability. BW (big weak) is a portfolio of large firms with weak profitability. BNop (big neutral operating profit) is a portfolio of large firms with neutral profitability. BR (big robust) is a portfolio of large firms with robust profitability. SA, SNinv, SC, BA, BNinv, and BC are portfolios formed based on size and investments.
SA (small aggressive) is a portfolio of small firms with aggressive investments. SNinv (small neutral investment) is a portfolio of small firms with neutral investments. SC (small conservative) is a portfolio of small firms with conservative investments. BA (big aggressive) is a portfolio of large firms with aggressive investments. BNinv (big neutral investment) is a portfolio of large firms with neutral investments. BC (big conservative) is a portfolio of large firms with conservative investments. SLHCprem, SNHCprem, SHHCprem, BLHCprem, BNHCprem, and BHHCprem are portfolios formed using size and Human Capital premium. SLHCprem (small low) is a portfolio of small firms with a low Human Capital premium. SNHCprem (small neutral) is a portfolio of small firms with a neutral Human Capital premium. SHHCprem (small high) is a portfolio of small firms with a high Human Capital premium. BLHCprem (big low) is a portfolio of large firms with a low Human Capital premium. BNHCprem (big neutral) is a portfolio of large firms with a neutral Human Capital premium. BHHCprem (big high) is a portfolio of large firms with a high Human Capital premium. Source. Fama and French (2015) and Khan et al. (2022, 2023).

Disclaimer: All statements, viewpoints, and data featured in the publications are exclusively those of the individual author(s) and contributor(s), not of MFI and/or its editor(s). MFI and/or the editor(s) absolve themselves of any liability for harm to individuals or property that might arise from any concepts, methods, instructions, or products mentioned in the content.