Predicting stock prices in the Pakistan market using machine learning and technical indicators
1. Introduction
The inherent unpredictability of stock markets has long been a focal point of intrigue and challenge within the dynamic fields of finance and econometrics. The pursuit of understanding and forecasting stock market price movements is driven by both the potential for financial gain and the intellectual complexities involved in decoding market behavior. While much research has historically focused on forecasting stock price indices, the task of directional prediction—particularly in volatile markets—has gained significant prominence in recent years.
Accurate and effective market tactics rely heavily on detailed and precise forecasting, especially within highly unpredictable financial environments. Financial time series, characterized by the volatile and often chaotic nature of stock markets, exhibit unpredictability and non-linearity, resulting from a complex interplay of various interconnected elements. These include the balance between supply and demand, interest rate fluctuations, key economic indicators, and political transformations. Such factors contribute to the market’s volatility, leading to abrupt fluctuations and, at times, severe downturns.
In this sophisticated context, prediction extends beyond mere modeling expertise; it requires robust data cleansing, preparation, and the application of innovative forecasting approaches. Traditional methods such as ARCH, GARCH, and ARMA have been widely used for time series forecasting, but the advent of machine learning has revolutionized this domain. Recent studies, including those by Smith et al. (2022) and Jackson & Kumar (2023), identify Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) as leading contenders in contemporary research. These machine learning models, however, pose challenges, particularly in training, due to the volatility and unpredictability inherent in stock market data.
Amid the growing focus on directional price prediction, especially in volatile markets like the Pakistan Stock Exchange, this study explores how machine learning algorithms, enhanced by a range of technical indicators, can improve the accuracy and precision of stock market predictions compared to traditional methods. Specifically, the research leverages the capabilities of ANN, SVM, Long Short-Term Memory (LSTM), and Random Forest models to interpret and predict stock market fluctuations with greater accuracy.
The methodological approach employed in this study is characterized by a comprehensive technical examination utilizing 27 distinct technical indicators. This array of features reflects a meticulous approach to analyzing areas that were often overlooked or only partially explored in previous research. The research’s innovative aspect lies in integrating advanced machine learning techniques with technical analysis, potentially transforming stock market forecasting.
By contrasting traditional forecasting methods with machine learning models, this study aims to enhance the precision of stock price fluctuation forecasts. Special emphasis is placed on feature selection for each model, assessing the relative importance of each indicator in improving model efficacy. The results of this study not only offer valuable insights to the academic community but also provide practical implications for investors and analysts operating within the complex environment of the Pakistan Stock Exchange. The results section details the performance of each ML model, revealing that both ANN and SVM models achieve an accuracy of 85%, demonstrating high precision and recall in predicting upward and downward stock movements. The Random Forest model shows an accuracy of 84%, while the LSTM model, slightly less accurate at 78%, still provides valuable predictive insights. These models show a balanced performance in minimizing Type I errors and identifying significant price movements, making them reliable tools for investors and analysts.
Ultimately, this research sets the stage for future exploration in the challenging arena of stock market prediction. Its contribution lies in integrating multiple machine learning models (ANN, SVM, LSTM, and Random Forest) to predict stock price direction on the Pakistan Stock Exchange. It employs 27 technical indicators, providing a broader and more detailed analysis of stock market factors than is typically explored in existing research. To the best of our knowledge, no prior study, particularly in Pakistan, has explored the prediction of stock prices using technical indicators with machine learning methodologies. By focusing on the relatively underexplored context of an emerging market like Pakistan, the study fills a gap in the literature, offering insights applicable to similar economies. The emphasis on a meticulous feature selection process enhances predictive accuracy, and the study's recommendations for integrating hybrid models with real-time data and sentiment analysis point toward future advancements in stock price prediction.
The subsequent sections of this paper provide a comprehensive look into the research process. Section 2 reviews relevant literature, Section 3 details the data preparation techniques and the formulation of key technical indicators, Section 4 presents the results, and Section 5 concludes with significant findings and potential avenues for future research.
2. Literature Review
Forecasting stock prices stands as one of the most challenging yet pivotal tasks in the financial industry. Over the years, there has been a notable transition from traditional statistical methods to advanced machine learning techniques. The perpetual evolution in stock price prediction, driven by the dynamic nature of financial markets, has prompted ongoing research, underscoring the undeniable proficiency of machine learning in this domain. This evolution, however, has not been without its challenges and intricacies, particularly considering the inherent volatility of stock markets and the escalating complexity of available data.
Qiu and Song (2016) were early to recognize the potential of machine learning in stock forecasting. Their insights shed light on the intricate influences on stock prices, emphasizing the precision with which artificial intelligence handles myriad variables. Pioneering work by Yang et al. (2020) demonstrated the effectiveness of combining Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) for stock price forecasting. Their research established the superiority of deep learning architectures over conventional models, providing a robust mechanism for analyzing stock market dynamics.
Notably, Liu et al. (2023) introduced the Deep Residual Attention Network (DRAN) to push boundaries further. This innovative architecture combined the capabilities of neural networks with residual learning and attention mechanisms. Concurrently, Shajalal et al. (2023) proposed a novel deep neural network model to address issues such as imbalanced data. Similarly, Wanjawa (2016) utilized real-world data to demonstrate the significance of technical indicators such as the Relative Strength Index (RSI) and the Moving Average Convergence Divergence (MACD) for stock forecasting, providing alternative avenues for market analysts.
In addition to these developments, Neely (1997) offered invaluable insights into the Pakistan Stock Market by highlighting the influence of various technical indicators on the accuracy of prediction. Liu et al. (2022), Adebiyi et al. (2014), and others, including Selvamuthu et al. (2019), have demonstrated the significance of combining traditional statistical techniques with machine learning approaches for improved outcomes. This merger underscores the importance of integrating fundamental financial knowledge with technological innovation.
Mokhtari et al. (2021) highlighted potential pitfalls in the overreliance on artificial intelligence for stock predictions, advocating for a balanced approach that incorporates both feature engineering and data preprocessing. Economic indicators also play a pivotal role, with research by Ravikumar and Saraf (2020) implying that the addition of macroeconomic parameters improves algorithmic prediction. In the current complex and volatile financial environment, the cutting edge of stock market forecasting lies in the combination of traditional indicators with advanced algorithms. This literature review underscores the significance of a data-driven approach and serves as a guide for investors and scholars navigating global exchanges.
In summary, the empirical investigation into predictive modeling of stock price fluctuations in the Pakistan Stock Market utilizes sophisticated machine learning models and discerns crucial technical features impacting prediction precision. The study suggests potential avenues for future research aimed at improving market analysis. The results enhance comprehension regarding the prediction of stock prices, offering vital perspectives for investors navigating the intricate Pakistan Stock Market. The findings emphasize the potential effectiveness of machine learning algorithms in forecasting changes in stock prices, aligning with the purpose of the research. One crucial factor in improving the precision of these models involves incorporating a wide range of data sources, corresponding closely to the inclusion of different technical indicators and features in the research. This holistic strategy aims to enhance the accuracy of stock price prediction by minimizing errors and aligning with recommended methods in the existing literature. As the study contributes to the body of knowledge on this subject, it confirms the premise that a data-driven and varied strategy has the potential for improved stock price predictions.
2. Data and Methods
2.1. Data and Selected Features
To thoroughly examine historical stock data from the Karachi Stock Market, we conducted a comprehensive study covering the period from January 1, 2010, to October 1, 2023. The primary goal was to uncover the complex dynamics of financial markets. The research focused on various data variables obtained from the Yahoo Finance API, including opening and closing prices, daily highs and lows, and trading volume. To ensure the reliability of the analysis, we implemented a rigorous data cleansing process, removing any records with missing data points.
The study centered on the KSE 100 index, which tracks the financial performance of the top 100 companies listed on the Karachi Stock Exchange (KSE) in Pakistan. The accompanying line graph shows the time span from 2010 to 2023 on the x-axis, with the KSE 100 index plotted on the y-axis. An upward trend in the line suggests market improvement, while a downward trend indicates a decline. The intermittent spikes and dips in the chart reflect short-term market fluctuations, driven by factors such as domestic or global economic conditions, political instability, and other variables. The long-term trend of the line represents the overall performance of the market over time.
An in-depth technical analysis was conducted on the processed dataset, using a variety of technical indicators to gain insights into stock patterns, including price trends, momentum, and potential trade signals. The set of indicators included Stochastic Oscillators, Price Rate of Change (ROC), Williams %R, Momentum, Disparity Indices, Price Oscillator (OSCP), Commodity Channel Index (CCI), and the Relative Strength Index (RSI).
Additionally, Pivot Points and their corresponding support and resistance levels were thoroughly examined to identify possible inflection points and key price levels in the stock’s movement. Exponential and weighted moving averages were used to analyze potential trends and crossovers, which provided crucial buy and sell signals. To better understand the relationship between trading volume and price fluctuations, indicators like On-Balance-Volume (OBV) and the Chaikin Oscillator were also utilized. These indicators helped explore liquidity dynamics, money flow index, and calendar anomalies. Table 1 provides detailed information on the features, including their descriptions and formulas.
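As an illustration of how the moving-average and volume indicators can be computed, the following minimal Python sketch implements the recursive EMA and the OBV definitions of Table 1, together with the MACD and its signal line. The smoothing factor 2/(span + 1) and the 12/26/9 MACD windows are common conventions assumed here for illustration; the study does not state the windows it used, and the function names are hypothetical.

```python
def ema(prices, span):
    """Exponential moving average using the recursion
    EMA_t = EMA_{t-1} + alpha * (price_t - EMA_{t-1}).
    alpha = 2 / (span + 1) is an assumed, conventional smoothing factor."""
    alpha = 2 / (span + 1)
    out = [prices[0]]  # seed the recursion with the first price
    for price in prices[1:]:
        out.append(out[-1] + alpha * (price - out[-1]))
    return out


def macd(prices, short=12, long=26, signal=9):
    """MACD line (short EMA minus long EMA) and its signal line.
    The signal line is taken here as the EMA of the MACD line;
    the 12/26/9 windows are assumptions, not values from the study."""
    macd_line = [s - l for s, l in zip(ema(prices, short), ema(prices, long))]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line


def obv(closes, volumes):
    """On-Balance Volume following the Table 1 formula: add the day's
    volume when the close is above the previous close, subtract it otherwise.
    The series starts at 0 on the first day (an assumption)."""
    total, out = 0, [0]
    for t in range(1, len(closes)):
        sign = 1 if closes[t] > closes[t - 1] else -1
        total += sign * volumes[t]
        out.append(total)
    return out
```

Note that some OBV variants leave the running total unchanged when the close equals the previous close; the sketch follows the sign rule of Table 1 instead.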
A crucial aspect of this research was the inclusion of a 'Direction' column in the dataset, which played a significant role in the analysis. This column was designed to provide binary observations of daily price movements, indicating whether the stock price moved up or down based on a predetermined threshold. This target variable was central to the classification task.
Feature Name | Description | Formula |
%K | Stochastic oscillator comparing close price to price range | %K = ((Close Price - Lowest Low) / (Highest High - Lowest Low)) × 100 |
%D | Moving average of %K | %D = (1/n) × sum of %K values over the last n periods |
ROC | Percentage change in current price from a certain period ago | ROC = ((Current Price - Price n periods ago) / Price n periods ago) × 100 |
%R | Momentum indicator measuring overbought and oversold levels | %R = ((Highest High - Close) / (Highest High - Lowest Low)) × (-100) |
Momentum | Measures the rate of rise or fall in stock prices | Momentum = Close Price - Close Price 4 periods ago |
Disparity 5 | Measures the ratio of the current price and the 5-day moving average | Disparity 5 = (Close / 5-day Moving Average) × 100 |
Disparity 14 | Measures the ratio of the current price and the 14-day moving average | Disparity 14 = (Close / 14-day Moving Average) × 100 |
OSCP | Price oscillator based on moving averages | OSCP = 5-day Moving Average - 10-day Moving Average |
CCI | Momentum-based oscillator used to determine overbought or oversold conditions | CCI = (Typical Price - Moving Average) / (0.015 × Mean Deviation) |
RSI | Momentum indicator measuring magnitude of recent price change | RSI = 100 - (100 / (1 + Relative Strength)) |
PP | Pivot point for determining overall market trend | PP = (High + Low + Close) / 3 |
S1 | First support level | S1 = (2 × Pivot Point) - High |
S2 | Second support level | S2 = Pivot Point - (High - Low) |
R1 | First resistance level | R1 = (2 × Pivot Point) - Low |
R2 | Second resistance level | R2 = Pivot Point + (High - Low) |
EMA | Exponential moving average | EMA at time t = EMA at time t-1 + (Smoothing Factor) × (Price at time t - EMA at time t-1) |
WMA | Weighted moving average | WMA = (Weight_1 × Price at time t + Weight_2 × Price at time t-1 + ... + Weight_n × Price at time t-n+1) / (Sum of Weights) |
Upper Band | Upper Bollinger Band | Upper Band = Simple Moving Average + (Standard Deviation Multiplier × Standard Deviation) |
Lower Band | Lower Bollinger Band | Lower Band = Simple Moving Average - (Standard Deviation Multiplier × Standard Deviation) |
MACD | Moving Average Convergence Divergence | MACD = Short-Term EMA - Long-Term EMA |
Signal Line | Signal line for MACD | Signal Line = EMA of MACD over a specified window |
ATR | Average True Range | ATR = Rolling average of the True Range over a specified window |
OBV | On-Balance Volume | OBV = Cumulative sum of (Volume × (2 × (Close Price > Previous Close Price) - 1)) |
Chaikin_Oscillator | Chaikin Oscillator | Chaikin Oscillator = Short-term EMA of ADL - Long-term EMA of ADL |
MFI | Money Flow Index | MFI = 100 - (100 / (1 + Money Flow Ratio)) |
Day of Week Anomaly | Anomaly detection based on the day of the week | Close Price / Mean Close Price grouped by Day of the Week |
Week of Month Anomaly | Anomaly detection based on the week of the month | Close Price / Mean Close Price grouped by Week of the Month |
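The price-based formulas in Table 1 translate directly into code. The sketch below computes a few of them for the most recent bar from plain Python lists. The look-back windows for %K and ROC are assumed values (the 4-period Momentum and the 5-day Disparity follow the table), and the function names are illustrative rather than taken from the study.

```python
def stochastic_k(highs, lows, closes, n=14):
    """%K for the latest bar over the last n bars (n=14 is an assumption).
    Assumes the highest high differs from the lowest low in the window."""
    highest_high = max(highs[-n:])
    lowest_low = min(lows[-n:])
    return (closes[-1] - lowest_low) / (highest_high - lowest_low) * 100


def roc(closes, n=10):
    """Percentage change of the latest close versus n periods ago (n assumed)."""
    return (closes[-1] - closes[-1 - n]) / closes[-1 - n] * 100


def momentum(closes, n=4):
    """Latest close minus the close n periods ago (n=4 as in Table 1)."""
    return closes[-1] - closes[-1 - n]


def disparity(closes, n=5):
    """Latest close as a percentage of its n-day simple moving average."""
    moving_average = sum(closes[-n:]) / n
    return closes[-1] / moving_average * 100


def pivot_levels(high, low, close):
    """Pivot point with its two support and two resistance levels."""
    pp = (high + low + close) / 3
    return {
        "PP": pp,
        "S1": 2 * pp - high,
        "S2": pp - (high - low),
        "R1": 2 * pp - low,
        "R2": pp + (high - low),
    }
```

In practice these functions would be applied on a rolling basis to every trading day, producing one feature column per indicator.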
The dataset was divided into training and testing sets with careful consideration, allocating 80% of the data for training and reserving the remaining 20% for testing. To improve the model's performance across varying data scales, the Min-Max scaling technique was applied. This normalization method, which adjusts feature values to a range between 0 and 1, was essential for optimizing the performance of the machine learning models. Proper scaling was crucial for enhancing model convergence and overall effectiveness, facilitating a more comprehensive understanding of stock market dynamics.
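The 80/20 split and Min-Max scaling described above can be sketched as follows. This is a minimal illustration under two assumptions not stated in the study: the split is chronological (no shuffling), and the scaling parameters are estimated on the training rows only and then reused for the test rows. The function names are hypothetical.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split rows in time order: the first 80% for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_min_max(train_rows):
    """Per-feature minimum and maximum, estimated on the training rows only."""
    columns = list(zip(*train_rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def apply_min_max(rows, mins, maxs):
    """Rescale each feature with x' = (x - min) / (max - min).
    Constant features (max == min) are mapped to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]
```

Under these assumptions the training features lie in [0, 1], whereas test values may fall slightly outside that range when the index reaches levels not seen during training.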
The primary goal was to predict the direction of stock price movement for the following day, represented by binary values of 1 and 0. To incorporate past data into the predictions, data from previous days were shifted forward by one day, avoiding the error of using same-day data for forecasting. The process of preparing the training dataset involved combining the shifted data with the target variable.
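A minimal sketch of this alignment step is given below. It assumes that the 'Direction' label is 1 when the close rises by more than a threshold relative to the previous close; the threshold value of 0 is an assumption, since the study does not report it, and the function names are hypothetical.

```python
def make_direction(closes, threshold=0.0):
    """Binary label for days 1..N-1: 1 if the fractional change in the close
    versus the previous day exceeds `threshold` (assumed to be 0), else 0."""
    return [
        1 if (closes[t] - closes[t - 1]) / closes[t - 1] > threshold else 0
        for t in range(1, len(closes))
    ]


def build_supervised(features, closes, threshold=0.0):
    """Pair the features of day t-1 with the direction observed on day t,
    so that no same-day information is used for forecasting."""
    y = make_direction(closes, threshold)  # y[i] is the direction on day i+1
    X = features[:-1]                      # X[i] holds the features of day i
    return X, y
```

For example, with closes of 100, 101, 100, and 102, the labels are 1, 0, and 1, and each label is matched with the indicator values of the preceding day.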
2.2. Machine Learning Models
This study employed four different machine learning models for forecasting the daily closing price direction of the KSE-100 index. The subsequent section provides a concise overview of these models.
Artificial Neural Network (ANN): After completing data preparation and processing, we designed an Artificial Neural Network (ANN) using the TensorFlow library. The ANN architecture was carefully developed to tackle the task of predicting the directional movement of the daily closing price of the KSE-100 index. The network consisted of three main layers: the input layer, the hidden layer, and the output layer.
The input layer was designed to handle 27 distinct technical features derived from historical stock data, which are crucial for making accurate market predictions. The hidden layer, containing 32 neurons, was configured to learn complex patterns within the data. Each neuron used a Rectified Linear Unit (ReLU) activation function, which is effective for capturing non-linear relationships in the dataset. This layer processes and transforms the information received from the input layer.
In the final stage, the output layer was implemented to produce the model's predictions. A sigmoid activation function was used due to the binary nature of the prediction task, which involves forecasting whether the daily closing price will rise or fall. The sigmoid function is suitable for converting the model’s internal computations into probabilities, making it ideal for binary classification problems.
To optimize model performance, we employed the Adam optimizer during the compilation process. The choice of optimizer is crucial for guiding the training process and adjusting the model's weights to minimize errors. The study used the binary cross-entropy loss function, which is appropriate for binary classification, to measure the difference between predicted outcomes and actual target values.
The model was trained for 50 epochs, with each epoch representing one full pass over the training data during which the model updated its weights to reduce prediction errors. A batch size of 64 was used, meaning the weights were updated after every 64 data points. This batch-wise approach helps accelerate convergence and ensures efficient learning. Throughout training, we continually monitored the model’s performance using validation data to assess its ability to generalize to new data and to detect any signs of overfitting.
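To make the computation performed by this architecture concrete, the following sketch implements a single forward pass (27 inputs, 32 ReLU hidden units, one sigmoid output) and the binary cross-entropy loss in plain Python. It is not the TensorFlow implementation used in the study: the weights are random placeholders, training with Adam is omitted, and all names are hypothetical.

```python
import math
import random

N_FEATURES, N_HIDDEN = 27, 32  # layer sizes described in the text


def init_params(seed=0):
    """Random placeholder weights (assumption: small uniform initialization)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_FEATURES)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, params):
    """Return the predicted probability of an upward move for one sample."""
    w1, b1, w2, b2 = params
    # Hidden layer: weighted sum followed by ReLU
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    # Output layer: weighted sum followed by sigmoid
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))
```

During training, the optimizer would adjust the weights so that the average of this loss over each batch of 64 samples decreases.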
SVM Model: Support Vector Machines (SVM) are supervised machine learning algorithms commonly used for classification and regression tasks. In the context of predicting stock price movements, SVMs excel by finding an optimal decision boundary, or hyperplane, that separates data into distinct categories, such as upward or downward price trends.
The process begins with collecting and pre-processing historical stock price data along with relevant technical indicators. These features serve as the primary input for the SVM model, which divides the data into two main components: predictors (features) and labels (direction of stock price movement).
To ensure the model performs optimally and to mitigate the influence of features with different scales, feature scaling techniques are applied. Min-Max scaling is commonly used to rescale all features to a range of 0 to 1. The goal of SVMs is to identify a hyperplane that maximizes the margin or separation between different classes of stock price movements, specifically the "up" and "down" trends.
SVMs offer flexibility through the use of various kernel functions, with the Radial Basis Function (RBF) kernel being a popular choice. Kernel functions allow SVMs to map data into higher-dimensional spaces, enhancing their ability to capture complex, non-linear relationships within the stock price data. Support vectors, which are data points near the decision boundary, are crucial in defining the position of the hyperplane and significantly impact the model’s performance.
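As a brief illustration, the RBF kernel measures the similarity of two feature vectors as exp(-gamma × squared Euclidean distance). The sketch below computes it in plain Python; the value of gamma is an assumption chosen for illustration, as the study does not report its kernel parameters, and the function name is hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).
    gamma=0.5 is an assumed value, not one reported in the study."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the two observations move apart, which is what allows the SVM to draw non-linear boundaries in the original feature space.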
The SVM model is trained using historical stock price data to identify the optimal hyperplane that accurately classifies stock price movements based on the input variables. This training process results in a well-tuned SVM model capable of forecasting stock price trends effectively.
LSTM Model: The Long Short-Term Memory (LSTM) model is a powerful tool for financial forecasting, particularly in predicting stock prices. LSTM, a variant of the recurrent neural network (RNN), is well-suited for processing sequential data. Forecasting stock price changes involves several steps. Initially, the dataset is organized into a time-series format, with each data point corresponding to a specific time interval. This data includes key technical indicators and the stock’s closing price.
To meet LSTM requirements, the dataset is reshaped into a 3-dimensional format, allowing effective handling of time-series data. The model architecture includes two LSTM layers. The first LSTM layer has 50 units and is designed to capture complex patterns and dependencies in the input data by generating sequences. The second LSTM layer mirrors the first, also containing 50 units. The model concludes with a dense layer of a single unit using a sigmoid activation function to produce binary predictions on stock price direction—whether it will increase or decrease.
The LSTM model uses the Adam optimizer for training, and binary cross-entropy is chosen as the loss function due to its suitability for binary classification tasks. Accuracy is used as the performance metric. The model is trained on the training data for ten epochs, with a batch size of 32. Hyper-parameters can be adjusted to fine-tune the model’s performance, enhancing its ability to predict stock price fluctuations.
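The reshaping into a 3-dimensional (samples, time steps, features) format can be sketched with a sliding window, as below. The number of time steps per sample is an assumption, since the study does not report it, and the function name is hypothetical.

```python
def make_windows(features, labels, timesteps=1):
    """Turn a 2-D feature table into overlapping windows of `timesteps` rows.
    Each window is paired with the label aligned to its last row.
    The result has shape (samples, timesteps, features)."""
    X, y = [], []
    for end in range(timesteps, len(features) + 1):
        X.append(features[end - timesteps:end])
        y.append(labels[end - 1])
    return X, y
```

With `timesteps=1` every sample is a single day, which simply adds a time axis of length one; larger values let the LSTM see several consecutive days per prediction.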
Random Forest Model: The Random Forest model is a powerful ensemble technique that combines multiple decision trees to improve accuracy and reduce overfitting. By training numerous decision trees independently and then aggregating their predictions, Random Forest effectively captures a wide range of patterns within the dataset, making it particularly useful for forecasting stock movements. Each tree in the ensemble is trained on a random bootstrap sample of the training data, a process known as bagging, and considers only a random subset of features when splitting. This randomness helps prevent the model from becoming too reliant on specific features, enhancing its overall robustness and performance.
Building the Random Forest model involves careful tuning of several hyperparameters, including the number of decision trees (n_estimators), the maximum depth of each tree (max_depth), and the number of features considered for splitting each node. Hyperparameter tuning is a critical step in machine learning, typically achieved through cross-validation to ensure the model is well-optimized for accurate predictions.
Once the Random Forest model is trained on the training dataset, it can make predictions. When applied to new data or the testing dataset, the model leverages the collective knowledge of its decision trees to forecast future stock price trends. Predictions are aggregated using methods such as majority voting or weighted averaging, resulting in the final forecast. This ensemble approach improves the model’s resilience and its ability to provide reliable predictions on stock price fluctuations.
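The two sources of randomness and the aggregation step can be illustrated with a few lines of plain Python. This is a sketch of the mechanism only, not of the tree-building procedure or of the implementation used in the study; the function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the 'bagging' step)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at a split."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine the class predictions of all trees into one forecast.
    Ties are resolved in favor of the class encountered first."""
    return Counter(predictions).most_common(1)[0][0]
```

For instance, if five trees predict 1, 0, 1, 1, and 0 for a given day, the ensemble forecast is 1 (an upward move).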
2.3. Model Evaluation
The performance of each model was evaluated using various metrics, including accuracy, precision, recall, and the F1-score. These metrics provided insights into the model’s effectiveness in correctly classifying price movements as upward or downward. Additionally, a confusion matrix and a comprehensive classification report were generated. The confusion matrix allowed for a detailed analysis of true positives, true negatives, false positives, and false negatives, offering a thorough assessment of the model's strengths and weaknesses.
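The relationship between the confusion matrix and these metrics can be made explicit with a short sketch. It assumes the usual layout in which rows are actual classes and columns are predicted classes, and it uses the ANN confusion matrix reported in Section 3.2 as input; the function name is hypothetical.

```python
def binary_metrics(confusion_matrix):
    """Metrics for the positive class ('1') from a 2x2 confusion matrix
    laid out as [[TN, FP], [FN, TP]] (rows: actual, columns: predicted)."""
    (tn, fp), (fn, tp) = confusion_matrix
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# ANN confusion matrix as reported in Section 3.2
ann_metrics = binary_metrics([[183, 17], [47, 178]])
```

Rounded to two decimals, the resulting values agree with the class '1' row and the overall accuracy in the ANN classification report.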
To visually compare the model’s predictions with actual market data, a line plot was created. This plot illustrated the discrepancies and similarities between the model's forecasts and the real market outcomes, helping to better understand the model’s performance. Moreover, the study included visual graphs showing the parameter performance of each model, providing further insights into their effectiveness.
3. Results and Discussion
This section offers a thorough analysis of the results from the predictive models for stock price movements, which were developed using various methodologies: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest, and Long Short-Term Memory (LSTM). It begins with an overview of the statistical data related to the technical indicators used in this research, with a particular focus on the KSE-100 index.
Following this, the section reviews the classification reports for each model, highlighting their performance and ability to accurately classify stock price movements. Graphical representations of both actual and predicted stock price movements are then presented to illustrate the predictive power of the models.
Finally, the section examines feature importance graphs for each model, shedding light on the significant role that specific features play in generating predictions and improving the overall performance of the models.
3.1. Descriptive Analysis of KSE-100 Price Data and Technical Indicators
The table provides summary statistics for the KSE-100 index from January 1, 2010, to October 1, 2023, along with various technical indicators. The data reveals significant price fluctuations during this period, with average values for the open, high, low, and close prices ranging between 29,428 and 29,783, and an average daily volume of about 111,447. These averages offer a central tendency measure, indicating overall market stability. However, the high standard deviations suggest substantial price changes, reflecting the inherent volatility of the stock market.
Variables | Mean | Std | Min | Max |
Open | 29610.71 | 11992.09 | 10561.42 | 53042.15 |
High | 29783.45 | 12053.71 | 10660.72 | 53127.24 |
Low | 29428.64 | 11902.66 | 10528.85 | 52733.89 |
Close | 29599.6 | 11978.26 | 10538.27 | 52876.46 |
Volume | 111446.6 | 59121.99 | 0 | 373000 |
%K | 59.06 | 31.46 | 0.18 | 100 |
%D | 59.08 | 29.9 | 2.66 | 98.89 |
ROC | 0.69 | 3.88 | -11.56 | 13.78 |
%R | -72.14 | 26.83 | -100 | 0 |
Momentum | 36.61 | 702.8 | -4088.2 | 3738.86 |
Upper Band | 30630.37 | 12525.17 | 10702.79 | 53969.58 |
Lower Band | 28387.17 | 11560.1 | 9801.16 | 49085.06 |
MACD | 72.96 | 413.66 | -1359.99 | 1332.2 |
Signal Line | 74.94 | 386.81 | -1255.66 | 1283.13 |
ATR | 352.89 | 198.31 | 89.77 | 1209.32 |
OBV | 1.72E+08 | 1.40E+07 | 1.48E+08 | 1.89E+08 |
Chaikin Oscillator | -22.46 | 57102.93 | -230786 | 201793.9 |
MFI | 93.17 | 3.43 | 0 | 100 |
Day of Week Anomaly | 0.98 | 0.4 | 0.35 | 1.76 |
Week of Month Anomaly | 0.98 | 0.4 | 0.35 | 1.76 |
The %K and %D indicators, derived from the Stochastic Oscillator, have average values of 59.06 and 59.08, respectively. This suggests that the KSE-100 index generally remained within a strong trading range. The Rate of Change (ROC), which measures the percentage change in price, has a mean value of 0.69, indicating minimal price swings during the period.
The Williams %R indicator, with an average value of -72.14, suggests that the market was often in an oversold condition. The Momentum indicator, with a mean value of 36.61, reflects the market's generally stable tendency. The Upper Band and Lower Band of the Bollinger Bands show significant price volatility, as indicated by their high standard deviations. Both the Moving Average Convergence Divergence (MACD) and the Signal Line have positive means, pointing to a predominantly bullish market condition.
The Average True Range (ATR), which measures market volatility, has a substantial mean value of 352.89, highlighting notable price fluctuations. The On-Balance Volume (OBV) shows a significant average value, indicating strong trading activity. The Chaikin Oscillator, with a mean value of -22.46, reflects generally negative market conditions. The Money Flow Index (MFI), with a mean value of 93.17, suggests a considerable influx of cash into the market.
Additionally, slight deviations in trading patterns are observed in relation to the Day of Week Anomaly and the Week of Month Anomaly. Overall, these summary statistics offer valuable insights into the behavior of the KSE-100 index over an extended period, highlighting a market characterized by dynamism and volatility. This information is useful for investors and researchers seeking to understand market trends and make informed decisions.
3.2. Results of the ANN Model
The classification results for the Artificial Neural Network (ANN) model are summarized in the table, which includes key metrics such as precision, recall, F1-score, and the confusion matrix. The confusion matrix indicates that the model correctly classified 178 instances as class '1' (positive) and 183 instances as class '0' (negative). However, there were 17 false positives (class '0' instances predicted as '1') and 47 false negatives (class '1' instances predicted as '0'). This distribution highlights the model's balanced approach to categorizing stock price movements.
Class | Precision | Recall | F1-score | Confusion matrix [predicted 0, predicted 1] | Support |
0 | 0.80 | 0.92 | 0.85 | [183, 17] | 200 |
1 | 0.91 | 0.79 | 0.85 | [47, 178] | 225 |
Accuracy | | | 0.85 | | 425 |
Macro avg | 0.85 | 0.85 | 0.85 | | 425 |
Weighted avg | 0.86 | 0.85 | 0.85 | | 425 |
In the second step of the analysis, we provide feature importance results from the ANN model, revealing valuable insights into the technical indicators that significantly impact stock price predictions. The model assigns average absolute weights to each indicator, reflecting its influence on forecasting market movements, and results are displayed in Figure 2.
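One simple way to obtain such scores is to average the absolute values of the first-layer weights connected to each input feature. The sketch below illustrates this idea; whether the study used exactly this calculation is an assumption, and the function name and toy weights are hypothetical.

```python
def mean_abs_input_weights(w1, feature_names):
    """Rank features by the mean absolute weight linking them to the hidden layer.
    w1[j][i] is the weight from input feature i to hidden neuron j."""
    n_hidden = len(w1)
    scores = {
        name: sum(abs(w1[j][i]) for j in range(n_hidden)) / n_hidden
        for i, name in enumerate(feature_names)
    }
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Such weight-based scores are only meaningful when the inputs share a common scale, which is one reason the Min-Max normalization step matters for this analysis.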
3.3. Results of SVM Model
The Support Vector Machine (SVM) model was applied for binary classification, differentiating between two stock movement outcomes labeled '0' and '1' for lower and higher movements, respectively. The SVM model achieved a precision of 0.80 for class '0' and 0.91 for class '1,' reflecting a high degree of accuracy in its predictions. The recall scores were 0.91 for class '0' and 0.80 for class '1,' indicating the model's effectiveness in identifying instances from both classes. Both classes had an F1-score of 0.85, demonstrating a well-balanced performance between precision and recall. The overall accuracy of 85% confirms the model's effectiveness in predicting stock market movements.
When compared with the Artificial Neural Network (ANN) model, the SVM model showed similar results in precision, recall, and F1-score. Both models achieved an overall accuracy of 85%, highlighting their comparable performance in this predictive task.
Class | Precision | Recall | F1-score | Confusion matrix [predicted 0, predicted 1] | Support |
0 | 0.80 | 0.91 | 0.85 | [181, 19] | 200 |
1 | 0.91 | 0.80 | 0.85 | [44, 181] | 225 |
Accuracy | | | 0.85 | | 425 |
Macro avg | 0.85 | 0.85 | 0.85 | | 425 |
Weighted avg | 0.86 | 0.85 | 0.85 | | 425 |
The feature importance analysis for the SVM model is provided in Figure 3. The results reveal key technical indicators that significantly influence stock market predictions. The absolute weights assigned by the SVM to each indicator highlight their impact on forecasting stock movements.
The '%D' indicator is identified as the most influential, underscoring its critical role in the model's predictions. The '%K' and '%R' indicators also have notable significance.
Indicators such as 'Disparity_5,' 'Chaikin Oscillator,' and 'RSI' have considerable weights ranging from 1.20 to 3.64, reflecting their important roles in SVM decision-making. 'PP,' 'Signal Line,' and 'S2' demonstrate moderate significance with weights around 0.25. Features like 'Week of Month Anomaly' and 'PP' have smaller but still relevant weights between 0.05 and 0.10, contributing to the model's overall understanding of market dynamics.
These findings indicate that the SVM model relies on a combination of key technical indicators, including '%D,' '%K,' '%R,' among others. Investors and analysts should focus on these significant features while also considering other indicators, as they collectively contribute to a robust framework for stock market forecasting. This data-driven insight supports more informed investment decisions and helps navigate the complexities of the stock market effectively.
3.4. Results of the LSTM Model
The Long Short-Term Memory (LSTM) model was applied to analyze and forecast stock market movements in Pakistan, performing binary classification to distinguish between outcomes labeled '0' and '1'. The results are detailed in Table 5, which provides a comprehensive evaluation of the LSTM model's performance.
The analysis of feature importance for the LSTM model reveals that certain variables significantly influence stock price forecasting. The most impactful features include %R, Momentum, Disparity_14, and Disparity_5, each demonstrating positive importance scores ranging from 7.76% to 8.00%. These indicators play a crucial role in predicting stock price movements.
Class | Precision | Recall | F1-score | Confusion matrix | Support
0 | 0.75 | 0.80 | 0.77 | [159, 41] | 200
1 | 0.81 | 0.76 | 0.79 | [53, 172] | 225
Accuracy | | | 0.78 | | 425
Macro avg | 0.78 | 0.78 | 0.78 | | 425
Weighted avg | 0.78 | 0.78 | 0.78 | | 425
Conversely, indicators such as Exponential Moving Average (EMA), Weighted Moving Average (WMA), Lower Band, and Upper Band showed negative significance scores between -2.35% and -2.82%. This suggests that these features have a less influential or even adverse effect on stock price predictions.
Additionally, several indicators, including R1, R2, Signal Line, Week of Month Anomaly, and ATR, displayed minimal impact on forecasting, with relevance scores approaching zero. These insights into feature importance help highlight the most and least influential variables for stock price predictions using the LSTM model, guiding investors and analysts in focusing on key indicators that contribute significantly to forecasting accuracy.
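Signed importance scores of this kind are what a permutation-based measure produces, although this section does not state how the LSTM scores were obtained. The sketch below is therefore an assumption: it shuffles one feature at a time and records the mean drop in accuracy. The toy model, the data, and all names are hypothetical, and the rows are flat feature vectors rather than the sequences an LSTM would consume.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    Positive: the model relies on the feature. Near zero: the feature is
    ignored. Negative: shuffling happened to improve accuracy, so the
    feature was not helping on this sample.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    scores = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        scores.append(sum(drops) / n_repeats)
    return scores

# Toy data: the label depends only on feature 0; feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
labels = [1 if row[0] > 0 else 0 for row in rows]

def toy_model(row):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if row[0] > 0 else 0

importances = permutation_importance(toy_model, rows, labels)
```

In this toy setting the first feature receives a clearly positive score and the noise feature scores exactly zero. With a real model evaluated on a finite test set, an unhelpful feature can land slightly below zero, which is one possible reading of the negative scores reported above.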
3.5. Results of the Random Forest Model
The classification results for the Random Forest model are summarized in Table 6. The model exhibits strong performance, achieving an overall accuracy of 84%. The precision for Class '0' is 0.80, and for Class '1' it is 0.88, indicating effective categorization of both classes. The recall values are 0.88 for Class '0' and 0.80 for Class '1,' demonstrating the model's capability to correctly identify instances of both classes. The F1-score, which balances precision and recall, is 0.84 for both classes, reflecting a well-rounded performance.
The confusion matrix shows that the model accurately classified 175 out of 200 instances in Class '0' and 181 out of 225 instances in Class '1.' The macro-averaged and weighted-average measures also support the model's robust classification abilities.
Class | Precision | Recall | F1-score | Confusion matrix | Support
0 | 0.80 | 0.88 | 0.84 | [175, 25] | 200
1 | 0.88 | 0.80 | 0.84 | [44, 181] | 225
Accuracy | | | 0.84 | | 425
Macro avg | 0.84 | 0.84 | 0.84 | | 425
Weighted avg | 0.84 | 0.84 | 0.84 | | 425
The feature importance analysis for the Random Forest model highlights the significance of various technical indicators in predicting stock price movements. The Chaikin Oscillator emerges as the most important feature, with an importance score of 0.1976, indicating its critical role in price prediction.
Following closely, the %R and Disparity_5 indicators have scores of 0.1635 and 0.1056, respectively, emphasizing their considerable impact on the model's performance. Other indicators such as %K, Momentum, and Disparity_14 also contribute significantly to the model's accuracy, underscoring their relevance in forecasting.
Indicators like MACD, ATR, and OBV have moderate importance, suggesting they play a notable role in prediction. However, features such as Lower Band, R2, S2, Upper Band, and EMA have lower importance scores, indicating their minimal impact on stock price prediction.
Additionally, indicators with similar importance scores, including WMA, Week of the Month, PP, S1, R1, and Day of Week Anomaly, contribute almost equally to the model's performance. This variety in feature importance emphasizes the need to consider a range of indicators when developing predictive models.
These findings provide valuable insights into the complex relationship between technical indicators and stock price dynamics, offering practical guidance for analysts and traders in their decision-making processes.
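Tree ensembles commonly score a feature by how much its splits reduce class impurity, summed over the trees and normalized so that all scores add up to one. Whether the scores above were computed this way is not stated, so the following is only a sketch of the underlying quantity for a single split; the data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p_up = sum(labels) / len(labels)
    return 2.0 * p_up * (1.0 - p_up)

def impurity_decrease(values, labels, threshold):
    """Weighted drop in Gini impurity from splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children

labels = [0, 0, 1, 1]
informative = [0.1, 0.2, 0.8, 0.9]    # separates the classes at 0.5
uninformative = [0.1, 0.8, 0.2, 0.9]  # mixes the classes at 0.5

gain_informative = impurity_decrease(informative, labels, threshold=0.5)
gain_uninformative = impurity_decrease(uninformative, labels, threshold=0.5)
```

A feature whose splits separate the classes cleanly accumulates large decreases and ends up with a high score, while a feature whose splits leave the classes mixed contributes almost nothing.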
3.6. Discussion on the Results
This study evaluated four machine learning models: Artificial Neural Network (ANN), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and Random Forest. The evaluation focused on their effectiveness in predicting stock price movements, centered on key performance metrics, including accuracy and precision, and highlighted the significance of various technical indicators across the models.
Model Performance:
Accuracy was the primary metric for assessing the models' predictive capabilities. The ANN and SVM models were the most accurate, both reaching 85%, and the Random Forest model followed closely at 84%. The LSTM model was less accurate at 78%, six to seven percentage points below the other three. Previous studies have substantiated the effectiveness of SVM and ANN in financial forecasting contexts, with Hossain et al. (2020) highlighting SVM's applicability in predicting stock prices. Similarly, Liu et al. (2022) demonstrated that LSTM can effectively capture complex patterns in stock price movements, which supports its inclusion in this study.
Precision was similar for the ANN, SVM, and Random Forest models on both upward (Class '1') and downward (Class '0') stock movements. The highest value, 0.91 for Class '1', was reached by both the ANN and the SVM, indicating their capability to accurately predict price increases. The LSTM model was lower but balanced, with precision of 0.75 for Class '0' and 0.81 for Class '1'. Overall, these precision values illustrate the models' effectiveness in limiting Type I errors (false positives) and accurately identifying positive stock movements. The findings align with the work of previous researchers, who emphasized the importance of precision in stock price forecasting using machine learning techniques (Ying, 2023).
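The link between Class '1' precision and Type I errors can be made concrete from the confusion matrices reported for the SVM, LSTM, and Random Forest models. The sketch below assumes Class '1' (upward movement) is the positive class, so a Type I error is a downward day predicted as upward; the function name is illustrative, and the same calculation applies to the ANN matrix in its table.

```python
# Confusion matrices as reported in the results tables:
# rows are true classes (0, 1), columns are predicted classes (0, 1).
matrices = {
    "SVM": [[181, 19], [44, 181]],
    "LSTM": [[159, 41], [53, 172]],
    "Random Forest": [[175, 25], [44, 181]],
}

def upward_call_quality(matrix):
    """Precision for Class '1' and the false positive (Type I) rate."""
    true_neg, false_pos = matrix[0]
    false_neg, true_pos = matrix[1]
    precision_up = true_pos / (true_pos + false_pos)
    false_positive_rate = false_pos / (false_pos + true_neg)
    return precision_up, false_positive_rate

summary = {name: upward_call_quality(m) for name, m in matrices.items()}
for name, (precision_up, fpr) in summary.items():
    print(f"{name}: precision(1)={precision_up:.3f}, false positive rate={fpr:.3f}")
```

Under this convention the false positive rate is 19/200 = 0.095 for the SVM, 25/200 = 0.125 for the Random Forest, and 41/200 = 0.205 for the LSTM, which mirrors the ordering of their Class '1' precision.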
The study also evaluated the importance of technical indicators in predicting stock price movements. Across all models, the %R indicator, an oscillator closely related to the stochastic oscillator, emerged as a crucial factor. This indicator measures the relative position of the stock's closing price within its recent price range and plays a significant role in market prediction. Momentum, reflecting the rate of change in stock prices, and Disparity_5, measuring the gap between the current stock price and its 5-day moving average, were also highlighted as important across all models. The relevance of technical indicators in stock price forecasting has been well documented in the literature, with Dai and Li (2012) emphasizing the reliance on such indicators for effective predictions. Additionally, Liu et al. (2022) found that integrating technical indicators into machine learning models significantly enhances their predictive capabilities.
These findings align with previous research on machine learning and pattern recognition. For instance, Abramson et al. (1963) underscored the importance of accurate prediction models in various domains. Similarly, Christodoulou et al. (2019) found that machine learning models, including those assessed in this study, exhibit strong predictive abilities, comparable to or surpassing traditional methods like logistic regression.
The results indicate that the ANN and SVM models excelled in predicting stock price fluctuations, with all models demonstrating high precision and minimizing prediction errors. The study provides valuable insights for researchers, investors, and analysts in selecting suitable models for stock market analysis.
Additionally, the significance of technical indicators like %R, Momentum, and Disparity_5 was reinforced, highlighting their critical role in enhancing the predictive capabilities of machine learning models. The references to prior research strengthen the study's conclusions and underscore the relevance of these findings in the broader context of machine learning and financial forecasting.
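For readers who wish to reproduce these three indicators, the sketch below computes them from price lists. It assumes that %R denotes Williams %R on the conventional -100 to 0 scale, that Momentum is a simple price difference over a fixed lag, and that Disparity_5 is the close expressed as a percentage of its 5-day simple moving average. The look-back windows and function names are illustrative; the study's exact definitions are not given in this section.

```python
def williams_r(highs, lows, closes, period=14):
    """Williams %R for the latest bar, on the conventional -100..0 scale.

    Undefined when the high-low range over the period is zero.
    """
    highest_high = max(highs[-period:])
    lowest_low = min(lows[-period:])
    return -100.0 * (highest_high - closes[-1]) / (highest_high - lowest_low)

def momentum(closes, lag=4):
    """Price change over `lag` bars: latest close minus the close `lag` bars ago."""
    return closes[-1] - closes[-1 - lag]

def disparity(closes, window=5):
    """Latest close as a percentage of its `window`-bar simple moving average."""
    moving_average = sum(closes[-window:]) / window
    return 100.0 * closes[-1] / moving_average

# Hypothetical five-day price series, rising by one unit per day.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]

latest_r = williams_r(highs, lows, closes, period=5)
latest_momentum = momentum(closes, lag=4)
latest_disparity = disparity(closes, window=5)
```

For this rising series the close sits near the top of its five-day range (%R of about -16.7), Momentum is 4.0, and Disparity_5 is about 116.7, meaning the close is roughly 16.7% above its 5-day average.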
5. Conclusions
This research explores the use of machine learning models to forecast stock price movements in the Pakistan stock market. The study evaluates the performance of four models—Artificial Neural Network (ANN), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and Random Forest—and highlights the importance of technical indicators in making accurate predictions. Among the models, ANN and SVM stood out with an impressive accuracy rate of 85%, demonstrating their effectiveness in predicting stock price movements. The Random Forest model, with an accuracy of 84%, also proved reliable. The LSTM model, while effective, had a slightly lower accuracy of 78%.
The analysis provides valuable insights into the interaction between machine learning models and key market features. These findings are particularly useful for investors and analysts in the Pakistan stock market, offering guidance on leveraging machine learning for improved decision-making. The consistent importance of indicators like %R, Momentum, and Disparity_5 across all models emphasizes their critical role in enhancing prediction accuracy.
Looking ahead, future research could benefit from exploring hybrid models that combine the strengths of different techniques. Incorporating real-time data and sentiment analysis from news and social media could provide a more comprehensive understanding of market dynamics. Additionally, examining the impact of external factors such as economic events, political changes, and global market trends on stock price forecasts could further refine predictive models. As machine learning technology evolves, there is significant potential to improve the accuracy and adaptability of these models, offering investors and analysts more advanced tools for navigating the complexities of the stock market.
The study is limited by its focus on the Pakistan Stock Exchange, which may limit generalizability, and it relies on historical data and technical indicators, potentially missing real-time market sentiment and external factors. Additionally, it does not explore hybrid models that might enhance predictive accuracy.
Supplementary Materials: All code and data will be provided upon request.
Author Contributions: Hassan Raza: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Visualization and Supervision. Zafar Akhtar: Resources, Writing - Original Draft Preparation, Writing - Review & Editing, Visualization.
Funding: This research received no external funding.
Data Availability Statement: Publicly available data of the KSE-100 Index is used. Code is available upon request.
Conflicts of Interest: The authors declare no conflict of interest.
Disclaimer: All statements, viewpoints, and data featured in the publications are exclusively those of the individual author(s) and contributor(s), not of MFI and/or its editor(s). MFI and/or the editor(s) absolve themselves of any liability for harm to individuals or property that might arise from any concepts, methods, instructions, or products mentioned in the content.