Introduction and Motivation
Stock price movements have long fascinated investors. Anyone who could reliably know the future prices of stocks could become a billionaire from home, sipping a cup of coffee and investing with a few mouse clicks.
So here’s an attempt to predict the future prices of stocks through mathematical modeling, coding, and a combination of the two. The approach is:
- Collect the stock’s past data.
- Implement several models using that data.
- Test the models on a validation set.
- Use the best combination of models to predict the future price of the stock.
- Capture the market sentiment for the stock.
- Reach a conclusion.
The data for a particular stock can be collected from the Yahoo Finance website (http://finance.yahoo.com/q?s=DATA) in a few easy steps:
- Go to the “Historical Prices” tab on the left-hand side of the web page.
- Fill in the “Get Historical Prices for:” field with the symbol of the stock you want to predict (e.g. ACC for Accenture).
- Set the start and end dates for the prices.
- Press “Download to Spreadsheet”.
And tada, in less than a minute you’ll have a CSV file of the data!
The data of ACC stock
We will look only at the closing price of the stock and build our models on it. We then divide the dataset into training, validation and test sets and implement the models.
Polynomial Regression
- A polynomial function is one that has the form:
f(x) = a[n] x^n + a[n-1] x^(n-1) + ... + a[1] x + a[0]
where n is a non-negative integer that defines the degree of the polynomial and a[0], ..., a[n] are constant coefficients. A polynomial with a degree of 0 is simply a constant function; with a degree of 1 is a line; with a degree of 2 is a quadratic; with a degree of 3 is a cubic, and so on.
- The model fits an nth-degree polynomial to the data points. Here n is chosen to be 5.
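As a rough illustration of fitting an nth-degree polynomial by least squares, here is a minimal Python sketch using only the standard library. The function names `fit_polynomial` and `evaluate`, the made-up closing prices, and the choice to rescale the day index to the range 0..1 (which keeps the degree-5 fit numerically stable) are all assumptions for illustration, not the code used for the results in this post.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients [a0, a1, ..., a_degree]."""
    m = degree + 1
    # Normal equations (V^T V) a = V^T y, where V[i][j] = xs[i] ** j.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate a0 + a1*x + a2*x^2 + ... at x."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


# Made-up closing prices, for illustration only.
closes = [101.2, 102.5, 101.9, 103.4, 104.1, 103.8,
          105.0, 106.3, 105.7, 107.2, 108.0, 107.5]
xs = [i / (len(closes) - 1) for i in range(len(closes))]  # days scaled to 0..1
coeffs = fit_polynomial(xs, closes, degree=5)
next_x = len(closes) / (len(closes) - 1)  # one day past the last observation
print("next-day estimate:", round(evaluate(coeffs, next_x), 2))
```

High-degree polynomials can swing sharply outside the fitted range, so an extrapolated value like the one printed here should be checked against the validation set before it is trusted.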
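Holt-Winters Model
- The Holt-Winters method breaks the data into level (a), trend (b) and seasonality (s), where p is the length of the seasonal period and h is the forecast horizon.
α = smoothing parameter for the level | β = smoothing parameter for the trend | γ = smoothing parameter for the seasonal component
- Y[t+h] = a[t] + h * b[t] + s[t - p + 1 + (h - 1) mod p]
- a[t] = α (Y[t] - s[t-p]) + (1-α) (a[t-1] + b[t-1])
- b[t] = β (a[t] - a[t-1]) + (1-β) b[t-1]
- s[t] = γ (Y[t] - a[t]) + (1-γ) s[t-p]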
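The recursions above can be sketched directly in Python. The update and forecast lines follow the four equations as written; the initial values (level taken as the mean of the first season, trend from the difference between the first two seasonal means, seasonal terms as deviations from the first seasonal mean) are an assumption, since the post does not say how the components were initialized. The function name and the sample series are also hypothetical.

```python
def holt_winters_additive(y, p, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps past the end of y."""
    if len(y) < 2 * p:
        raise ValueError("need at least two full seasons of data")
    # Assumed initialization (not specified in the text).
    first_mean = sum(y[:p]) / p
    second_mean = sum(y[p:2 * p]) / p
    level = first_mean
    trend = (second_mean - first_mean) / p
    season = [y[i] - first_mean for i in range(p)]  # s[0..p-1]

    for t in range(p, len(y)):
        prev_level = level
        # a[t] = alpha (Y[t] - s[t-p]) + (1-alpha) (a[t-1] + b[t-1])
        level = alpha * (y[t] - season[t - p]) + (1 - alpha) * (prev_level + trend)
        # b[t] = beta (a[t] - a[t-1]) + (1-beta) b[t-1]
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # s[t] = gamma (Y[t] - a[t]) + (1-gamma) s[t-p]
        season.append(gamma * (y[t] - level) + (1 - gamma) * season[t - p])

    t = len(y) - 1
    # Y[t+h] = a[t] + h * b[t] + s[t - p + 1 + (h - 1) mod p]
    return [level + h * trend + season[t - p + 1 + (h - 1) % p]
            for h in range(1, horizon + 1)]


# Made-up series with a period of 4, for illustration only.
series = [20, 24, 30, 22, 23, 27, 33, 25, 26, 30, 36, 28]
print(holt_winters_additive(series, p=4, alpha=0.5, beta=0.3, gamma=0.2, horizon=4))
```

In practice α, β and γ are usually chosen by minimizing the one-step-ahead squared error on the training data rather than fixed by hand.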
Feed Forward Model
- A feed-forward neural network is an artificial neural network where connections between the units do not form a cycle. This is different from recurrent neural networks.
- The feed-forward neural network was the first and simplest type of artificial neural network devised. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network.
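To make the one-directional flow concrete, here is a minimal Python sketch of a single forward pass through a network with one hidden layer. The layer sizes, the tanh activation, the weights and the scaled input prices are all hypothetical; the post does not describe the architecture or training procedure that was actually used, and no training is shown here.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Input nodes -> hidden nodes (tanh) -> single linear output node."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Information only moves forward: the output depends on the hidden
    # activations, and nothing feeds back into an earlier layer.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical example: three scaled past closing prices in, one value out.
last_three = [0.42, 0.45, 0.44]
hidden_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # two hidden nodes
hidden_biases = [0.0, 0.1]
output_weights = [0.7, -0.4]
output_bias = 0.05
print(forward(last_three, hidden_weights, hidden_biases, output_weights, output_bias))
```

A recurrent network would differ by feeding some of the hidden activations back in at the next time step, which is exactly the cycle that a feed-forward network lacks.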
ARIMA Model
- An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model which is fitted to time series data either to better understand the data or to predict future points in the series (forecasting).
- Non-seasonal ARIMA models are generally denoted ARIMA(p, d, q) where parameters p, d, and q are non-negative integers, p is the order of the Autoregressive model, d is the degree of differencing, and q is the order of the Moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p, d, q)(P, D, Q)m, where m refers to the number of periods in each season, and the uppercase P, D, Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.
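To show how the p, d and q orders translate into computation, the Python sketch below handles one deliberately simple special case, ARIMA(1, 1, 0): difference the series once (d = 1), fit a first-order autoregression to the differences (p = 1), and use no moving-average term (q = 0). It estimates the coefficients by ordinary least squares, which is an assumption made to keep the example short; statistical packages typically use maximum likelihood. The orders used for the results in this post are not stated, and the function names and sample prices are hypothetical.

```python
def difference(series):
    """First difference: d[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares fit of values[t] = c + phi * values[t-1]; returns (c, phi)."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:  # constant input: fall back to predicting the mean
        return mean_y, 0.0
    phi = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with a least-squares ARIMA(1, 1, 0)."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predict the next difference
        last_value = last_value + last_diff  # undo the differencing
        forecasts.append(last_value)
    return forecasts


# Made-up closing prices, for illustration only.
prices = [100.0, 101.5, 102.1, 103.8, 104.2, 105.9, 106.4, 107.7]
print(forecast_arima_110(prices, steps=3))
```

Adding moving-average terms (q > 0) or seasonal terms requires iterative estimation, which is where a dedicated time series library becomes the practical choice.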
Simple Exponential Smoothing
ŷ[T+1|T] = α y[T] + α(1-α) y[T-1] + α(1-α)^2 y[T-2] + ...
ŷ[T+1|T] = α y[T] + (1-α) ŷ[T|T-1]
α = smoothing parameter (0 ≤ α ≤ 1)
The weights decrease exponentially with increasing distance from the present time.
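The recursive form of the forecast is short enough to write out in Python. In this sketch the first forecast is initialized to the first observation, which is a common convention but an assumption here, and the function names are hypothetical.

```python
def ses_forecast(y, alpha):
    """One-step-ahead forecast: new = alpha * observation + (1 - alpha) * old."""
    forecast = y[0]  # assumed starting value
    for observation in y:
        forecast = alpha * observation + (1 - alpha) * forecast
    return forecast


def ses_weights(alpha, count):
    """Weights alpha * (1 - alpha)^k on the k-th most recent observation."""
    return [alpha * (1 - alpha) ** k for k in range(count)]


print(ses_forecast([2, 4], alpha=0.5))  # 0.5 * 4 + 0.5 * 2 = 3.0
print(ses_weights(0.5, 4))              # [0.5, 0.25, 0.125, 0.0625]
```

A value of α close to 1 makes the forecast track the latest price almost exactly, while a value close to 0 gives a slowly moving average of the whole history.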
- A scatter plot of the stock price against the market value is drawn, and a linear regression line is fitted through the data points.
- The autocorrelation of the stock’s price with its own past values is examined; the previous value, t[n-1], shows the highest correlation.
- Hence we can predict the stock’s price from t[n-1].
- The correlation table gives the linear correlation between the predicted values and the real stock values.
- The higher the correlation, the better the estimated values.
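Both the lag comparison and the correlation table rest on the same quantity, the linear (Pearson) correlation. The Python sketch below computes it and applies it to a series and its own lagged copy. Note that correlating a series with its shifted copy is a simple stand-in for the autocorrelation function and can differ slightly from the estimator used by statistical packages; the function names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lag_correlation(series, lag):
    """Correlation between the series and itself shifted back by `lag` steps."""
    return pearson(series[:-lag], series[lag:])


# Made-up values, for illustration only.
actual = [100.0, 101.5, 102.1, 103.8, 104.2, 105.9, 106.4, 107.7]
predicted = [100.4, 101.1, 102.6, 103.2, 104.9, 105.3, 106.8, 107.1]
print("lag 1:", lag_correlation(actual, 1))
print("lag 3:", lag_correlation(actual, 3))
print("predicted vs actual:", pearson(predicted, actual))
```

A high correlation shows that predictions move with the real prices, but it does not measure how far off they are in absolute terms, so it is best read alongside an error measure.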
Integrating the Models:
Weighted Average of Predicted Models
- Different sets of weights are applied to the selected models. The chi-square value is calculated for each set, and the weights with the lowest value are selected for the prediction.
- Isotonic regression finds a non-decreasing approximation of a function while minimizing the mean squared error on the training data.
- Chi-square = Σ (observed - expected)^2 / expected
- The lower the chi-square value, the better the fit and the higher the chances of success.
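The weight selection can be sketched as a small grid search in Python. Two choices here are assumptions, because the post does not spell them out: the actual validation prices are treated as "observed" and the blended prediction as "expected", and only two models are combined, with weights w and 1 - w tried in steps of 0.1. The function names and numbers are hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all points."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def blend(predictions, weights):
    """Weighted average of several models' predictions, point by point."""
    return [sum(w * model[i] for w, model in zip(weights, predictions))
            for i in range(len(predictions[0]))]


def best_weight(model_a, model_b, actual, steps=10):
    """Try w = 0, 1/steps, ..., 1 on model_a; return (w, chi_square) of the best."""
    best = None
    for k in range(steps + 1):
        w = k / steps
        combined = blend([model_a, model_b], [w, 1 - w])
        score = chi_square(actual, combined)
        if best is None or score < best[1]:
            best = (w, score)
    return best


# Made-up validation data, for illustration only.
actual = [100.0, 102.0, 101.0, 104.0]
model_a = [98.0, 101.0, 100.0, 102.5]
model_b = [103.0, 104.0, 102.5, 106.0]
print(best_weight(model_a, model_b, actual))
```

The weights should be chosen on the validation set and then confirmed on the test set, since weights tuned and judged on the same data will look better than they are.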
AS CAN BE SEEN THE WEIGHTED AVERAGE OF PREDICTED MODEL IS THE BEST PREDICTOR.
News Sentiment Analysis:
Many key factors that influence the stock price of a company or a sector of the economy are themselves affected by incoming news articles and feeds. Incoming news can be of various types, such as the latest earnings statements, announcements of dividends by a company, information about new products, and trend analysis and predictions by financial experts. Figure 1 shows some of the factors that affect the stock price. Clearly, this is an area in which text and data mining tools and techniques can be employed to provide summary information by extracting important keywords and action phrases from the incoming news stories. More importantly, there is a need for ways to extract emotion and sentiment from this corpus of text. With the advent of online news sites such as Google Finance, Yahoo Finance and MSN Money, financial news is delivered in a real-time, streaming format. There is a critical need for tools that can provide an accurate estimate of sentiment from streaming news items.
Figure 1: A snippet from a news article carrying different sentiments for two companies
The first step is to build an online news-gathering engine using the R package tm and its plugins, such as tm.plugin.webmining.
Figure 2: R code snippet
Analyzing and Filtering News Items
Our method starts off by scanning the news items for the stock symbols in question. As an example, Figure 1 shows a snippet of a news article from Bloomberg. It talks about Apple (AAPL) and Coinstar (CSTR) in the same article, yet the article conveys different sentiment for each. The proposed method filters the article to only relevant sentences. It would thus take only the first paragraph into consideration for Apple Corporation (AAPL). Some of the key words and phrases that would be tagged for polarity would be “sank 2.8 percent”, “fell”, “losing streak” (negative), and “valuable” (positive).
To break a story down into sentences, as shown in Figure 3, we can use one of the various Natural Language Processing (NLP) tools available in R, such as the package NLP. This allows us to look at individual sentences and keep only those that contain the stock symbol.
Figure 3: R code snippet
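The filtering step can also be illustrated outside R. The Python sketch below is not the code from Figure 3: it uses a naive regular-expression sentence splitter in place of a proper NLP sentence tokenizer, and the function names and the sample text are made up for illustration.

```python
import re


def split_sentences(text):
    """Naive split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def sentences_mentioning(text, symbol):
    """Keep only the sentences that contain the stock symbol as a whole word."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    return [s for s in split_sentences(text) if pattern.search(s)]


# Made-up text in the spirit of the Figure 1 example.
article = ("Apple (AAPL) sank 2.8 percent. "
           "Coinstar (CSTR) rose after its earnings report. "
           "AAPL extended its losing streak.")
for sentence in sentences_mentioning(article, "AAPL"):
    print(sentence)
```

A splitter this simple will break on abbreviations such as "Inc. said", which is one reason to prefer a trained sentence tokenizer on real news text.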
Identifying Sentiment Words
We define an instance to be a unit at the chosen level of granularity (such as a sentence, headline or paragraph) at which the analysis is to be performed. After extracting the relevant instances, we identify the key words they contain and match them against available lists of positive and negative sentiment terms. We used the Harvard General Inquirer.
Scoring Text Corpus
News articles are kept in memory in the form of a document-term matrix with the instances as rows and the terms as columns. We create scores for the instances.
Definition 1: An instance is classified as positive if its count of positive words is greater than or equal to its count of negative words. Similarly, an instance is negative if its count of negative words is greater than its count of positive words.
Definition 2: Score of a corpus is defined as the ratio of positive instances to the total number of instances.
When we apply this algorithm to news streams gathered on November 14, 2015 for Accenture, we get:
- 23 negative terms occurring in the sentences
- 107 positive terms occurring in the sentences
- 89 sentences with sign +1
- 10 sentences with sign -1
- A naive sentiment score of 89 / (89 + 10) = 89 / 99 ≈ 89.89%
Figure 4: R code snippet
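Definitions 1 and 2 translate into a few lines of Python. This sketch is not the R code from Figure 4: the two tiny word sets are hypothetical stand-ins for the Harvard General Inquirer lists, and the function names are made up. The last lines reproduce the arithmetic above from the reported counts of 89 positive and 10 negative sentences.

```python
import re

# Hypothetical stand-ins for the positive and negative term lists.
POSITIVE = {"valuable", "gain", "strong"}
NEGATIVE = {"fell", "sank", "losing"}


def instance_sign(instance, positive=POSITIVE, negative=NEGATIVE):
    """Definition 1: +1 if positive count >= negative count, otherwise -1."""
    words = re.findall(r"[a-z]+", instance.lower())
    pos = sum(1 for w in words if w in positive)
    neg = sum(1 for w in words if w in negative)
    return 1 if pos >= neg else -1


def corpus_score(signs):
    """Definition 2: ratio of positive instances to all instances."""
    return sum(1 for s in signs if s == 1) / len(signs)


print(instance_sign("The stock fell and sank but remains valuable"))  # 1 vs 2
print(instance_sign("The company held its annual meeting"))            # 0 vs 0

# Reported counts: 89 sentences with sign +1 and 10 with sign -1.
signs = [1] * 89 + [-1] * 10
print(corpus_score(signs))  # 89 / 99
```

One consequence of Definition 1 is that a sentence with no sentiment words at all counts as positive, because zero is greater than or equal to zero. This pushes the naive score upward when many sentences are neutral.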
In this section, we present the results of our method. The word clouds in Figure 5 show the positive and negative words in the news streams respectively. They lead to a naive sentiment score of 89.89%, representing a positive net sentiment for Accenture.
Figure 5: Positive and Negative clouds
The choice of model depends on the training dataset. For the Accenture stock that we selected, the weighted average model turns out to be the best choice. This may vary for other stocks.
Market sentiment should also be checked before investing, as strong sentiment tends to have an impact on stock selection.
For further queries you might contact:
Ankit Sonthalia (15BM6JP06) +91 86000 96336
Loveleen Kaur (15BM6JP21) +91 99153 66173
Siddhant Sanjeev (15BM6JP45) +91 98109 32592
And a last word of advice:
INVEST AT YOUR OWN RISK