In the real world, events affect each other. If the president of a large country says something important, that is news and can trigger other events. If something happens in the world that relates to a stock, the price of that stock will change. We can describe these events with time series data by assigning specific values to specific times.
If we could learn all the past patterns from time series data, could we predict future data better? If index and commodity prices depend on external conditions, does the external environment also affect, for example, the sales of a given company? And if we know this external data, can we forecast the future of your business data more precisely?
Let’s look at how machine learning grew up alongside big databases in the short history of artificial intelligence.
WordNet is a lexical database for the English language covering more than 150k words, linked together by their semantic relationships. It is like a supercharged dictionary/thesaurus with a graph structure. WordNet uses synsets to describe the meanings of words, and this database is very useful if you develop artificially intelligent software for text analysis.
ImageNet grew out of the WordNet idea, but it is instead a huge image database, with more than 14 million images, designed for use by developers of visual recognition software.
TimeNet is a database for time series. It stores economic and geographic data that is refreshed daily. This data can describe the external economic environment and can support our efforts to find correlations between a specific time series and external conditions.
How fascinating that the future is preprogrammed in our past, and that based on past patterns we can predict future outcomes. If you are a sci-fi fan, you may realize that this could be the new Foundation. If Asimov were writing Foundation today, Hari Seldon would be a data scientist, not a mathematician. The future can still be modeled with math, but the solution is not the psychohistorical-dialectical equations; it is raw data.
In this research project, time series data is continuously collected and evaluated from a wide variety of areas. Storage is optimized for quick searches, and very often there is a location parameter that adds one more dimension to the equation (hurricane time series, and even economic data, need a geographic location parameter).
We use traditional correlation techniques, and we are developing new methods for testing the similarities between time series.
The TimeNet.cloud site uses standard mathematical correlation in its Correlations section. Before calculating the correlation coefficient, it is important to clean the data and drop the dates that do not have a value in both time series. After that we use the corrcoef method from the numpy Python library, which returns a matrix of correlation coefficients. This method calculates the covariance of the two series and divides it by the product of their standard deviations. The coefficients lie between -1 and 1: 1 means the strongest possible agreement, -1 means the strongest possible agreement in the opposite direction, and 0 means no linear relationship at all. We multiply these coefficients by 100 to scale them between -100 and 100. This means the same thing, but it is a bit more readable shown as a percentage.
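The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the function name and the dict-based date-to-value representation are our own assumptions.

```python
import numpy as np

def correlation_percent(series_a, series_b):
    """Correlate two date-indexed series on their common dates.

    series_a, series_b: dicts mapping date strings to float values
    (an assumed representation, for illustration only).
    Returns the Pearson correlation coefficient scaled to [-100, 100].
    """
    # Cleaning step: keep only the dates that have a value in BOTH series.
    common = sorted(set(series_a) & set(series_b))
    a = np.array([series_a[d] for d in common], dtype=float)
    b = np.array([series_b[d] for d in common], dtype=float)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry [0, 1]
    # is the coefficient between the two series.
    r = np.corrcoef(a, b)[0, 1]
    # Scale to a percentage between -100 and 100.
    return 100.0 * r

prices = {"2024-01-01": 10.0, "2024-01-02": 11.0, "2024-01-03": 12.0}
index = {"2024-01-01": 100.0, "2024-01-02": 105.0,
         "2024-01-03": 110.0, "2024-01-04": 111.0}  # extra date is dropped
pct = correlation_percent(prices, index)
print(round(pct, 1))  # → 100.0 (the common dates are perfectly linear)
```

Note that the extra date in `index` is silently filtered out by the intersection, exactly as the cleaning step requires.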
On the site there is a Trend Correlations section, which uses a uniquely developed method to analyze how likely two time series are to change trend at the same time and in the same direction. Before running the algorithm we clean the data with the same techniques as for standard correlation. Then we try to detect the points where a trend change happens. For this we use the Ramer–Douglas–Peucker algorithm to simplify the series. Once we have a similar series with fewer points, we keep the points where the angle between the previous and the next point is too large. These points are our trend-change points. From these points we derive where the uptrends, downtrends, and stagnating periods lie in the time series. Then we can use standard correlation to see how similar these trend periods are in the two time series.
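The simplify-then-threshold pipeline can be sketched as below. This is a rough sketch of the idea, not the production method: the epsilon and angle thresholds are arbitrary illustration values, and the angle test here does not normalize turns beyond 180 degrees.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: simplify a polyline, keeping points that
    deviate from the start-end chord by more than epsilon."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    # Find the interior point farthest from the chord.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
        d = num / math.hypot(y2 - y1, x2 - x1)
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:
        # Recurse on both halves and merge (the split point appears once).
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

def trend_change_points(points, epsilon=0.5, min_angle_deg=20.0):
    """Keep simplified points where the direction turns by more than
    min_angle_deg -- these are the candidate trend-change points.
    Threshold values are illustrative assumptions."""
    simple = rdp(points, epsilon)
    changes = []
    for prev, cur, nxt in zip(simple, simple[1:], simple[2:]):
        a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        if abs(math.degrees(a2 - a1)) > min_angle_deg:
            changes.append(cur)
    return changes

# A series that rises and then falls: one clear trend change at x = 3.
series = [(0, 0.0), (1, 1.0), (2, 2.1), (3, 3.0),
          (4, 2.0), (5, 1.1), (6, 0.0)]
print(trend_change_points(series))  # → [(3, 3.0)]
```

Between consecutive trend-change points, the sign of the slope then labels each segment as an uptrend, a downtrend, or stagnation, and those labeled periods can be compared across two series with standard correlation.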