“Big data is not about the data.”
― Gary King, Harvard University
Machine Learning. Deep Learning. Data Science. Artificial Intelligence. Big Data.
Not a day goes by without one or all of these buzzwords streaming past in our business news feeds.
Data analytics has become mainstream. And you had better jump on board or risk being left at the station!
Just within the last year or so, searches for these topics have taken off. In fact, according to Google, in early 2017 search interest in one of these topics, machine learning, eclipsed that of big data.
So, how do time series methods for forecasting fit into the taxonomy that currently defines the data science field?
Data science taxonomy
Key data science terms related to time series methods for forecasting are data mining, predictive analytics, machine learning (supervised and unsupervised), regression, and structured and unstructured data.
These are not necessarily mutually exclusive. At the risk of incurring the wrath of the data science gods, here is our simplification:
Structured vs. unstructured data
Structured data are organized into “rows and columns” (as in a spreadsheet); unstructured data are not (as with the text of a book).
Time series methods use structured data.
Data mining seeks to find patterns in data, whether structured or unstructured.
Time series methods seek to find patterns that repeat over time.
Predictive analytics seeks to find a relationship between a variable of interest (e.g. customer churn) and multiple dimensions (e.g. age, length of contract, zip code). These dimensions can be used to predict the likelihood of a customer churning (in our example).
Typically, predictive analytics is based not on time series data but on “cross-sectional” data, such as a set of customers. Additionally, time series methods use only a very limited set of dimensions, the primary one being the past behavior of the variable being forecast (e.g. sales).
Time series methods typically use the past behavior of the variable being forecasted as the primary dimension.
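To make "past behavior as the primary dimension" concrete, here is a minimal sketch of how a single series can be rearranged into rows of past values paired with the value to be forecast. The function name `make_lagged_rows`, the choice of two lags, and the sales figures are illustrative assumptions, not something taken from a particular forecasting package.

```python
def make_lagged_rows(series, n_lags):
    """Pair each value in the series with the n_lags values that precede it."""
    rows = []
    for t in range(n_lags, len(series)):
        past = series[t - n_lags:t]      # the "dimensions": recent past values
        rows.append((past, series[t]))   # the value we would try to forecast
    return rows

# Made-up sales figures, for illustration only
sales = [10, 12, 13, 15, 18, 21]
rows = make_lagged_rows(sales, n_lags=2)
for past, target in rows:
    print(past, "->", target)
```

Each row now looks like a line in a spreadsheet: a couple of columns of history and one column holding the value to predict.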
Machine learning means that a computer is using a program (algorithm) to “connect the dots” in the data. If you run a regression model in Excel, you are engaging in machine learning.
However, supervised machine learning does not mean you are keeping watch over Excel as it does its stuff!
Supervised machine learning means that the computer is seeking to find a relationship between a single variable (e.g. churn) and many dimensional variables (e.g. age, length of contract, zip code). The known values of that single variable are what “supervise” the fit.
Unsupervised machine learning means that the computer is seeking to find relationships among many dimensions (e.g. age, length of contract, zip code), with no single variable to predict, so that customers can, for example, be clustered into a small number of groups or tribes with similar characteristics.
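As a rough illustration of the clustering idea, the sketch below groups customers on a single dimension (length of contract) using a bare-bones version of the k-means procedure with two groups. The function name `two_means`, the starting centers, and the contract lengths are assumptions made for this example; real tools handle many dimensions and more careful initialization.

```python
def two_means(values, n_iter=10):
    """Cluster 1-D values into two groups with a basic k-means loop."""
    centers = [min(values), max(values)]  # simple deterministic starting point
    for _ in range(n_iter):
        groups = [[], []]
        for v in values:
            # Assign each value to whichever center is nearer
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Made-up contract lengths in months, for illustration only
contract_months = [1, 2, 3, 24, 26, 28]
centers, groups = two_means(contract_months)
print(centers)  # [2.0, 26.0]
print(groups)   # [[1, 2, 3], [24, 26, 28]]
```

Note that nothing in the data says which customers belong together; the two "tribes" (short and long contracts) emerge from the dimensions alone.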
Time series methods are a type of supervised machine learning since they attempt to find a relationship between present and past behavior: the present value of the series is the single variable, and its past values are the dimensions.
Regression is one way a machine finds relationships between a single variable and a few (or many) dimensional variables or past values of the variable itself. There are several flavors of regression.
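One simple flavor, sketched here under stated assumptions, regresses each value of a series on the value just before it (often called a first-order autoregression) using ordinary least squares. The function name `fit_ar1` and the sales numbers are hypothetical, and this is only one of many ways a time series model can be fitted.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]."""
    x = series[:-1]  # previous values
    y = series[1:]   # current values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope and intercept from the usual simple-regression formulas
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Made-up sales that grow by 10% each period, for illustration only
sales = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = fit_ar1(sales)
forecast = a + b * sales[-1]  # one-step-ahead forecast
print(round(a, 6), round(b, 6), round(forecast, 3))
```

Because the example data follow an exact 10% growth pattern, the fitted slope comes out at about 1.1 and the intercept at about zero; real sales data would of course be noisier.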
So, when you use time series methods for forecasting, you are probably mining structured data using supervised, regression- or maximum-likelihood-based machine learning.