How predictive analytics techniques and processes work
Predictive analytics is no longer confined to highly skilled data scientists. But other users need to understand what it involves before they start building models.
The ability to accurately predict customer behavior, market trends and other business events has long been considered a holy grail for data analysts. But the community of people trying to get their hands on that metaphorical grail has grown significantly.
Twenty years ago, predictive analysis was limited to "algorithmists" and other skilled analysts who were intimately familiar with the statistical methods that are the foundation of analytics applications. Today, predictive analytics techniques and tools have matured to the point where predictive models can be easily developed and deployed within business processes -- and not only by actual data scientists.
Increased functionality and easier accessibility open up analytics applications to so-called citizen data scientists -- business analysts and users who have enough know-how to build models on their own. But to be successful at it, they need at least a high-level understanding of the processes and techniques used in predictive analytics applications.
What predictive analytics is
Fundamentally, the objective of predictive analytics is to analyze historical or current data to develop models that can be used to forecast future actions, behaviors and outcomes. Statistical techniques are applied to data sets through the use of advanced algorithms to weigh different variables and score the likelihood that particular things will happen -- for example, whether existing customers are likely to continue buying products from a company.
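To make "weighing variables and scoring a likelihood" more concrete, here is a minimal Python sketch of one common scoring approach, a logistic model. The variable names, weights and bias are hypothetical values chosen for illustration. In a real project, they would be learned from historical data rather than typed in by hand.

```python
import math

# Hypothetical weights, standing in for values a model would learn from history.
# Positive weights raise the likelihood of a repeat purchase; negative ones lower it.
WEIGHTS = {
    "purchases_last_year": 0.8,
    "days_since_last_purchase": -0.02,
    "support_tickets": -0.5,
}
BIAS = -1.0


def repeat_purchase_score(customer: dict) -> float:
    """Return a score in (0, 1) estimating the likelihood the customer buys again."""
    # Weighted sum of the input variables.
    z = BIAS + sum(weight * customer[name] for name, weight in WEIGHTS.items())
    # The logistic function squeezes the weighted sum into a 0-1 likelihood.
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical customers.
loyal = {"purchases_last_year": 6, "days_since_last_purchase": 20, "support_tickets": 0}
lapsed = {"purchases_last_year": 1, "days_since_last_purchase": 300, "support_tickets": 3}
loyal_score = repeat_purchase_score(loyal)
lapsed_score = repeat_purchase_score(lapsed)
```

A business application could then act on these scores, for example by targeting customers whose score falls below some chosen threshold with a retention offer.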
Big data systems built around Hadoop and related technologies are often used to fuel data mining and predictive analytics efforts. Machine learning algorithms can help automate the data analysis work; on a larger scale, deep learning tools enable the use of neural networks to do predictive analytics on massive volumes of structured or unstructured data. The resulting predictive models can be integrated with operational applications to influence business decision-making and, if done effectively, drive higher revenues and profits.
Predictive analytics initiatives are supported by two pillars: a well-defined process that standardizes how data analysts develop, test and deploy predictive models; and a set of predictive analytics techniques and tools for doing the analysis work.
Finding virtue in the analytics process
The predictive analytics process should embody what can be seen as a virtuous cycle. A business problem is identified; candidate solutions are developed and compared to find the ones that work best; and those are deployed in the operational environment. Business improvements are then measured, and the process begins again with a search for the next opportunity.
In practice, the process also includes some other crucial steps. After identifying a business issue that might be positively affected by the creation of a predictive model, analysts must determine the data to be used. To do so, they need to consider which data sets are likely to inform the development of the planned model and assess the availability and accessibility of the required data.
Once the data to be analyzed is collected, the next step is preparing it for analysis. To do so, an analyst standardizes and cleanses the data. That task includes imputing missing values, removing outliers that are unlikely to contribute useful information to the analysis, and organizing the data to support and streamline the analysis stage.
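The Python below is a minimal sketch of two of these preparation tasks. It fills missing values with the median of the observed values. It then drops values outside Tukey's fences, which lie 1.5 times the interquartile range beyond the first and third quartiles.

Median imputation and the IQR rule are just one common choice, assumed here for illustration. Whether a given outlier should be dropped remains a judgment call. The order-value data is hypothetical, and a real pipeline would typically act on whole records rather than on a single column. The sketch requires Python 3.8 or later for `statistics.quantiles`.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, k=1.5):
    """Keep only values inside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical order values with gaps (None) and one extreme entry.
order_values = [120.0, None, 95.0, 110.0, 105.0, 4000.0, None, 100.0]
cleaned = drop_outliers(impute_missing(order_values))
```

Imputing before filtering means the median used for the gaps is computed with the extreme value still present. The median is a robust statistic, so that single outlier barely moves it.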
Using predictive analytics tools, a sample data set is then subjected to one or more statistical algorithms to create trial models for testing. After being "trained" on the sample data, the models are applied to the full data set and evaluated to see which ones best fit the data and how well each produces the desired analytics outcomes.
The final task is to embed the chosen model -- or models -- in business applications and processes to support decision-making and strategic planning. And then it's time to return to the first step and start the process again for a new analysis effort.
Algorithms fuel analytics techniques
The statistical analysis and model development process can involve many types of algorithms that use various analytics methods. These algorithms aim to isolate dependencies among different variables in the data. They also determine which predictions derived from those dependencies can be made with a high degree of confidence.
Yet, while there are many different algorithms available to use, there's a smaller set of fundamental predictive analytics techniques that typically are applied, including the following:
- Description. This technique summarizes what has happened in the past and attempts to analyze and characterize it, with an eye toward predicting similar events in the future. Describing past behavior and then applying predictive models to the resulting data helps to frame opportunities for operational improvement and identify new business opportunities.
- Correlation. Users can do correlation analysis to identify relationships and dependencies between different data variables to predict how they'll affect one another going forward. Correlations can be positive or negative. Determining that there's no correlation between a set of variables can also be useful in targeting predictive analytics projects at meaningful data.
- Segmentation. This technique is a way to analyze a large collection of entity data, such as a customer database, and organize it into smaller groups. All the entities that are collected into the same subgroup are determined to be similar to each other on the specified characteristics, which lends itself to predicting future behavior or events.
- Classification. Another means of separating different entities in a data set into related groups is to map them into predefined categories based on relevant characteristics or behaviors. The resulting classification model can be used both to categorize new records and to do predictive modeling against the data for the designated subgroups.
- Regression. This technique is designed to identify meaningful relationships among data variables, specifically looking at the connections between a dependent variable and other factors that may or may not affect it. The information enables analysts to predict future developments related to the dependent variable based on what happens with related factors.
- Association. One more technique for highlighting relationships between data elements for predictive purposes is to look for ones that demonstrate affinity -- for example, products that often are purchased together.
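Three of these techniques reduce to short calculations, shown in the Python sketch below:

- A Pearson correlation coefficient.
- A least-squares regression line.
- Pairwise association rules scored by support and confidence. Support is how often a pair appears together across all baskets; confidence is how often the second item appears given the first.

The helper names, the price and sales figures, the basket contents and the 0.3 support threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(xs, ys):
    """Correlation: near +1 strong positive, near -1 strong negative, near 0 none."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept of dependent y on factor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def pair_rules(baskets, min_support=0.3):
    """Association: (support, confidence) for frequent item pairs, keyed (a, b) for a -> b."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(p for b in baskets for p in combinations(sorted(set(b)), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical price points and units sold at each price.
price = [10, 12, 14, 16, 18]
units = [100, 92, 81, 74, 63]
r = pearson(price, units)
slope, intercept = fit_line(price, units)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "bread"},
    {"milk", "eggs"},
]
rules = pair_rules(baskets)
```

Here the results are as follows:

- The correlation is close to -1, because sales fall as price rises.
- The fitted slope of -4.6 estimates the change in units sold per one-unit price increase.
- Bread and butter are the only pair frequent enough to yield rules. Every basket containing butter also contains bread, so the butter-to-bread rule has a confidence of 1.0.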
When armed with a comprehensive arsenal of algorithms and tools, data analysts of all stripes can combine these predictive analytics techniques to develop a variety of models -- then test, validate, compare and deploy them to generate actionable information that helps optimize critical business operations and point to new business strategies.
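The sketch below is one hedged illustration of combining techniques, pairing segmentation with classification. Plain k-means groups hypothetical customer records into segments, and a nearest-centroid rule then assigns new customers to one of those segments.

K-means and nearest-centroid assignment are assumptions chosen for simplicity; the article does not prescribe them. For brevity, the features are left unscaled, although in practice they would normally be standardized first. The sketch requires Python 3.8 or later for `math.dist`.

```python
import math


def kmeans(points, k, iterations=20):
    """Segmentation: group points into k clusters with plain k-means."""
    # Assumption: seed centroids with the first k points; real tools seed more carefully.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def classify(point, centroids):
    """Classification: assign a new record to the segment with the nearest centroid."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical customers: (orders per year, average order value).
customers = [(2, 20), (40, 300), (3, 25), (38, 280), (1, 18), (42, 310)]
centroids = kmeans(customers, k=2)

# New customers are classified into the segments found above.
occasional = classify((4, 30), centroids)
frequent = classify((35, 250), centroids)
```

Once each segment is labeled, a separate predictive model could be trained and compared for each one, following the test-and-deploy cycle described earlier.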