Somewhere along the way, computers transformed from filing cabinets for data into crystal balls that foretell the future, examining that data to predict what might happen in a few seconds, a few days, maybe even a few years.
Many of the tools for accomplishing this feat fall under the term “predictive analytics.” The term is a catch-all for algorithms developed over the years, from wildly different corners of statistics, artificial intelligence, machine learning and multidimensional mathematics. These tools emerged from the lab to populate corporate server farms and now they’re ready to guide business teams toward making the right decisions about allocating resources and reaping profits.
The tools have two main roles. The more obvious is to peer into the sea of bits in the database and pluck out some vision of the future. They do this by supporting a number of proven algorithms drawn from different strategic approaches; some support dozens.
The second role is less noticeable but often more time consuming. Preparing the data can be maddening because data is rarely as consistent or as clean as we need. If two files need to be integrated, their dates are often in differing formats using different time zones. Challenges like these are easy to resolve. The more difficult ones involve missing fields or outliers that may be the result of an error, or may be an accurate omen that must remain in the data set. Removing mistakes while preserving the integrity of the data is a real challenge. All of the best tools offer good support for preparing the data and presenting the results.
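The date problem can be made concrete with a short, self-contained Python sketch. The formats, the fixed UTC-5 offset, and the outlier threshold are all illustrative assumptions, not features of any tool below; the point is that mismatched timestamps are mechanical to unify, while outliers should be flagged for human review rather than silently deleted:

```python
from datetime import datetime, timezone, timedelta
from statistics import mean, stdev

# Two hypothetical feeds: one uses ISO timestamps already in UTC,
# the other US-style dates recorded in a fixed UTC-5 office zone.
FORMATS = [
    ("%Y-%m-%d %H:%M:%S", timezone.utc),
    ("%m/%d/%Y %H:%M", timezone(timedelta(hours=-5))),
]

def to_utc(raw):
    """Try each known format and normalize the result to UTC."""
    for fmt, tz in FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
            return parsed.replace(tzinfo=tz).astimezone(timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean. They are flagged, not removed,
    because an outlier may be an error or a genuine omen."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# 07:00 at UTC-5 is the same instant as 12:00 UTC.
assert to_utc("04/01/2023 07:00") == to_utc("2023-04-01 12:00:00")
print(flag_outliers([10, 11, 9, 10, 12, 250]))  # -> [5]
```

Note that the outlier routine only reports suspicious rows; deciding whether each one is a mistake or a meaningful signal is exactly the judgment call described above.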
Many predictive analytics tools are brand extensions built by database developers and business analytics and reporting vendors, which have slowly merged traditional report generation with AI algorithms to produce tools that both summarize and offer predictions.
Many of the tools are closely aligned with a specific data storage product. While they all work with generic formats such as CSV, they tend to work a bit better with some databases, often because those databases are owned by the company that developed the predictive capabilities. It is often easiest to go with the tool built by the same company that stores your data.
Of course, you can always migrate your data or export it in a standard format to leverage a different tool. It is often not much work to connect the pipes correctly so the bits flow relatively smoothly and the insights pop out of the end.
Here are 15 predictive analytics tools that are revolutionizing how companies are leveraging their data to make forward-thinking business decisions.
Alteryx
Alteryx has focused on automating predictive analytics by integrating predictive algorithms into its platform for generating reports and managing workflows. The tool has a large library of data-gathering routines that can import data from a wide array of major and not-so-major sources, whether new or decades old. It is highly customizable and aimed at casual, data-savvy managers rather than developers, to encourage the widespread addition of predictive technology to reporting and business intelligence. The company has also focused on delivering prebuilt solutions customized for various corporate departments, from marketing to research, to encourage faster adoption.
AWS
AWS’s tools for searching out signals in data streams continue to proliferate. They are generally separated into different product lines and joined by AWS’s data storage options (generally S3 buckets). Amazon Forecast, for example, focuses on extending time-series data to predict how many sales to expect in the next quarter and how many resources you’ll need to line up in advance to satisfy that demand. Amazon CodeGuru searches for bad code patterns to help improve your code. Some of the tools were built by Amazon to support its own business (Amazon Fraud Detector and Amazon Personalize), and the company now resells them to others building out their own ecommerce empires.
Board
Companies that like to maintain dashboards summarizing data trends can use Board to collect data from a wide variety of data silos (ERP, SQL, etc.) and turn it into reports that summarize the past and make predictions about the future. The emphasis is on gathering data from as many sources as possible and turning each into a standardized “view” that can then be fed directly to the visualization tools or the predictive analytics (machine learning, clustering algorithms, or pure statistical algorithms).
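The standardized-view idea can be sketched in a few lines of Python. The field names, the two silo formats and the exchange rate here are all invented for illustration; the point is that every source's records are mapped onto one common schema before any visualization or prediction runs:

```python
# Records from two hypothetical silos: an ERP export and a SQL
# extract, with different field names and different currencies.
erp_rows = [{"CustNo": "C-17", "RevenueEUR": 1250.0}]
sql_rows = [{"customer_id": "C-42", "revenue_usd": 980.0}]

EUR_TO_USD = 1.1  # assumed fixed rate, for illustration only

def to_view(row):
    """Map either source format onto the common view schema."""
    if "CustNo" in row:  # ERP-style record
        return {"customer": row["CustNo"],
                "revenue_usd": row["RevenueEUR"] * EUR_TO_USD}
    return {"customer": row["customer_id"],
            "revenue_usd": row["revenue_usd"]}

# Downstream reporting and prediction see only this uniform view.
view = [to_view(r) for r in erp_rows + sql_rows]
print(view)
```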
Dash
The Dash tool set is split into two levels: the free open source version and the enterprise system that manages a cloud of models in development or in active use. The open source version bundles together many of the best Python libraries for analysis and data visualization. The enterprise version adds Kubernetes, authentication, and several other important tools, such as GPU integration, for deployments that serve large groups of users. The enterprise version also includes more low-code enhancements for producing dashboards and other popular interfaces.
Databricks
Companies with large data collections can use the Databricks tool set, which is built on top of Apache Spark, Delta Lake, MLflow and TensorFlow; the first three are popular open source projects pioneered by people working at Databricks. The company adds a collection of tools, such as collaborative notebooks and data processing pipelines, to make it easier to integrate that power into your workflow. Databricks has already developed versions integrated with AWS and Azure to simplify working with data in those clouds. One recent case study shows how Databricks helped predict maintenance problems in oil drilling before failures occurred.
DataRobot
Companies looking for flexible options for deploying their models to local hardware, the cloud or something hybrid can use DataRobot to manage their data and models. The tools offer automated machine learning with a collection of routines customized for common industries such as insurance (balancing risk with pricing).
IBM
IBM’s tools come from two separate development traditions. The SPSS line, whose statistical roots date to the 1960s, became a foundation for many corporations that wanted to optimize their production lines using statistics. The punch-card era code is long gone, and SPSS Modeler now lets non-programmers drag and drop data in a graphical user interface to produce reports filled with statistical measures. IBM’s other big collection is bundled under the Watson brand name made famous by the Jeopardy challenge. These tools are largely based on iterative machine learning algorithms capable of taking training data and turning it into models. The code can work with raw numbers, imagery or unstructured text.
Information Builders
Information Builders’ data platform enables data architects to set up a visual pipeline that collects data from sources, cleans it and then starts the analytical engines. An important set of options allows full data governance models to protect information that can’t be widely shared with all users. There are customized templates for important industries such as manufacturing and utilities that enable users to quickly develop operational insights for their business rules.
MathWorks
MathWorks started as a company focused on helping scientists work with large matrices through its MATLAB product, and it has slowly grown to encompass many different forms of numerical data analysis. The products in the MATLAB branch focus on optimization and statistical analysis, whereas the tools in the Simulink branch deliver simulation and modeling. Many groups may also want one of the several dozen specialized toolboxes that customize the tools for particular markets such as autonomous cars, antenna design or image processing.
Python
While Python began life as a scripting language similar to Perl, it has become one of the most popular languages for data analysis in the sciences, and many research labs use Python code to analyze their results. Lately data scientists have taken to bundling the data, the analytical code and the written description in Jupyter notebooks, a format that produces living reports a reader can not only read but also tweak and rerun. Python tools such as Jupyter, PyCharm, Spyder and IDLE are where some of the newest ideas appear first, but they are often rough and best approached by software developers and data scientists. Many of the clouds now offer specialized environments for sharing Jupyter notebooks much as one would share text or spreadsheets, which makes them a good way of circulating predictive analyses.
R
R is technically just an open source language for data analysis, largely built and supported by the academic community. While there are some good general integrated tools for using R, such as RStudio, Radiant or Visual Studio, they are best suited to programmers and hard-core data scientists. The newest ideas from the research labs often appear first as R packages, and the most dedicated data scientists like to explore them. Many of the other tools in this list can integrate R code as modules, so if you can’t get what you want from the integrated tools, you can always dig deeper into the open source R packages.
RapidMiner
The modeling tools in RapidMiner are designed to be as automated as possible so teams can create predictive models with little assistance. The development studio produces operational Jupyter notebooks with “automated model selection” and “guided data preparation.” The models are chosen from many standard options built on principles such as classical machine learning, Bayesian logic, statistical regression or various forms of clustering. The developers have worked to avoid black boxes by adding explanations, so that users can have more trust in how the models derive their results.
SAP
Many companies rely on SAP to manage their supply chains, and SAP’s reporting tools have now been enhanced with predictive analytics so teams can create forecasts from machine learning models built from past data. The algorithms include both traditional artificial intelligence and simulations. The software can run locally or in the SAP cloud. The developers also aim to support the entire enterprise with customized user interfaces that remain consistent between departments. Users in web-based environments or on mobile devices receive reporting tuned to their platform to encourage widespread adoption.
SAS Advanced Analytics
SAS’s collection of tools bundles close to two dozen packages into a platform that turns your SAS data into both insights and predictions. The statistical and data mining packages focus on correlations between data elements, while the optimization and prediction tools find solutions and future directions. There’s a strong emphasis on text analytics to suss out details in unstructured text. Recently the company has been illustrating the software’s abilities by showing how it can help contact tracers track pandemics.
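Text analytics starts from something as simple as term frequencies pulled out of unstructured notes, which can then feed correlation or prediction steps downstream. This sketch uses an invented note and only the Python standard library; commercial text tools layer entity extraction and sentiment analysis on top of this kind of counting:

```python
import re
from collections import Counter

# A hypothetical free-text note, invented for illustration.
note = ("Patient reports mild cough. Cough worsened overnight. "
        "No fever. Cough persists after rest.")

# Lowercase, split into words, and count term frequencies.
words = re.findall(r"[a-z]+", note.lower())
top = Counter(words).most_common(3)
print(top)  # "cough" dominates with a count of 3
```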
Tableau
Tableau, acquired by Salesforce in 2019, has drawn attention for its sophisticated, artful graphical renderings of reporting information. The dashboards can now be extended with the embedded analytics model to deliver interactive options for visual understanding. The tool depends on a rich collection of modules for gathering and preparing data for analysis.