In 2011, the McKinsey Global Institute (MGI) published a report projecting that by 2018 the United States would face a shortage of as many as 190,000 people with deep analytical skills, along with 1.5 million managers and analysts able to turn the big data deluge into actionable insights. A year later, a 2012 Harvard Business Review (HBR) article declared “data scientist” the sexiest job of the 21st century. (One of its authors, Dr. DJ Patil, was named the first U.S. Chief Data Scientist by the White House in February 2015, which is quite a big deal for healthcare.) This enthusiasm rested largely on the notion that data scientists, people with a rare combination of statistical training, technology skills and data knowledge, would be needed in large numbers if the promise of analytics was to be fulfilled. All this has whipped up a hiring frenzy over the last couple of years, with compensation packages going through the roof.
The shortage of data science talent in the market is one of the factors slowing the adoption of advanced analytics in healthcare. What made data scientists valuable was their seemingly esoteric knowledge of statistics: the ability to apply statistical models to business problems and analyze data to make predictions on issues ranging from machine failures to consumer behavior to hospital readmissions.
Forrester Research recently published a Forrester Wave report on big data predictive analytics solutions, which declares that predictive analytics is within easy reach of any enterprise that chooses the right big data predictive analytics solution for its needs. The implied message to enterprises is clear: don’t bother building an army of data scientists, because the predictive models you’re looking for are available in a box. The report identifies more than a dozen predictive analytics vendors offering out-of-the-box tools aimed at business users who have neither statistical modeling skills nor the means to hire highly qualified data scientists.
Related developments in the market suggest a “democratization” of predictive analytics:
–Coursera, a large online education provider, offers courses that teach newcomers how to build predictive models in R, an open-source language that is rapidly becoming the de facto standard for teaching and practicing predictive modeling in universities and other institutions. The courses are free, so legions of students can learn advanced analytics modeling at little or no cost. We have seen this movie before: an acute shortage of programmers is being addressed by communities such as Codecademy, which offer free online courses in in-demand skills such as the Python programming language to people with non-technical backgrounds.
–Academic medical centers such as Mayo Clinic are sharing their research with the wider community by publishing their algorithms and making them available to healthcare enterprises to improve patient outcomes in areas such as cardiovascular health. If other medical institutions follow suit, unlocking this knowledge, which until now has resided in the research departments of academic medical centers, could significantly accelerate the widespread adoption of predictive modeling in healthcare.
–The emergence of the Predictive Model Markup Language (PMML), an XML-based file format that lets applications describe and exchange models produced by data mining and machine learning algorithms, makes it easier to operationalize analytics and integrate them into workflows and applications.
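To make the PMML point concrete, here is a minimal Python sketch of the consuming side of that exchange: it parses a small, hand-written PMML 4.4 `RegressionModel` and scores one patient record using only the standard library. It assumes PMML 4.4 semantics, where `normalizationMethod="logit"` maps the linear predictor y to 1/(1+e^-y). The field names (`age`, `prior_admissions`), the coefficients and the helpers `load_regression` and `score` are hypothetical and purely illustrative; a real scoring engine would support far more of the specification (other model types, transformations, missing-value handling).

```python
import math
import xml.etree.ElementTree as ET

# Default namespace used by PMML 4.4 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

# Hypothetical model: field names and coefficients are illustrative, not clinical.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="Hypothetical 30-day readmission risk model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="prior_admissions" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="prior_admissions"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-4.0">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="prior_admissions" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_regression(pmml_text):
    """Extract intercept, predictors and normalization from a single-table RegressionModel."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    # Each predictor maps to (coefficient, exponent); PMML's default exponent is 1.
    predictors = {
        p.get("name"): (float(p.get("coefficient")), float(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, predictors, model.get("normalizationMethod", "none")


def score(pmml_text, record):
    """Score one record: linear predictor, then the model's normalization."""
    intercept, predictors, norm = load_regression(pmml_text)
    y = intercept + sum(c * record[name] ** e for name, (c, e) in predictors.items())
    if norm == "none":
        return y
    if norm == "logit":
        return 1.0 / (1.0 + math.exp(-y))  # maps log-odds to a probability
    raise ValueError(f"normalizationMethod not handled in this sketch: {norm}")


if __name__ == "__main__":
    patient = {"age": 70, "prior_admissions": 2}  # hypothetical record
    print(f"predicted readmission risk: {score(PMML_DOC, patient):.3f}")
```

The design point is that the model travels as data: the application that scores it needs no knowledge of the tool that trained it, only a parser for the agreed format.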
So does all this spell doom and gloom for data scientists and applied math experts? I would argue the opposite, for the following reasons:
–Predictive models can’t simply be dragged and dropped from one environment to another. For example, a readmission-risk model built at a hospital that serves mainly Medicare patients in an affluent neighborhood cannot be expected to predict readmissions as accurately at a hospital that treats mainly Medicaid patients in a poor neighborhood. This is where a skilled modeler adjusts the variables in the model to reflect the differences in population characteristics.
–Data modeling, integration and visualization skills will still be needed to bring predictive models to life, however accurate or reusable the models are. At the end of the day, a predictive model is just one link in a value chain whose goal is better outcomes, achieved through real-time, actionable insights built into workflows and operations.
–Developing or deploying predictive models is only the first step in a long-term journey. Their full benefits accrue as they are refined with data gathered over an extended period, which will require expert human intervention in addition to machine learning. This is no different from the practice of medicine itself, where a physician’s intuition, experience and judgment cannot be replaced by a model, even though models may help clinicians diagnose conditions and choose interventions more quickly.
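One common way to adapt a risk model to a new population, offered here as an illustration of the transfer problem rather than as the author's method, is an intercept-only recalibration (sometimes called recalibration-in-the-large): keep the model's coefficients, but shift every patient's log-odds by the same amount so that the average predicted risk matches the readmission rate actually observed at the new hospital. The Python sketch below assumes a logistic model whose scores are already available; the scores, the 22% observed rate and the function names are hypothetical.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def intercept_shift(predicted, observed_rate, lo=-10.0, hi=10.0, tol=1e-9):
    """Find delta so that the mean of sigmoid(logit(p) + delta) equals observed_rate.

    The mean recalibrated risk increases strictly with delta, so bisection converges
    as long as observed_rate lies between the means at lo and hi.
    """
    logits = [logit(p) for p in predicted]

    def mean_risk(delta):
        return sum(sigmoid(z + delta) for z in logits) / len(logits)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(predicted, delta):
    """Apply the same log-odds shift to every patient's predicted risk."""
    return [sigmoid(logit(p) + delta) for p in predicted]


if __name__ == "__main__":
    # Hypothetical risks from a model built at hospital A, scored on patients at hospital B.
    scores = [0.05, 0.10, 0.12, 0.20, 0.08]
    observed = 0.22  # hypothetical observed readmission rate at hospital B
    delta = intercept_shift(scores, observed)
    adjusted = recalibrate(scores, delta)
    print(f"intercept shift: {delta:+.3f}")
    print("recalibrated risks:", [round(p, 3) for p in adjusted])
```

Because the same shift is added to every patient's log-odds, the ranking of patients is unchanged; only the absolute risk levels move. Adjusting individual variables to reflect a different population would instead mean refitting the model on local data, which is where the modeler's judgment comes in.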
If predictive models are indeed being democratized, I would argue that is a good thing, because it removes bottlenecks to the adoption of advanced analytics to drive business outcomes. It also demystifies prediction and puts its power in the hands of a much larger segment of the business community, which is where it can do the most good.