Data scientists come at a premium, posing a challenge for any enterprise outside of Google, Facebook, Amazon.com and Apple. CIOs who have been fortunate enough to poach them from big tech companies or lure them from academia beam with pride as they talk about all of the business insights they’re going to generate with their data gurus.
IBM expects the demand for data scientists to soar by 28 percent by 2020, and that figure may be conservative. To address the talent shortage, vendors are building software that does the heavy lifting, effectively creating “citizen” data scientists out of corporate employees who are not embedded in IT.
Citizen data science includes capabilities and practices that allow users to extract predictive and prescriptive insights from data while working in positions outside the fields of statistics and analytics, according to research firm Gartner. Citizen data scientists are “power users,” such as business analysts, who don’t have computer science backgrounds but can perform simple to moderately sophisticated analytical tasks that would previously have required more expertise, says Gartner analyst Carlie Idoine in a blog post. She adds that these power users can help mitigate the current skills gap.
“The increased availability of tools, tech, data and models is enabling the dissemination of insights to people who normally would not have the capabilities to get at themselves,” says Forrester Research analyst Brandon Purcell.
Data science democratized for (almost) all
Technology always finds a way to democratize access to information. So what’s changed? In the traditional model — still practiced by most enterprises — business analysts hunker down with someone from IT and a data scientist for months to plan models intended to generate predictive insights, with the data scientist often building the model from scratch.
Now, thanks to tools such as IBM’s SPSS and Alteryx, citizen data scientists, many with little or no coding experience, drag and drop data models onto a sort of software canvas to derive insights. Such tools make it “much easier for line-of-business analysts to manipulate data than in Excel,” Purcell says.
General Motors, for example, built Maxis, an analytics platform that allows business users to conduct Google-like queries to gain a window into sales forecasts and operational metrics such as supply chain performance. GM may be an outlier now, but it’s going to have a lot of company in short order, experts agree.
Data science is a critical focus for oil giant Shell, where employees churn through the company’s petabytes of data to generate operational and business insights. Thanks to self-service software, employees who otherwise might not have been able to tap into analytics can now do so without technical help, says Daniel Jeavons, general manager of Shell’s data science center of excellence. For example, Shell uses self-service software from Alteryx to help run predictive models that anticipate when thousands of oil drilling machine parts might fail.
“Data science tools are democratizing the low-end of data science, so more or less anyone can do it,” Jeavons says. But at the other end of the spectrum, Shell uses “powerful engines” such as Google TensorFlow and the deep learning library MXNet, as well as the Python and R programming languages. “There will always be a spectrum spanning the citizen data scientist and the professional data scientist and we have to support both.”
The citizen data scientist, then, bridges the gap between self-service analytics conducted by business users and advanced analytics attributed to data scientists. Professional data scientists, by contrast, build and scale data models and algorithms across an entire enterprise, says Forrester’s Purcell.
Possessed by the now widely held maxim that data is the new oil, many enterprises have become “seduced by the glamor of complex analytics,” says Joe DosSantos, TD Bank Group’s senior vice president of enterprise information. The reality is that data science is no longer about wizards and mythical unicorns.
TD Bank uses a wide range of basic to sophisticated analytical tools to better align historical and current customer data, as well as to conduct fraud analytics, DosSantos says. For instance, the bank uses software from AtScale to help business users query live data from the bank’s Hadoop data lake and rapidly get results. TD Bank analysts view the data in self-service visualization software from Tableau.
Data scientists: Still wanted
Other software vendors are accelerating the data democratization trend, often employing machine learning (ML) and artificial intelligence (AI) capabilities to build automated models.
Salesforce.com, for example, offers Einstein Prediction Builder, which allows business analysts to create custom AI models, adding variables on any custom Salesforce field or object to predict outcomes such as a customer’s likelihood to churn or the lifetime value of an account. Adobe’s Sensei, another ML software tool, helps marketers whip up marketing campaigns in minutes, shaving hours off the task.
More than 40 percent of data science tasks are likely to be automated by 2020, Gartner says. “This [automated ML approach] is the next generation of data science,” says Purcell.
Of course, not every big data challenge is easily tackled by a citizen data scientist. Companies still need statisticians, data scientists, actuaries and other experts versed in advanced math techniques, says Bill Roberts, managing director of Deloitte Consulting’s cognitive and analytics practice. Such specialists can fill the gaps and missing fields in data, tasks for which citizen data scientists are ill-suited.
Moreover, Roberts notes that while self-service tools can serve an enterprise well if they work correctly, what if they don’t? What if something goes wrong and the math doesn’t check out? Perhaps there is a problem with the algorithm itself. “When there’s a jam or a problem, you need somebody with some training or advanced degree that can address that,” Roberts says.