"What exactly is a data scientist?" This question is increasingly on the minds of many in the tech world today. The mix of skills needed to be called a "data scientist" is still very much a work in progress—however, the role may not be as new as you think.
A recent Harvard Business Review article made more than a few headlines with its own headline: "Data Scientist: The Sexiest Job of the 21st Century." It would have been hard to imagine the words "data," "scientist" and "sexy" in the same sentence, let alone a headline, even a few short years ago (mainly because the title hadn't been coined yet, according to the article). Now the phrase is making it into one of the country's top business magazines. That should speak volumes about the big data ride that businesses big and small are about to take.
Data Science and Today's College Classrooms
Sensing an opportunity, universities have taken notice. Programs are underway at schools across the country to address what has quickly become a booming demand for people who understand advanced analytics and statistics, have solid programming skills and "get it" when it comes to the day-to-day realities of the businesses they find themselves in.
Columbia University has put together its first course with "data science" in the title. In July, the school launched the Institute for Data Sciences and Engineering, according to instructor and course creator Rachel Schutt, a senior statistician at Google and an adjunct assistant professor in the Statistics Department.
"I kept hearing from data scientists in industry that you can't teach data science in a classroom or university setting—and I took that on as a challenge," Schutt says in a blog post she wrote in response to questions for this article. "This course creates an opportunity to develop the theory of data science and to formalize it as a legitimate science."
In addition, Cloudera Chief Scientist Jeff Hammerbacher, formerly head of Facebook's data team, and University of California at Berkeley computer science professor Mike Franklin taught an Introduction to Data Science course this past spring.
A quick Google search uncovered a couple of listings for schools, ranging from Stanford and Stevens to Harvard (fall 2013) and the University of Cincinnati, that offered "data scientist" courses. Few, though, use the term data scientist. Most are billed as advanced analytics degrees. This is appropriate; the focus of the job, from a business standpoint, is gleaning actionable insights that the business can use to turn a profit, not just playing with data.
"[For] most companies, their biggest challenge isn't going out and hiring someone who can do segmentation, or clustering or statistical analysis using tools from SAS," says Shawn Blevins, executive vice president and general manager of sales at big data as a service (BDaaS) provider Opera Solutions. "It's the fact that that's a disconnected activity from the rest of the business."
What companies want in a "data scientist," then, is the mix of skills that will lead to better understanding of the massive volume and variety of data that is now available for analysis because of tools such as Hadoop and R. "It's this idea of operationalizing [data], putting domain expertise with it and, frankly, calling [BS] on it because it doesn't result in profit," Blevins says.
Data Scientist Jobs Gaining Ground
A search of job boards reveals that companies do want to hire data scientists—while Monster.com listed just 49 openings, Dice had 224 jobs and LinkedIn showed 477 positions. LinkedIn searches for "DBA" and "system administrator" showed 764 and 1,827 positions, respectively, but the data scientist role is gaining ground.
Of course, big data is the reason this job is even on the radar. It's not that people haven't been working with big data sets in the past, or that the idea of big data is new. After all, the three "Vs"—volume, velocity and variety—coined by Gartner's Doug Laney more than 10 years ago still make up the definition of big data today. Companies today are finding that there really isn't any one person in their organization who can deal with all three Vs and put them into a business context.
Given that the primary goal behind any big data project today is a better understanding of your customers (how they interact with your company and its products, and what they want going forward), the skills of a Ph.D. statistician doing regression analysis are just a subset of the skills a full-blown data scientist will be expected to have, says Herain Oberoi, a director in the Business Platform Group at Microsoft.
"The title is definitely new. The data scientist role is not," Oberoi says. "It's part of a continuum. What's happened in the past few years is new technologies like Hadoop, that enables cheap distributed processing and improved capabilities and the ability to do things like statistical programming, [have] become easier, so the bar from getting insights from new types of data has come down."
This means specialist skills are no longer needed to glean specialist insights, at least in the discovery and modeling phases of finding the little nuggets of knowledge that lead to innovative products and services. Those nuggets exist in the massive data streams and data sets now open for examination, says Paul Barth, co-founder and managing partner of big data consultancy New Vantage Partners.
"It's going to be a lot different compared to today, where you throw your questions over a wall and wait six weeks for an answer and then have to say, 'No, that's not what I asked,'" Barth says.
Big data analysts, who are the forerunners of and most likely candidates for the data scientist title today, let companies ask and answer questions in quick succession, significantly shortening the mean time-to-answer and thus bringing the power of Moore's Law and analytics to the average business user.
"What kind of person does all this?" Thomas Davenport and D.J. Patil ask in their Harvard Business Review article. "What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator and trusted adviser. The combination is extremely powerful—and rare."
Allen Bernard is a Columbus, Ohio, writer. He has covered IT management and the integration of technology into the enterprise since 2000. You can follow him on Twitter @allen_bernard1.