by Sarah K. White

How to define the evolving role of data scientist

May 16, 2016

Data science is a hot new career, but companies still aren’t sure how best to use these employees. Here’s how to hire data scientists with a clear strategy in place.


In its recently released 25 Best Jobs in America for 2016 report, Glassdoor ranked data scientist as the number 1 job, and not just in tech: it topped the list across every industry. The report cites 1,736 openings in the field, a median base salary of $115,840 and an overall job score of 4.7 out of 5, all promising numbers for this fast-growing career path. But the rapid growth in data science jobs has been met with a severe shortage of qualified candidates, and businesses that do hire data scientists often have no idea how to use their skills effectively.

In fact, a McKinsey Global Institute study of big data across nearly every industry found that, as of 2009, U.S. companies with more than 1,000 employees averaged 200 TB of stored data, and that was seven years ago. Data collection has grown significantly since 2009, and as of 2016, tech companies routinely collect massive amounts of data on their users. The study also projected that by 2018, the U.S. could face a shortfall of 140,000 to 190,000 qualified data science workers.

Tye Rattenbury, director of data science at Trifacta, has watched the role of data scientist evolve as companies figure out how to use these employees properly. Rather than hiring data scientists first and working out how to use them later, businesses need to approach data strategy with a clearly developed plan to get the most out of their investment, Rattenbury says.

Define the job description

Data scientists are expected not only to manage data but also to interpret it and communicate their findings effectively to others. But according to Rattenbury, most data scientists are stuck in maintenance mode, organizing and collating data rather than actually analyzing it. “As with all new and exciting things, there is a lot of ambiguity around what is possible and what the best practices really are. The big winners (both individual data scientists and the companies that employ them) will have the discipline to see through the hype and hone in on the activities that can and do add value,” he says.

Aaron Beach, data scientist at SendGrid, says the best approach to building a data science role or department doesn’t bog scientists down with information overload; instead, it is built around how data needs to be analyzed for the company’s benefit. “The strategy should be defined in terms of a process for how raw data is translated into actionable information for decision makers, not in terms of which raw data is or isn’t useful,” he says.

Another way businesses can get more out of their data scientists is to build the department around the actual needs of the business rather than lofty expectations of what data can do. For example, a business should know how many data scientists it needs before the hiring process begins, but it can’t determine that without first having a clear strategy that outlines what data is needed and how it needs to be translated.

“Data science is an immature, diverse and vaguely defined ‘job’. As such, it’s impossible to say how many or what kind of data scientists are needed by a company until they clearly define the job as it relates to their business. At SendGrid, we define the data science job and its career path as it relates to our product and engineering process — this helps answer the questions of how many data scientists we need and directly defines the skill set those employees will have,” says Beach.


A coordinated approach

Data science is a new field, and most data scientists likely have a background that includes statistical analysis, domain and business expertise, or coding, according to Rattenbury. But he also points out that just because they can do all of these things doesn’t necessarily mean they should. Instead, businesses should take a more coordinated approach that draws on multiple skilled people: “In most businesses, the variety of data and the variety of potential applications of data necessitate a multi-person effort that is best accomplished when people take on specialized roles,” says Rattenbury.

He says there are two areas where data scientists can shine in a business, and where they should focus most of their time and energy. The first, according to Rattenbury, is raw data ingestion or data creation: your data scientists should apply their skills to finding the most useful ways to use data and the best ways to store and manage it. The second is determining how data can benefit the company, what budgets need to be in place to achieve the business’s goals, and how data can be used to “drive automated process within the company,” Rattenbury says.

Don’t get greedy

Businesses should also avoid being data-greedy, because it is possible to have too much of a good thing when it comes to data. “They may be collecting more data than they have the capacity to explore and assess the value of. One way to solve this problem is to be more selective about what data you analyze,” says Rattenbury.

And because data initiatives are still new to most businesses, Rattenbury recommends a flexible data strategy, one that considers what should change as a new data initiative moves forward. This way, businesses can assess what’s working, what’s not, who the key players are and the value tied to specific data points. However, prioritizing data this way isn’t just a task for data scientists, he says; it needs to include everyone in the company. Data scientists can’t predict what data every department will need, so implementing an effective data strategy needs to be a company-wide effort, not an individual one.

“When a business makes explicit what can change and then asks all of its employees to engage in hypothesizing how to assess the relative value of various combinations of changes, you have effectively increased the data analysis capacity of the business. This is the crux of building a truly data driven culture,” says Rattenbury.


A realistic approach

A coordinated approach also needs to be realistic, yet as Rattenbury points out, most businesses don’t have a plan in place for the data scientists they hire. Businesses shouldn’t cut corners or skimp on funding when building out a data-driven strategy, because data is more than just another business initiative: it’s the future of the enterprise.

For example, if your business is data-heavy, you might need to hire people dedicated to managing data and others tasked with analyzing it, rather than expecting one or two scientists to do it all themselves. As a result, you may ultimately need to hire more people than you anticipated. If you want to get the most out of your data, you need the budget, manpower and resources behind it.

This might also mean separating data science from IT, according to Rattenbury. That doesn’t mean the two should be completely cut off from each other; rather, they should work as coordinated teams instead of as one team. “Generally speaking, it’s best if IT and dedicated data organizations don’t report in to one another. They should be peer organizations rolling up to a central organization that can coordinate their efforts,” he says.

Businesses need to understand that using data well isn’t simple: it requires a lot of planning, dedication and resources. “Data is the key to deeper understanding. Certainly there will be laggard companies that will eventually find themselves scrambling to catch up to their peers. The key balance here is how much resourcing to put into evolving and improving your use of data versus what you need to stay competitive,” says Rattenbury.