12 data science mistakes to avoid

Well-managed analytics initiatives can reap organizational gold. But succumb to one of these common mistakes, and your data science operations can quickly go wrong.

AI, machine learning and analytics aren’t just the latest buzzwords. Organizations large and small are evaluating AI tools and services in hopes of improving business processes, customer support and decision-making with big data, predictive analytics and automated algorithmic systems. IDC predicts that 75 percent of enterprise and ISV developers will use AI or machine learning in at least one of their applications in 2018.

But expertise in data science isn’t nearly as widespread as the interest in using data to make decisions and improve results. If your business is just getting started with data science, here are some common mistakes that you’ll want to avoid making.

1. Assuming your data is ready to use — and all you need

You need to check both the quality and volume of the data you’ve collected and are planning to use. “The majority of your time, often 80 percent of your time, is going to be spent getting and cleaning data,” says Jonathan Ortiz, data scientist and knowledge engineer at data.world. “That’s assuming that you’re even tracking what you need to be tracking for a data scientist to do their work.”

Even if you’re tracking the right data, you might not be recording it correctly. The way you record it might have changed over time, or the systems you collect it from might have changed while you were collecting it. “If there are incremental changes from month to month, then you can’t use that entire month of data when you perform an analysis or build a model,” cautions Ortiz, because the system that produced the data has itself changed.
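One way to spot this kind of drift is to profile each month of data separately and compare neighboring months. The sketch below is only an illustration, not a method from Ortiz or data.world: the record layout, the field names (`month`, `region`, `deal_size`) and the 0.2 threshold are all assumptions made for the example.

```python
from collections import defaultdict


def monthly_profile(records, fields):
    """Return {month: {field: share of rows where the field is missing}}."""
    counts = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in records:
        month = row["month"]
        counts[month] += 1
        for field in fields:
            # Treat None and empty strings as "not recorded".
            if row.get(field) in (None, ""):
                missing[month][field] += 1
    return {
        month: {f: missing[month][f] / counts[month] for f in fields}
        for month in sorted(counts)
    }


def flag_shifts(profile, threshold=0.2):
    """List (month, field) pairs whose missing share moved by more
    than `threshold` compared with the previous month."""
    flagged = []
    months = list(profile)
    for prev, cur in zip(months, months[1:]):
        for field, share in profile[cur].items():
            if abs(share - profile[prev][field]) > threshold:
                flagged.append((cur, field))
    return flagged


# Hypothetical sample: "region" stops being recorded in February.
records = [
    {"month": "2018-01", "region": "EU", "deal_size": 100},
    {"month": "2018-01", "region": "US", "deal_size": 250},
    {"month": "2018-02", "region": "", "deal_size": 120},
    {"month": "2018-02", "region": None, "deal_size": 90},
]
profile = monthly_profile(records, ["region", "deal_size"])
shifts = flag_shifts(profile)
print(shifts)
```

With this sample, the only flagged pair is February's `region` field, a hint that something in the collection process changed and that the two months should not be pooled without a closer look. A real check would likely also compare value ranges and category sets, not just missing rates.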

Even if you’re collecting the right data, low data volumes and large numbers of independent variables make it hard to create predictive models for business areas like B2B marketing and sales, explains John Steinert, chief marketing officer at TechTarget. “Data science gets better and better the more data you have; predictive models are more powerful the more data you have. Because transaction rates are low and independent variables affecting transactions are many, you’ve got small data sets and complex interactions and these weaken the power of predictive models.”

One option is to buy data sets like purchase-intent data, as long as you can find one that applies to your business segment. Another is to simulate the data, but that must be done carefully, warns Chintan Shah, senior consultant data scientist at Avanade. “In reality, the data may not behave according to the assumption you made in the beginning,” Shah says.

2. Not exploring your data set before starting work

You may have theories and intuitions about what your data set will show, but data teams should take the time to examine the data in detail before using it to train a model.

“If you see something counterintuitive it’s possible that your assumptions are incorrect or that the data is,” Ortiz says. “The most important thing I do is simply looking at the data, plotting it and doing exploratory analysis. A lot of people go through that too quickly or bypass it altogether, but you need to understand what the data looks like. You can ascertain whether the data is telling you the proper story based on subject matter expertise and business acumen more quickly by doing some exploration beforehand.”
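A first exploratory pass can be as small as a few summary statistics and a rough plot. The following is a minimal sketch using only Python's standard library. The column of order values is invented for illustration, and the helper names `summarize` and `text_histogram` are hypothetical, not part of any tool mentioned in this article.

```python
import statistics


def summarize(values):
    """Basic descriptive statistics for one numeric column.
    None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


def text_histogram(values, bins=5, width=30):
    """Render a crude text histogram, one line per equal-width bin."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1  # avoid dividing by zero for constant columns
    counts = [0] * bins
    for v in present:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(c / peak * width)
        lines.append(f"{left_edge:>10.1f} | {bar} {c}")
    return "\n".join(lines)


# Hypothetical column: mostly small orders, one suspicious value, one gap.
order_values = [12, 15, 14, None, 13, 16, 15, 950, 14]
summary = summarize(order_values)
hist = text_histogram(order_values)
print(summary)
print(hist)
```

Here the mean (131.125) sits far above the median (14.5), and the histogram puts seven values in the first bin and one in the last. That gap is the kind of counterintuitive signal worth investigating: the 950 might be a genuine large order, a unit mix-up or a data-entry error, and only subject matter knowledge can say which.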
