6 tools that make data science easier

New tools bundle data cleanup, drag-and-drop programming, and the cloud to help anyone comfortable with a spreadsheet harness the power of data science.


Data science may never be easy, but it’s getting easier to dive in. Terms like “machine learning,” “regression,” and “dimensionality reduction” are as challenging to understand as ever, but the widespread desire to reap the benefits of these techniques has produced several good tools that create assembly lines for data, ready to pump out the answers we seek.

The secret is similar to what revolutionized manufacturing. Just as standardized parts helped launch the industrial revolution, data scientists at various tool vendors have produced a collection of powerful, adaptable analytical routines. They’ve standardized the interfaces, making it much simpler to build a custom pipeline out of these interchangeable data science tools.

Data scientists used to wring their hands because 80 percent of the work was preparing data for analysis: crafting custom routines in Python, Java, or their favorite language, all so the sophisticated statistical tools in R or SAS could do their job. The marketplace is now filling with tools that bundle several hundred well-engineered routines into a package that does much of the repetitive and unpleasant data cleanup and standardization for you.

These new tools open the opportunity for anyone who’s comfortable working with a spreadsheet. They won’t make all prep work disappear, but they’ll make it easier. There’s less need to fuss over data formats because the tools are smart enough to do the right thing. You can often just open the file and start learning.

The tools also unlock much of the cost-saving power of the cloud. In the past, data scientists needed powerful computers to crunch big data sets. Now we can rent even bigger, faster machines in the cloud by the second, increasing processing speed while saving money by returning the hardware to the pool when the monthly reports are done.

The tools are a boon for both hardcore data scientists and data analysts who just need to train an algorithm to predict next year’s trends. Both groups can enjoy the pleasure of using sophisticated tools that do the right thing with data. The standardization, though, opens up the potential for entirely new groups to dive into data science. Now you don’t need to master R syntax or Python programming to begin.
