AI Puts Data Science – and Marketing – On Steroids


Mike Foley serves as Senior Director of Data Science at Pure Storage and is an accredited professional statistician who holds several data science and technical graduate degrees. He has more than 20 years of marketing experience involving predictive analytics, promotions, direct/database marketing, strategic planning, and marketing research. Mike previously served as Senior Director of Data Science at both Dell and EMC.

IDG recently sat down with Mike Foley, Senior Director, Data Science at Pure Storage®, to discuss how artificial intelligence (AI) is affecting the discipline of data science. Foley also explained how AI can enhance corporate marketing initiatives.

How are advancements in AI impacting the field of data science?

Theoretically, machine learning and AI have been around for a while, but we didn’t previously have the compute, storage, infrastructure and data to make them practical to implement. That includes being able to store large volumes of structured and unstructured data in technology such as our FlashBlade™.

One advantage of AI is that it allows you to combine disparate sources of data, including everything from social media to sensors, to get a more complete view that improves prediction and decision support. In many cases you can get higher predictive accuracy by blending a wide variety of data, but the data may be so noisy and high-dimensional that you can’t build predictive models using classical statistics and business rules. You have to use AI to find the patterns that enable predictions.

The benefits of using AI against large and diverse data sets were apparent to 2,300 global business and IT leaders that MIT Technology Review surveyed in partnership with Pure Storage. For example, almost 90% of them agreed that a wealth of data would help them better tailor their customer experience.

In what other ways is AI changing the data modeling process?

When using classical statistics for predictive modeling, you must be very efficient and start by doing a lot of exploratory data analysis to select the most useful variables, and then do a lot of work to prepare that data. Now with these new kinds of machine-learning models, you don’t have to do a lot of up-front variable selection. Instead, you’re able to leverage many overlapping or correlated features (each contributing a small amount to improving predictive accuracy) by running “ensembles” of hundreds to thousands of models sequentially, so that each “little learner” model learns from the errors of the previous models. With big data, this often yields the highest predictive accuracy.
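The sequential "little learner" idea Foley describes is commonly known as boosting. The following is a minimal editorial sketch of it, not code from Pure Storage: it fits one-split regression "stumps" one after another, each on the residual errors left by the earlier ones. The toy data, the function names (`fit_stump`, `boost`, `mse`) and the learning rate are all assumptions chosen for illustration.

```python
# Editorial sketch of sequential boosting with one-split "stumps".
# Data, names and parameters are illustrative assumptions.

def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    # Exclude the largest x so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_learners, learning_rate=0.5):
    """Fit stumps sequentially, each on the errors left by the earlier ones."""
    learners = []
    preds = [0.0] * len(xs)
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(learning_rate * s(x) for s in learners)


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]  # a curve that no single stump can fit well

one_learner = boost(xs, ys, n_learners=1)
many_learners = boost(xs, ys, n_learners=200)
print(mse(one_learner, xs, ys), mse(many_learners, xs, ys))
```

On this toy curve the 200-stump ensemble fits the training points more closely than a single stump does. Production libraries add regularization and validation on held-out data, which this sketch leaves out.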

For example, as statisticians we use a technique called classification trees. In the past, you’d develop a single tree, but with AI and machine-learning we can use ensemble techniques such as random forest that can involve hundreds of decision trees. You generate individual predictions from hundreds of models, which when combined and averaged can improve predictive accuracy.
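To make the single-tree versus many-trees contrast concrete, here is a much-simplified editorial sketch in the spirit of a random forest. Each "tree" is only a one-split stump trained on a bootstrap sample and one randomly chosen feature, and the ensemble predicts by majority vote. The synthetic data and the names `fit_stump`, `fit_forest` and `predict` are assumptions for illustration; a real random forest grows full decision trees.

```python
# Editorial sketch: bagged one-split stumps with random feature choice,
# a simplified stand-in for a random forest. All names and data are
# illustrative assumptions.
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature, judged by training accuracy."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        for left_label in (0, 1):
            preds = [left_label if r[feature] <= t else 1 - left_label for r in rows]
            correct = sum(p == y for p, y in zip(preds, labels))
            if best is None or correct > best[0]:
                best = (correct, t, left_label)
    _, t, left_label = best
    return lambda r: left_label if r[feature] <= t else 1 - left_label


def fit_forest(rows, labels, n_trees, rng):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))       # pick one feature at random
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx],
                               feature))
    return trees


def predict(trees, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
rows = [(rng.random(), rng.random()) for _ in range(100)]
labels = [1 if x0 + x1 > 1 else 0 for x0, x1 in rows]  # synthetic target

trees = fit_forest(rows, labels, n_trees=51, rng=rng)
print(predict(trees, (0.95, 0.95)), predict(trees, (0.05, 0.05)))
```

An odd number of stumps avoids tied votes in this two-class setting. The design point is that each stump sees a slightly different view of the data, so their individual errors tend to differ and can partly cancel when the votes are combined.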

How are data scientists able to leverage AI to improve marketing insights and decisions?

Many of the concepts and techniques are the same, but it’s kind of like putting them on steroids. For example, people have used techniques like association rules and collaborative filtering to recommend the next likely purchase – what used to be called market-basket analysis. We still use that today, but we have more data and compute horsepower. Now it’s not just looking at what’s in a shopping cart, but at everything you know about a person – from what they’re saying on Twitter to sensor data that the products themselves may be producing as they are being used.
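For readers unfamiliar with market-basket analysis, the sketch below shows the basic arithmetic behind association rules on a handful of made-up shopping baskets: support (how often two items appear together), confidence (how often the second item appears given the first) and lift (confidence relative to the second item's overall frequency). The baskets, the `pair_rules` helper and the 0.4 support threshold are assumptions for illustration only.

```python
# Editorial sketch of pairwise association rules on toy baskets.
# The baskets, helper name and threshold are illustrative assumptions.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        frozenset(p) for b in baskets for p in combinations(sorted(b), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that rarely occur together
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]     # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. baseline P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda r: -r[3])


for lhs, rhs, support, confidence, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A recommender built on this would suggest the right-hand item to shoppers whose cart already holds the left-hand item. The richer signals Foley mentions, such as social media or product sensor data, would enter as additional "items" or features alongside the cart contents.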

Do you do most of your AI-focused work in-house, or do you sometimes tap the AI services now available from cloud providers?

If you’re a small company and you don’t really have the talent, commodity AI modelers are a good way to get to the mean. But to be an analytic competitor, you want to go beyond the mean when you can. That’s where your in-house team comes in.

For us, forecasting our sales pipeline is very important, so we do that in-house because we can outperform the commodity modelers. We still use some of those vendors for a wide range of day-to-day stuff, but we have access to data and domain expertise that someone from outside is not going to have. Still, we don’t have endless resources and we can’t do everything. So, we have to pick and choose what to do in-house.

For more information on Pure Storage and its data management and AI-enablement capabilities, visit www.purestorage.com/evolution. 

Copyright © 2019 IDG Communications, Inc.