How many statisticians do you need to build a new data model? Zero, according to Tableau Software: It says the next version of its widely used analytics tool will do it itself.
Tableau demonstrated the capability at an event for customers in New Orleans last week, along with a new feature called Ask Data that lets users create visualizations by describing what they want in natural language. It also showed off new automation functions in its data preparation tool.
It’s part of a growing trend among enterprise software developers to automate or simplify tasks that once required specialized skills, allowing businesses to make more sophisticated use of their data and redeploy skilled staff to less humdrum work.
AI’s rise in BI
Advances in artificial intelligence are making it easier for enterprise software developers to take natural language input — whether spoken or typed — and infer users’ intentions, rather than obliging users to learn specific commands or to manipulate objects on-screen to achieve their goals. AI has been increasingly employed in leading BI tools, in hopes of “democratizing” analytics and data science.
Microsoft’s Power BI, a Tableau competitor, has included a feature called “Ask a question about your data” for a few years now, but even in recent demos the offering appears more finicky about grammar and spelling than Tableau’s Ask Data. Both are ahead of the likes of Dundas BI, which still uses drag-and-drop to create visualizations.
Tableau’s implementation will allow users to query a database and let the software figure out how database tables need to be joined, which columns should be selected, and what operations must be performed to obtain the desired answer. Ask Data and the other new features will appear in Tableau 2019.1, which is due for release early next year and entered beta this week.
Automation features like these are welcome, and necessary, said Forrester principal analyst Martha Bennett. “We are getting more data but the people working with it don’t have more time,” she said.
Data scientists spend up to 80 percent of their time on data preparation, she said, and the less time they spend on it, the more they can spend on things that create value.
One way around the time crunch is to hand over workloads to the machines. Another is to make it easier for people who couldn’t previously manipulate the data themselves to do so, the so-called democratization of data.
The downsides of relying on AI
But there are risks in making data available to more workers: “Data is no replacement for domain expertise and context,” Bennett said.
Before making new automation functions widely available, CIOs should put them through their paces to see whether they’re suitable, she advised.
Tools that offer data insights without making clear recommendations may leave users confused about what action to take. “If you don’t give somebody a firm instruction, don’t expect them to get it right every time,” she said.
You can’t just hand over all responsibility to the software, though. “Automation is not the same as no supervision. These things still need to be watched,” Bennett said.
Ideally, these tools will surface an explanation of what they have done, so as to leave an audit trail.
“In a court of law, it doesn’t sound very good to say the computer did it and we have no idea why,” she warned, speaking of what is becoming known as the “black box” problem of AI.
You also need to figure out whether your data is suitable for the automation tool: Machine learning systems, in particular, need a lot of data to work with. “If you are applying machine learning algorithms to data where you have more exceptions than the norm, it’s not going to work,” she said.
At the New Orleans event, Tableau’s product manager for visual analytics, Andrew Vigneault, demonstrated Ask Data on a database of crowd-funded projects at Kickstarter, showing that, in contrast to most compilers, Ask Data does not require perfect punctuation in order to work.
The software transformed his request “whats the total funding” (sic) into “sum of Funding” and returned the answer. When he typed “by year” and “by status” Ask Data transformed his request into “sum of Funding by Deadline’s year and by Status.” With no further input, it then produced a color-coded line chart showing, in green, the funding of successful projects increasing year by year, while that of failed, cancelled or suspended projects (red, orange and yellow) remained flat.
Asking “which categories were successful” prompted a different visual response: Ask Data added “by Category, filter Status to successful” to the previous query and drew a bar chart ranking Kickstarter categories by number of successful projects, in decreasing order.
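To make those interpreted queries concrete, here is a minimal Python sketch of the two aggregations they describe. It is an illustration only, not Tableau’s implementation: the rows are invented rather than taken from Kickstarter, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Invented sample rows: (deadline, status, category, funding).
# These are NOT real Kickstarter figures.
PROJECTS = [
    (date(2015, 3, 1), "successful", "Technology", 1200.0),
    (date(2015, 6, 9), "failed", "Games", 300.0),
    (date(2016, 1, 20), "successful", "Games", 2500.0),
    (date(2016, 8, 5), "successful", "Technology", 800.0),
    (date(2016, 11, 30), "canceled", "Fashion", 150.0),
]


def sum_funding_by_year_and_status(rows):
    """Roughly: 'sum of Funding by Deadline's year and by Status'."""
    totals = defaultdict(float)
    for deadline, status, _category, funding in rows:
        totals[(deadline.year, status)] += funding
    return dict(totals)


def successful_count_by_category(rows):
    """Roughly: 'by Category, filter Status to successful', largest first."""
    counts = defaultdict(int)
    for _deadline, status, category, _funding in rows:
        if status == "successful":
            counts[category] += 1
    # Rank categories by number of successful projects, in decreasing order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


funding_totals = sum_funding_by_year_and_status(PROJECTS)
category_ranking = successful_count_by_category(PROJECTS)
```

The first result is what a line chart of funding per year, colored by status, would plot. The second is the ranked list behind a bar chart of successful projects per category.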
Employees have long wished that enterprise software would do what they intended, rather than what they ordered, and Vigneault showed that Tableau is getting close to that. When he typed “correlate with avg fudninng” (sic) Ask Data showed him a scatter plot of number of projects against average funding for the different subcategories of technology projects he had been viewing previously.
Some things in Tableau are still quicker with a mouse, especially if your typing is slow: Adding fashion and games subcategories to the scatter plot took just four clicks.
Building new data models
A few clicks is also all it took his colleague Tyler Doyle to build a new data model, which maps the fields used by Tableau to analyze data into SQL queries that the underlying database can understand.
“I just have to click one option, ‘Add related objects,’ and there’s your data model, all without having to figure out which tables to use, how they relate, or if it’s a left or a right join. Tableau’s new data modelling capabilities just did that for you,” he said.
“How did the data model know the right relationships between those tables?” Doyle asked. It turns out that Tableau is counting on CIOs and their database admins and data stewards to help it perform this conjuring trick by ensuring that the necessary information is stored in the data warehouse.
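The demo did not specify which metadata Tableau reads, but declared foreign keys are one plausible form of it. The sketch below assumes that, using Python’s built-in sqlite3 module and a hypothetical two-table schema, to show how join clauses can be derived from relationships the database already records.

```python
import sqlite3


def related_joins(conn, table):
    """Build a JOIN clause for each table that `table` references
    through a declared foreign key (an assumed source of metadata)."""
    joins = []
    # PRAGMA foreign_key_list columns:
    # id, seq, table, from, to, on_update, on_delete, match
    cursor = conn.execute(f"PRAGMA foreign_key_list({table})")
    for _id, _seq, ref_table, from_col, to_col, *_rest in cursor:
        joins.append(
            f"LEFT JOIN {ref_table} ON {table}.{from_col} = {ref_table}.{to_col}"
        )
    return joins


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    category_id INTEGER REFERENCES categories(id),
    funding REAL
);
INSERT INTO categories VALUES (1, 'Technology'), (2, 'Games');
INSERT INTO projects VALUES (1, 1, 1200.0), (2, 2, 300.0), (3, 1, 800.0);
""")

joins = related_joins(conn, "projects")
query = (
    "SELECT categories.name, SUM(projects.funding) FROM projects "
    + " ".join(joins)
    + " GROUP BY categories.name ORDER BY categories.name"
)
rows = conn.execute(query).fetchall()
```

If the foreign key were missing from the schema, `related_joins` would return nothing to join on, which is why the quality of the warehouse metadata matters.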
Data preparation is another area Tableau has been working on. Senior engineering manager Zaheera Valani showed how Tableau Prep can automate data cleaning using “roles.” Tableau uses these to identify fields that fulfill a particular role, such as URLs, email addresses or geographic indications (states, say, or zip codes). Valani showed how, with just a couple of clicks, Tableau Prep can inspect the contents of a field to identify the most appropriate role, then highlight the invalid items that don’t fit the role and either set them to “null” or filter those lines out. It can do the same with custom roles such as enumerated types.
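The general idea of role-based cleaning can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Tableau Prep works internally: the regular expressions are deliberately loose, and the role names and function names are hypothetical.

```python
import re

# Assumed, deliberately simple patterns; real validators are stricter.
ROLES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip_code": re.compile(r"\d{5}(-\d{4})?"),
    "url": re.compile(r"https?://\S+"),
}


def infer_role(values):
    """Pick the role whose pattern matches the largest share of values."""
    if not values:
        raise ValueError("cannot infer a role from an empty field")

    def match_share(role):
        hits = sum(1 for v in values if ROLES[role].fullmatch(v))
        return hits / len(values)

    return max(ROLES, key=match_share)


def null_invalid(values, role):
    """Replace values that do not fit the role with None ("null")."""
    return [v if ROLES[role].fullmatch(v) else None for v in values]


def filter_invalid(values, role):
    """Drop values that do not fit the role."""
    return [v for v in values if ROLES[role].fullmatch(v)]


# Hypothetical field contents.
field = ["70112", "70130-1234", "New Orleans", "9021"]
role = infer_role(field)
cleaned = null_invalid(field, role)
kept = filter_invalid(field, role)
```

Here the field is identified as a zip code field because half of its values match that pattern and none match the others. The two values that do not fit are then either nulled or filtered out.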
Tableau Prep will be updated monthly, in contrast to the three releases a year of Tableau’s main software offering, said Francois Ajenstat, Tableau’s chief product officer.
Scheduling is the function of another tool the company is now beta testing: Tableau Prep Conductor. This will allow enterprises to automate preparation of their data sources, pulling them into Tableau on a schedule they choose. It’s a separate product from Tableau, and will require a separate license when it goes on sale next year.