Humans losing jobs to robots has been the preoccupation of economists and sci-fi writers alike for almost 100 years. AI systems are the next perceived threat to human jobs, but which jobs? Sourcing the logic from numerous open-source packages or paid API services, connecting disparate datasets, and maintaining a pipeline are complex tasks that AIs are ill-suited to do at present.
AI and the data pipeline
A well set up data pipeline is a thing of beauty, seamlessly connecting multiple datasets to a business intelligence tool to allow clients, internal teams, and other stakeholders to perform complex analysis and get the most out of their data.
Data engineers thrive on interesting challenges: bringing terabytes of data from wherever it lives to where it can be analysed, transforming it using various libraries and services, and keeping the pipeline stable. However, the data preparation phase of the process poses its own issues. It can be a creative process, and it’s certainly necessary, but saving the preparation logic and automating it to rerun every few hours is a challenge. Today, the way to solve this challenge is by bringing in artificial intelligence and machine learning.
Augmented analytics is the next iteration of business intelligence, where AI elements are incorporated into every phase of the BI process. The powerful AI analytics systems emerging today have AI assisting users in a broad range of ways, but we’ll stay focused on data prep for this article.
We’ll discuss three parts of the data preparation process where AI can help: data cleaning and transformation, extracting and loading, and verifying the prepared data.
Clean as you go
The saying “data is the new oil” gets tossed around enough to have already become a cliche, but for purposes of our discussion it’s an especially apt metaphor. Most companies are sitting on huge stores of data, but in its unprocessed form, it’s not very useful. Even worse, analysing non-normalised data can produce misleading and potentially harmful results. To continue with the oil metaphor, you need a stable and reliable pipeline to take your data from where it’s stored to where it’ll be processed so its true value can be harnessed.
While that data is on the move, data engineers can process it so that it’s closer to a usable state by the time it hits the BI system. BI platforms are already using AI to help with the data cleansing process in a variety of ways. Let’s walk through how AI can assist you:
- AI assistance can recommend a data model structure, including which columns to join and which to compound, and may even create dimension tables to facilitate the fact table joins.
- AI systems can apply simple rulesets to help standardise the data by doing things like making all text lowercase and removing blank spaces before and after values.
- If you already have a perfectly formatted dataset to use as a learning dataset, AI assistance can even be trained on this to recognise how the larger dataset should look, allowing it to take a holistic approach to cleansing, rather than you telling it specific tasks to do.
- As AI assistance learns how you want your data to look, the system can even scan all the columns and make recommendations as to what to fix, implement active learning, or go ahead and fix errors on its own, such as removing redundant records (duplicates caused by misspelling, for example) or using context clues to fill in missing values.
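The rule-based standardisation and misspelling-driven deduplication described in the list above can be sketched in a few lines. This is a minimal illustration rather than how any particular BI platform works: the function names, the sample records, and the 0.85 similarity threshold are all assumptions, and plain string similarity from Python's standard library stands in for a trained model.

```python
from difflib import SequenceMatcher


def standardise(value):
    """Lowercase text and trim surrounding whitespace; leave other types alone."""
    return value.strip().lower() if isinstance(value, str) else value


def is_near_duplicate(a, b, threshold=0.85):
    """Treat two strings as the same entity when they are similar enough.

    The 0.85 threshold is an illustrative assumption, not a recommended value.
    """
    return SequenceMatcher(None, a, b).ratio() >= threshold


def deduplicate(records, key):
    """Standardise every record, then keep the first of each near-identical `key`."""
    kept = []
    for record in records:
        cleaned = {field: standardise(value) for field, value in record.items()}
        if not any(is_near_duplicate(cleaned[key], seen[key]) for seen in kept):
            kept.append(cleaned)
    return kept


# Hypothetical sample data: the second record is a misspelt copy of the first.
customers = [
    {"name": "  Acme Corporation ", "city": "Leeds"},
    {"name": "acme corporaton", "city": "leeds"},
    {"name": "Globex Ltd", "city": "York"},
]
unique_customers = deduplicate(customers, key="name")
```

Comparing every record against every kept record is quadratic, which is fine for a sketch but would need blocking or indexing on a real dataset.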
Extracting and loading
The rise of cloud data warehouses has changed the way companies treat their data. In the past, well-organised databases were needed to keep records in order. Today, data comes from a wide array of sources and in a variety of forms, from user-generated content to sensor data. More and more frequently we even see companies using third-party data to enrich their business logic (how will the weather forecast affect my sales?).
This change coincided with an increase in the sophistication of AI data analytics systems, allowing them to deal with data of all types, structured (numerical) and unstructured (text, image, video). Storage in cloud warehouses like Redshift is cheap, and data gathering and data storage are often the responsibility of different roles, so rather than worry about how everything is formatted, companies just pump everything into the warehouse as it is and deal with it later.
This is another place where BI with AI has a chance to shine, extracting the data, performing transformations on it, then loading it into the BI tool. The same AI abilities mentioned before can be applied in this way to end up with usable data at the endpoint: removing duplicate records, filling blank values, and suggesting other cleansing and transformation actions, such as clustering and segmentation, based on the learning dataset. However your data is stored, the right AI analytics tool can help get it into better shape for when you create your single source of truth; it can also help as you load your data into your BI platform or data science tool.
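As a rough illustration of "filling blank values" during the transform step, the sketch below replaces each missing value with the median of its group. The per-group median is an assumed stand-in for whatever a trained system would infer from context, and the field names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fill_missing(rows, value_field, group_field):
    """Return copies of `rows` with None in `value_field` replaced by the group median."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_field] is not None:
            observed[row[group_field]].append(row[value_field])

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the extracted data is left untouched
        group_values = observed[new_row[group_field]]
        if new_row[value_field] is None and group_values:
            new_row[value_field] = median(group_values)
        filled.append(new_row)
    return filled


# Hypothetical extracted rows with gaps in the "amount" column.
orders = [
    {"region": "north", "amount": 100},
    {"region": "north", "amount": None},
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "south", "amount": None},
]
clean_orders = fill_missing(orders, value_field="amount", group_field="region")
```

A value stays blank when its group has no observed values at all, which leaves that decision to a human or a later rule instead of guessing.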
While you’re moving your data into your BI system, the big chance for an AI assist is in monitoring the process. If a load fails, or takes longer than its normal or forecast duration, the AI can detect that and ping the engineer to let them know there’s a problem. A sudden change in the volume of data being loaded could also be worth a mention, so that the engineer can look into it and see if there’s a larger problem.
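The monitoring idea can be sketched with a simple statistical check: compare the latest load's duration and row count with recent history and raise an alert when either strays too far. The metric names, the sample figures, and the three-standard-deviation cut-off are assumptions for illustration, and a production system would more likely use a learned forecast than a fixed rule.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, max_sigma=3.0):
    """Flag `latest` when it lies more than `max_sigma` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > max_sigma


def check_load(past_loads, latest_load):
    """Return a list of alert messages for the most recent load."""
    alerts = []
    if not latest_load["succeeded"]:
        alerts.append("load failed")
    for metric in ("duration_s", "row_count"):
        history = [load[metric] for load in past_loads]
        if is_anomalous(history, latest_load[metric]):
            alerts.append(f"unusual {metric}: {latest_load[metric]}")
    return alerts


# Hypothetical load history: about five minutes and a million rows per run.
past_loads = [
    {"duration_s": 310, "row_count": 1_000_000},
    {"duration_s": 295, "row_count": 1_020_000},
    {"duration_s": 305, "row_count": 990_000},
    {"duration_s": 300, "row_count": 1_010_000},
]
latest_load = {"succeeded": True, "duration_s": 900, "row_count": 1_005_000}
alerts = check_load(past_loads, latest_load)
```

In practice the returned messages would be routed to whatever channel the engineering team already watches, such as chat or a paging tool.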
The bottom line is that a strong AI analytics system can be a second set of eyes for a busy data engineering team, freeing them to focus on the challenges that drive more value to the analytics team, and ultimately the business.
Is AI taking engineering jobs?
Although humans losing jobs to robots makes for a nice story, it is far from the reality for data engineers. Routine tasks like eliminating redundant data, filling in gaps in datasets, and pinging human engineers when anomalies arise are all places where AI analytics systems can really add value. They do the heavy lifting that humans don’t really want to do anyway, and free hard-working data engineers to tackle the challenging problems that will lead to bigger rewards for the company down the line.
About the author: Inna Tokarev Sela leads Sisense’s AI Group focused on building the Data Science capabilities powering engines for Augmented Data Preparation, Augmented Modeling, and Augmented Analytics.
Sisense offers the only independent analytics platform for builders to simplify complex data, and build and embed analytic apps that deliver insights to everyone inside and outside their organisations. Sisense lets builders collaborate on a single platform, delivered in a hybrid, cloud-native environment with the industry’s lowest cost of ownership, to create true democratisation of data and analytics. More than 2,000 customers across the globe rely on Sisense, including industry leaders like Tinder, Philips, Nasdaq, and the Salvation Army.
Learn more at www.sisense.com.