Humans losing jobs to robots has preoccupied economists and sci-fi writers alike for almost 100 years. AI systems are the next perceived threat to human jobs, but which jobs? Sourcing logic from numerous open-source packages or paid API services, connecting disparate datasets, and maintaining a pipeline are complex tasks that AIs are ill-suited to at present.
AI and the data pipeline
A well-set-up data pipeline is a thing of beauty, seamlessly connecting multiple datasets to a business intelligence tool so that clients, internal teams, and other stakeholders can perform complex analysis and get the most out of their data.
Data engineers thrive on interesting challenges: bringing terabytes of data from wherever it lives to where it can be analysed, transforming it using various libraries and services, and keeping the pipeline stable. However, the data preparation phase poses its own issues. It can be a creative process, and it's certainly necessary, but saving that preparation logic and automating its repeated execution every few hours is a challenge. Today, that challenge is increasingly being solved by bringing in artificial intelligence and machine learning.
Augmented analytics is the next iteration of business intelligence, where AI elements are incorporated into every phase of the BI process. The powerful AI analytics systems emerging today have AI assisting users in a broad range of ways, but we'll stay focused on data prep for this article.
The three stages of the data preparation process where AI can help, which we'll discuss here, are data cleaning and transformation, extracting and loading, and verifying the prepared data.
Clean as you go
The saying "data is the new oil" gets tossed around enough to have become a cliché, but for the purposes of our discussion it's an especially apt metaphor.
Most companies are sitting on huge stores of data, but in its unprocessed form, it's not very useful. Even worse, analysing non-normalised data can produce misleading and potentially harmful results. To continue with the oil metaphor, you need a stable and reliable pipeline to take your data from where it's stored to where it'll be processed so its true value can be harnessed.
While that data is in motion, data engineers can digest it so it's closer to a usable state by the time it hits the BI system. BI platforms are already using AI to help with the data cleansing process in a variety of ways. Let's walk through how AI can assist you:

AI assistance can recommend a data model structure, including which columns to join, which to combine, and perhaps even which dimension tables to create to facilitate the fact table joins.
AI systems can apply simple rulesets to help standardise the data by doing things like making all text lowercase and removing blank spaces before and after values.
If you already have a perfectly formatted dataset to use as a learning dataset, AI assistance can be trained on it to recognise how the larger dataset should look, allowing it to take a holistic approach to cleansing rather than relying on you to specify individual tasks.
As AI assistance learns how you want your data to look, the system can scan all the columns and make recommendations about what to fix, apply active learning, or fix errors on its own, such as removing redundant records (duplicates caused by misspellings, for example) or using context clues to fill in missing values.

Extracting and loading
The rise of cloud data warehouses has changed the way companies treat their data. In the past, well-organised databases were needed to keep records in order.
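Before we go further, the rule-based cleansing steps listed earlier (lowercasing text, trimming blank spaces, removing redundant records, filling in missing values) can be sketched in a few lines of pandas. This is a minimal illustration; the DataFrame and column names are invented for the example, and real platforms learn rules like these rather than hard-coding them:

```python
import pandas as pd

def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Apply simple cleansing rules to a DataFrame. Illustrative only."""
    df = df.copy()
    # Lowercase text and strip blank spaces before and after values
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip().str.lower()
    # Remove redundant records left behind by inconsistent formatting
    df = df.drop_duplicates()
    # Fill missing numeric values from context; here, the column median
    for col in df.select_dtypes(include="number").columns:
        df[col] = df[col].fillna(df[col].median())
    return df

# "city" and "revenue" are hypothetical columns for demonstration
raw = pd.DataFrame({
    "city": ["  London", "london", "Paris ", "paris"],
    "revenue": [100.0, 100.0, None, 250.0],
})
clean = standardise(raw)
```

After standardisation, the two "London" rows collapse into one and the missing Paris revenue is imputed, which is exactly the kind of repetitive work worth automating.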
Today, data comes from a wide array of sources and in a variety of forms, from user-generated content to sensor data. More and more frequently, we even see companies using third-party data to enrich their business logic (how will the weather forecast affect my sales?).
This change coincided with an increase in the sophistication of AI data analytics systems, allowing them to deal with data in all its types, structured (numerical) and unstructured (text, image, video). Data storage in cloud warehouses like Redshift is cheap, and responsibility for data gathering and storage is often split across different roles, so rather than worry about how everything is formatted, companies just pump everything into the warehouse, however it's formatted, and deal with it later.
This is another place where BI with AI has a chance to shine: extracting the data, performing transformations on it, then loading it into the BI tool. The same AI abilities mentioned before can be applied here to end up with usable data at the endpoint: removing duplicate records, filling blank values, and suggesting other cleansing and transformation actions, such as clustering and segmentation, based on the learning dataset. However your data is stored, the right AI analytics tool can help get it into better shape when you create your single source of truth; it can also help as you load your data into your BI platform or data science tool.
While you're moving your data into your BI system, the big chance for an AI assist is in monitoring the process. If a load fails, or exceeds the normal or forecasted time threshold, the AI can detect it and ping the engineer to let them know there's a problem.
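The monitoring idea above is simple to sketch: learn a baseline from past load times and alert when a run falls outside it. The threshold rule below (mean plus three standard deviations) and the sample history are illustrative assumptions, not any vendor's actual method:

```python
from statistics import mean, stdev

def check_load(history: list[float], latest: float) -> bool:
    """Return True if the latest load duration looks anomalous.

    The baseline is learned from past durations; the 3-sigma rule
    is a hypothetical choice for illustration.
    """
    if len(history) < 2:
        return False  # not enough history to learn a baseline yet
    threshold = mean(history) + 3 * stdev(history)
    return latest > threshold

recent_loads = [61.0, 59.5, 62.3, 60.1, 58.9]  # normal runs, in seconds
if check_load(recent_loads, 240.0):
    print("ping: last load exceeded the learned time threshold")
```

A production system would track this per pipeline and refresh the baseline as new runs complete, but the core check is no more complicated than this.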
A sudden change in the volume of data being loaded could also be worth a mention, so that the engineer can look into it and see if there's a larger problem.
The bottom line is that a strong AI analytics system can be a second set of eyes for a busy data engineering team, freeing them to focus on the challenges that drive more value for the analytics team and, ultimately, the business.
Is AI taking engineering jobs?
Although humans losing jobs to robots makes a nice story, the reality is far from that for data engineers. Tackling routine tasks like eliminating redundant data, filling in gaps in datasets, and pinging human engineers when anomalies arise are all places where AI analytics systems can really add value, doing the heavy lifting that humans don't really want to do anyway and augmenting hard-working data engineers so they can tackle the challenging problems that will lead to bigger rewards for the company down the line.
About the author: Inna Tokarev Sela leads Sisense's AI Group, focused on building the data science capabilities powering engines for Augmented Data Preparation, Augmented Modeling, and Augmented Analytics.
About Sisense
Sisense offers the only independent analytics platform for builders to simplify complex data and to build and embed analytic apps that deliver insights to everyone inside and outside their organisations. Sisense lets builders collaborate on a single platform, delivered in a hybrid, cloud-native environment with the industry's lowest cost of ownership, to create true democratisation of data and analytics. More than 2,000 customers across the globe rely on Sisense, including industry leaders like Tinder, Philips, Nasdaq, and the Salvation Army.
Learn more at www.sisense.com.