Akkio adds genAI to help users modify their data, make predictions through natural language


By combining generative artificial intelligence with predictive analytics, Akkio has created a powerful business intelligence tool for digital agencies to help deliver better results. By using GPT-4 technology to let users clean up their data sets, Akkio can uncover insights and generate charts and reports through typing in prompts using natural language. This lowers the barrier to data analysis, allowing any person with a basic understanding of data to get insights to drive decisions. Jon Reilly, founder and co-CEO of Akkio, demonstrates the product on this episode.



00:00:00:00 - 00:00:19:12
Keith Shaw: By combining generative artificial intelligence with predictive analytics, Akkio is creating a powerful business intelligence tool for digital agencies to help deliver impactful and strategic results. The tool uses GPT-4 technology to let users clean up their data sets, uncover insights and generate charts and reports by simply typing in prompts with natural language.
00:00:19:14 - 00:00:37:16
Akkio says this lowers the barrier to data analysis, allowing any marketer with a basic understanding of data to get insights that can help drive decisions. Joining me to demo the tool is Jon Reilly, founder and co-CEO of Akkio. Hi, Jon. Hey, how's it going? Good. So let's jump right into it. Who is it for and what problems are you solving?
Jon Reilly: So we're an AI-native generative business intelligence platform. And what that means is we make it really easy for anyone who has to leverage data to make some decisions, primarily in businesses, to do so backed by AI and ML. And I'll show you that throughout the demo: so they can make smarter, better decisions, so they can move from just a backwards-looking view of data to getting predictive about future outcomes, and so that they can leverage models that they build in real-time decision-making processes.
[This transcript was auto-generated.]
00:01:06:07 - 00:01:36:15
Okay. And what problem is it solving for those people? Like, what were they doing before? Yeah, so it's sort of the long tail of business operational analytics. Today, people use Excel or maybe some visualization tools to try and get a sense of what's going on in the data to help drive decision making. But if you talk to most businesses, particularly small and medium businesses where they don't have a lot of money to dump into data science teams, you'll see that they don't do a great job of extracting value from the data, because there's a big gap between their intent to get value from the data and the thing they're asking about
00:01:36:20 - 00:01:56:12
and actually being able to get a graph or an insight or some visualization of what's going on. What generative AI as a technology has enabled is for us to go from intent almost directly to result. You see this in things like Midjourney or DALL-E 3, where you describe an image and instead of having to actually create that image as an artist, you just get the image.
00:01:56:13 - 00:02:18:15
Yeah, same thing with text with GPT. We're effectively that same concept, but applied to data. Okay, so show me what you got in terms of the demo and, you know, just talk through it. Great, let's do it. So today I'm going to show you working through a sample problem on the platform. You know, we do a broad range of business applications, but I'm going to show you a real estate example in particular.
00:02:18:17 - 00:02:36:08
Let's pretend that I'm a property management company in the state of Washington. And what I'd like to do is purchase some properties and convert them into rentals, something like that. And I have a set of data that's been scraped from the MLS, which I'm showing you here in a Google Sheet because it's reasonably easy to see.
00:02:36:10 - 00:02:59:12
So this is 4500 rows of various houses that have sold in that Washington state area: the prices that they sold for and the features of those homes, their living area, their lot size, whether it's around the waterfront, stuff like that. And so I'll use our tool to understand this data, maybe do some transformations on the data, and build a model to understand the relationships between these inputs and price.
00:02:59:14 - 00:03:13:04
And then when we're done with that, what I'd like to do is take this new set of homes that are fresh on the market, that haven't sold yet, and predict what their sales price should be so that I can find the deals that I would want to act on in order to turn into rental purchase properties.
00:03:13:06 - 00:03:30:17
Yep. Cool. All right, so let's do this. Akkio runs in a web app, so I'm just doing this entirely in Chrome. I'll start by creating a new project, and it starts with connecting a live data source. One of the key pieces here is most businesses now have their data living in live systems, so everything we do here will be on live data.
00:03:30:21 - 00:03:50:22
That means as the data updates, the charts we make will update, the models that we make will update, and the predictions obviously are also live. You know, we have customers connecting 300-million-row data sets through Snowflake or even small data sets through things like Google Sheets. We use Google Sheets for this example because it's a reasonably understandable application.
00:03:51:00 - 00:04:08:18
I'll just click through to property prices and we'll load in that dataset from Google sheets that we just took a look at. Okay. Right off the bat, we're going to get started using generative AI to create a report that shows us key insights into the data for our application. So we ask you to describe the goal of evaluating this data.
00:04:08:22 - 00:04:37:03
You describe your intent, effectively, like I said earlier. So here it will be: "I am a property management company looking to purchase rental property," or something like that. Okay. And we just hit generate report, and here you're going to see in the bottom right, we're going to go ahead and start building a dashboard or a report on your dataset and that intent question without you doing anything else.
00:04:37:05 - 00:04:53:22
While that thing builds in the background, you can see that we've imported the table here. You know, we've sort of automatically detected the type of information in each one of the columns. That can be a number, it can be a category, it can be text. And this report's ready. I'll pop over and we'll take a look at it in a second.
00:04:54:00 - 00:05:19:02
But you can also see sort of quick aggregations of what's going on in the data. Here's the distribution of year built. You can see the inter-column correlations in a simple click. So it makes navigating what's going on in the data and understanding the general shape of it really quite simple. But, you know, often what happens when you start working with data is you need to filter or create transformations, and that's the first place we start leveraging large language models like GPT-4.
00:05:19:04 - 00:05:40:06
You can do auto clean in a single click. You can convert things like your date field into an ISO standard date. I don't know if you've had to do a lot of work with dates, but it's kind of a pain in the butt. But you can also just do a natural language driven transformation. So we could do something like "combine all location info," and I'll spell that incorrectly, and that's going to be okay because we're using a large language model that's good.
00:05:40:11 - 00:06:00:06
It's going to parse that. "Combine all location info" means that it's going to take the columns that have location info, and it figured this out on its own. It's going to concatenate them, and it's picking street, city, state/zip and country as the location-containing columns, and it will give me back a new column that has the full address inside of it.
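The two transformations described here, date normalization and location concatenation, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code Akkio generates: the column names, the input date format and the sample address are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical column names, modeled on the fields named in the demo.
LOCATION_COLUMNS = ["street", "city", "statezip", "country"]


def combine_location(row: dict) -> str:
    """Join the location columns into one address string, skipping blanks."""
    parts = [str(row.get(col, "")).strip() for col in LOCATION_COLUMNS]
    return ", ".join(p for p in parts if p)


def to_iso_date(text: str, fmt: str = "%m/%d/%Y") -> str:
    """Convert a date string in an assumed input format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


# Made-up sample row, for illustration only.
row = {
    "date": "5/2/2014",
    "street": "123 Example St",
    "city": "Bellevue",
    "statezip": "WA 98004",
    "country": "USA",
}
row["full_address"] = combine_location(row)  # new column, originals untouched
row["date_iso"] = to_iso_date(row["date"])
print(row["full_address"])
print(row["date_iso"])
```

Both helpers write to new keys rather than replacing the source columns, which mirrors the "new column" behavior described in the demo.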
00:06:00:06 - 00:06:33:13
So you can clean up the data with gen AI before you even get to the destination. So you can do filtering, you can do outlier removal. So for example, let's change this up and say "remove all outliers from square foot lot." Okay? And so what'll happen here is the large language model will develop an interpretation of what an outlier is.
00:06:33:15 - 00:06:53:12
It will then write code to apply that transformation to the data table. And you can see here square foot lot has now been adjusted and it no longer has that long tail of outliers. Okay? It defines its own approach. So it decided to use an interquartile approach to outlier removal. You can actually get specific and you can say "remove three-sigma outliers" if you wanted.
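For readers who have not met the two outlier rules mentioned here, the following is a minimal sketch of each using only the Python standard library. The 1.5 multiplier on the interquartile range is the common textbook convention and an assumption on our part; the transcript does not say which multiplier or quantile method the generated code used, and the lot sizes are made-up numbers.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k interquartile ranges beyond Q1 and Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def remove_sigma_outliers(values, n_sigma=3.0):
    """Keep only the values within n_sigma standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= n_sigma * sd]


# Made-up lot sizes with one extreme value in the long tail.
lot_sizes = [5000, 5200, 6100, 7000, 7500, 8000, 9000, 250000]
print(remove_iqr_outliers(lot_sizes))
```

One practical difference between the two rules: a single extreme value inflates the standard deviation, so on very small samples a three-sigma filter can leave in a point that the interquartile filter removes.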
00:06:53:14 - 00:07:08:10
One of the things we've had to do in the user interface is we've had to tell you back how we interpreted your ask, because those intents stated in natural language are often not as specific or as well stated as you might do in code. Right. So we take the code and we give it back to you.
00:07:08:10 - 00:07:30:00
But okay, I think you get the idea here. It's really easy to do any transformation that you might have needed to do in the past with some big Excel formula or even writing SQL. You can now do those just using natural language, a huge acceleration in the time. You can also use those to create charts, you know, so you can just say "make a scatterplot of house price versus square footage."
00:07:30:00 - 00:07:49:00
We make some suggestions for you here too, right, and we'll create that chart, and now we've got a scatterplot of house prices versus square footage. So this is a live data pipeline, like I mentioned at the beginning. So I'm going to save this over to the report tab. It turns out we already made that chart when we made the automatic report.
00:07:49:00 - 00:08:09:08
So yes, that's nice. But you can see here that like that automatic prompt we did about being a real estate company has created a few different interesting views of the data that are relevant to the task. So here's a distribution of prices based on the condition of the rental properties. Condition kind of matters because you can think about how much the rent would be and you can think about the impact on rental prices.
00:08:09:09 - 00:08:31:02
Here's the overall price range of properties in the dataset. We've got bedroom configurations and a pie chart showing the different types of views available. So you get a variety of potential tenant value propositions that might be in there. All of that was generated just on that first basic prompt without me doing anything. But my property management company needs to come up with the prediction of the house
00:08:31:07 - 00:08:45:11
pricing. Okay. So we're going to do that next. So you can create any one of like 20 different chart types automatically just with a chat. You can ask it to narrow down its focus. You can say "which rental properties would be good for a family of four," and it'll automatically parse that question.
00:08:45:11 - 00:09:05:20
It'll give you a table or a chart depending on what you ask for. But when we get to the task of doing that predictive outcome, we're going to use the AutoML engine that we built. So we click on this predict tab. We do three types of ML models: classification, regression and time series forecasting. In this case, since we're predicting a number, it's a regression problem.
00:09:05:22 - 00:09:23:05
Just click predict here, and it's as simple as selecting your outcome of interest, which is price. You know, how much did that house cost? And we'll just hit create predictive model. Now in the background we're going to do all the data science for you so you don't have to. We'll split the data 80/20. Okay. We'll encode every one of the columns with the proper encoder.
00:09:23:05 - 00:09:43:21
So like a text encoder for text, you know, numbers and categories all treated appropriately. And then we're going to bootstrap a neural architecture search, which is a method of looking for the best performing ML model on the 80% of data that we're training on. When that completes, we'll run that model against the 20% of data that we held back to see how well it did.
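The 80/20 holdout described here is a standard step, and a minimal version looks like the sketch below. It is not Akkio's implementation: the function name, the shuffling strategy and the fixed seed are assumptions chosen for the example, and the rows are placeholders standing in for the 4500 homes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold back test_fraction of them.

    Returns (train_rows, test_rows). The input list is not modified.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for the 4500 sold homes.
rows = [{"id": i} for i in range(4500)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model only ever sees the training portion; the held-back portion exists so that accuracy is measured on homes the model has not been fitted to.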
00:09:43:21 - 00:09:58:04
Yeah, and that'll let us know like how good the ML engine was at finding the patterns in the data between things like size of the home and basement square footage and stuff like that, and the price of the homes, which is sort of the key output that we're interested in.
00:09:58:04 - 00:10:05:12
All right. It'll be finished in just a second here. When it finishes up the training, what we'll see is we're going to create a thing we call the Insights Report.
00:10:05:18 - 00:10:28:11
And the insights report is the patterns in the data that are relevant to the outcome, so you can understand them really well. But it also tells you how good it is at predicting future outcomes, so that you can think about the business value you might get when leveraging the model, right? Excuse me. So as it finishes looking at different model types here, it'll generate the insights report and then we'll take a look at how good it did at predicting those prices.
00:10:28:13 - 00:10:58:16
So wrapping up the training here we go. All right. So this is a reasonably small dataset. It was, I think, 4500 example homes. You can see that when we predict a price in the withheld 20%, we're usually within 15% of the price of the home, which is pretty good. As you drill in here, each one of these dots is a representation of the prediction versus the actual value.
00:10:58:16 - 00:11:19:11
So the closer it is to that dotted line, yeah, the better the prediction was. And you can see we got a pretty strong pattern. There's a pretty clear clustering of predictions along that ideal or actual outcome line. So this model has some value, but if you were actually working in this application, you would probably want to train it with a lot more examples of homes and their sale prices.
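"Usually within 15% of the price" can be read in more than one way. The sketch below shows two common readings on a held-out set: the mean absolute percentage error, and the share of predictions that land within a given tolerance of the actual price. Which of these (if either) the Insights Report displays is not stated in the transcript, and the prices below are made up for illustration.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |predicted - actual| / actual over all pairs."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def within_tolerance_rate(actual, predicted, tolerance=0.15):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    hits = sum(abs(p - a) / a <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up held-out sale prices and model predictions.
actual = [400_000, 500_000, 250_000, 800_000]
predicted = [440_000, 470_000, 300_000, 790_000]

print(mean_absolute_percentage_error(actual, predicted))
print(within_tolerance_rate(actual, predicted))
```

Each point in the scatterplot described above corresponds to one (actual, predicted) pair; points on the dotted line are pairs with zero error.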
00:11:19:13 - 00:11:34:10
But you can even drill down into here and see we picked a multilayer neural network as the winner and we looked at some other different types of models too. And if you do a longer training mode, which I won't do for the purposes of demo, we'll look at like 15 different model types and pick the best performing. Okay.
00:11:34:12 - 00:11:53:11
But the real payoff here is understanding the relationship between those inputs and the price. And so as you can see, it really pretty much boils down to location and the size of the home. That's not such a surprise. Yeah, that seems like common sense. Yeah. So we can see that like the cities have a big impact on price.
00:11:53:11 - 00:12:14:01
If you're in Bellevue, it's on average 120 more expensive. If you're in Kent, it's $62,000 cheaper. And then the living square footage is another big one. Bigger homes, more expensive; smaller homes, less expensive. We even do sort of text extraction. So remember, the address field was in text. If it contained the word Mercer, that probably means it's on Mercer Island.
00:12:14:01 - 00:12:38:10
So that makes it more expensive. If it has the words "place" or "southwest," less expensive. We'll show you factor-by-factor breakdowns and we'll even do automatic segmentation. So you can say here's a group of high-end homes. They have some similarities, like they're more likely to be in Clyde Hill. They're more likely to have a view of four. Here, some low-priced homes: they're three times more likely to have a single bedroom or be in worse condition.
00:12:38:10 - 00:12:52:11
So those types of things also help you think through like how to segment a population, which is one of the big values you get from using machine learning with data. And then we'll show you a sample prediction so you can see how we did. But the final step here would be getting this all the way across the finish line.
00:12:52:11 - 00:13:16:10
Yep. And we will predict those outcomes in the new tab. So I will pick the dataset, new homes on the market, here, and we will auto-map the inputs and outputs from our training set and click show preview. And then when we're done with that, I'll deploy it and make the predictions, and we'll pop over to that other tab and see that we've filled in the predictions for the outcome, and that'll be pretty much the end of the demo.
00:13:16:10 - 00:13:19:04
So let's click to deploy.
00:13:19:04 - 00:13:36:13
We will run it now, and if we're quick enough to browse over here, we're going to see the actual predictions fill in over here on the right, and any time now we'll see those populated. And one thing to keep in mind is we're pretty cautious about overwriting your data. We try to never do that.
00:13:36:13 - 00:13:54:08
You can see it filled in. So we'll always make a new column. Or if you're using something like Snowflake, we'll actually make a new table to put the predictions in, because we never want to mess with you. Sure, sure. All right. So where can people go for more information on this? I'm sure you've got a lot of other things. You can go directly to Akkio.com to check it out.
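The "always make a new column" rule mentioned above can be illustrated with a short sketch. Everything here is hypothetical: the function, the default column name and the suffixing scheme are invented for the example and say nothing about how Akkio names its output columns or tables.

```python
def add_prediction_column(rows, predictions, base_name="predicted_price"):
    """Return (new_rows, column_name) with predictions in a fresh column.

    If base_name is already used by any row, a numeric suffix is appended
    until an unused name is found. The input rows are copied, not mutated.
    """
    if len(rows) != len(predictions):
        raise ValueError("need exactly one prediction per row")
    existing = set().union(*(row.keys() for row in rows))
    name, suffix = base_name, 1
    while name in existing:
        suffix += 1
        name = f"{base_name}_{suffix}"
    new_rows = [{**row, name: pred} for row, pred in zip(rows, predictions)]
    return new_rows, name


# Made-up listings; the second already has a column with the default name.
listings = [
    {"city": "Kent", "sqft_living": 1400},
    {"city": "Bellevue", "sqft_living": 2600, "predicted_price": 0},
]
scored, column = add_prediction_column(listings, [350_000, 900_000])
print(column)
```

Because the function returns copies, the original listings keep exactly the values they had before scoring.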
00:13:54:10 - 00:14:08:20
It's an open platform. Anyone can sign up immediately and get a trial. You know, we make all these features immediately available to you so you can try it with your data to see how well it works. And we're always happy to have you try it and give us any feedback. All right, Jon, thanks for the demo. Thanks for having me.