Why machine learning needs to get closer to your data

BrandPost By Keith Shaw
Jun 11, 2021

For machine learning to be used more widely, it needs to be brought closer to the data that fuels ML modeling and insights. For database developers and data analysts who are still getting up to speed on ML, the ideal scenario is to integrate ML algorithms and model training directly into the tools they already use, making it easier for them to extract meaningful insights from their data.

In-house database teams are likely to be experts in SQL, but they may not know Python, which has emerged as a primary programming language for AI and machine learning. As a result, they rely on data scientists to build the models they need to add intelligence to their applications.

Even when they have these models in hand, there’s a long and involved process to move data from the right sources to the ML models.

“There’s a little bit of hill climbing to go from SQL to Python and machine learning models,” says Bratin Saha, Vice President and General Manager of Machine Learning Services at Amazon. “We want to meet customers where they are and make machine learning more accessible to all our customers.”

Shortening the learning curve and reducing the complexity of ML projects can unlock new insights that drive business innovation. For example, Expedia Group uses Amazon QuickSight ML Insights, a feature of the QuickSight business intelligence service that uses ML to uncover hidden trends and outliers. Expedia uses it to measure, report, and act on the business metrics that help customers find the best matches for their travel searches. “Amazon QuickSight’s out-of-the-box machine learning insights help us to continuously monitor our business for anomalies, alert stakeholders when outliers occur, and help our business project future trends,” says Amit Marwah, Director of Technology, Flights Data & Analytics at Expedia Group.

In addition to QuickSight ML Insights, Amazon has integrated machine learning capabilities into many other data-related services. For example:

  • Amazon Redshift ML allows analysts to use SQL queries to make predictions from data in a data warehouse. Because Amazon SageMaker is integrated into Redshift, data teams can create a model using SQL and have SageMaker Autopilot choose the best algorithm for the data. The interactions between the data and the ML service are abstracted away, and the resulting model is made available as a SQL function for use in queries, reports, and dashboards.
  • Amazon Neptune ML integrates ML into the Neptune managed graph database service, allowing developers to apply ML to applications that use graph data, for example to build recommendation engines or to generate more accurate predictions for fraud detection. Neptune ML selects the graph data needed for training, automatically chooses the best ML model for the selected data, exposes ML capabilities via simple graph queries, and provides templates for customizing ML models in advanced scenarios. With ML algorithms purpose-built for graph data via SageMaker and the Deep Graph Library, developers can improve prediction accuracy by more than 50% compared with traditional ML techniques.
  • Amazon Aurora ML makes it easier to integrate ML into Aurora-powered apps by applying ML directly from the database with a SQL query. Behind the scenes, Aurora sends the data to Amazon Comprehend or Amazon SageMaker and receives ready-to-use results, giving relational database developers a way to apply ML to their data.
  • Amazon Athena ML gives developers access to more than a dozen built-in ML models and also lets them use their own models in Amazon SageMaker, directly from ad-hoc queries in Athena. As a result, analysts can run ad-hoc Athena queries that use ML to forecast sales, detect suspicious logins, or sort users into customer cohorts.
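
The common thread in Redshift ML, Aurora ML, and Athena ML is that a trained model ends up callable from SQL like any other function. As a local illustration of that pattern only (this is not any AWS API), the sketch below uses Python's built-in sqlite3 module as a stand-in for the database and registers a toy scoring function with `create_function`. The `customers` table, its columns, `predict_churn`, and the model coefficients are all hypothetical; in Redshift ML, for instance, the SQL function would instead be produced by a `CREATE MODEL` statement and the model trained by SageMaker Autopilot.

```python
import math
import sqlite3

# Hypothetical coefficients for a toy logistic-regression churn model.
# A managed service would learn these from data; here they are hard-coded.
WEIGHTS = {"support_calls": 0.8, "tenure_months": -0.1}
BIAS = -1.0


def predict_churn(support_calls: float, tenure_months: float) -> float:
    """Return a churn probability (0 to 1) from a logistic score."""
    z = (
        BIAS
        + WEIGHTS["support_calls"] * support_calls
        + WEIGHTS["tenure_months"] * tenure_months
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_customers() -> list:
    """Score every row of a hypothetical customers table from plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER, support_calls INTEGER, tenure_months INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 0, 36), (2, 5, 3), (3, 2, 12)],
    )
    # Expose the model to SQL as a two-argument function, so analysts can call
    # it inside ordinary queries without writing any Python themselves.
    conn.create_function("predict_churn", 2, predict_churn)
    rows = conn.execute(
        "SELECT id, ROUND(predict_churn(support_calls, tenure_months), 3) AS churn_prob "
        "FROM customers ORDER BY churn_prob DESC"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Each row is (customer id, predicted churn probability), highest risk first.
    print(score_customers())
```

Once registered, the function composes with ordinary SQL (ordering, filtering, joins), which is the property that lets analysts put predictions into queries, reports, and dashboards without leaving SQL.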

Integrating ML into these data services has a dual benefit: it brings machine learning closer to the data, which can save companies time and money in the long run, and it makes ML more accessible, so it can be deployed rapidly across multiple use cases. Organizations can accelerate their pace of innovation by turning data into insights.
