Legacy data infrastructures can no longer support the massive amounts of data that companies now create and collect. A modern data strategy, built on data lakes, purpose-built data stores, machine learning, and other cloud-based services, removes the restrictions of the traditional one-size-fits-all approach.
But this is not about simply connecting a data lake with a data warehouse. Rather, a modern data strategy is about creating a tightly integrated ecosystem that gives companies seamless data access and enables safe and secure governance over that data.
Attempting to scale an older data infrastructure is often where a one-size-fits-all approach forces its first compromise. Legacy systems are typically expensive to build and maintain, and difficult to scale as more data is generated.
“Given the explosion in data volumes and the types of data that customers are dealing with, they quickly reach a point where they need a different technology,” says Rahul Pathak, Vice President for Analytics at AWS. “With a general-purpose database, you’ll generally end up in a dead end at some point because you’re trying to do everything reasonably well, which means you can’t be excellent at any one particular thing.”
A more practical approach is mixing and matching the right tools for the right jobs. “The type of database that you use to power something like an Amazon.com shopping cart has very different characteristics, in terms of performance and scale and cost, from something like an Oracle database,” says Pathak. “A more focused approach lets you build something where there’s no compromise on performance, functionality, scale, or cost.”
A modern data strategy also supports the data- and performance-intensive needs of technologies such as machine learning, which has become critical for data-driven businesses. “When customers move from basic reporting to machine learning, typically they are going to use a different system, but they want to use the same data,” says Pathak. Services such as Amazon SageMaker enable organizations to add machine learning capabilities to existing data stores to quickly build, train, and deploy models that adapt as the data changes.
None of these benefits are possible without integration. In any enterprise, multiple systems are used to process, store, and manage data. “Some systems may be accepting data in real time, another might be recording transactions, and another might be generating reports and dashboards,” says Pathak. “You need data to be interconnected across these systems so different parts of the company can access the same data in different ways. That’s why we really believe in the approach of the right tool for the right job, making sure they are well integrated.”
Integration often involves both relational data warehouses and data lakes. Data warehouses let organizations quickly run complex queries on relational data, while data lakes bring vast amounts of data from various silos into a single location where it can be stored and analyzed. A modern data strategy enables organizations to place data warehouses and other purpose-built data services around the data lake, with unified data access that lets people reach data wherever it lives, in a secure and governed way.
“It’s completely modular, by design, so you can start with whatever your priority is,” says Pathak. “This approach will help you evolve as your business evolves.”
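To make the modular idea concrete, here is a minimal sketch of what a unified, governed access layer over several purpose-built stores might look like. Everything in it is hypothetical and for illustration only: the `DataCatalog` class, the store names, the roles, and the sample rows are assumptions, not an AWS API or a description of any particular service.

```python
from dataclasses import dataclass, field


@dataclass
class DataCatalog:
    """Hypothetical unified access layer (illustrative only).

    Tracks which store holds each dataset and applies one role-based
    access policy, regardless of where the data physically lives.
    """
    locations: dict = field(default_factory=dict)    # dataset -> store name
    permissions: dict = field(default_factory=dict)  # dataset -> allowed roles
    stores: dict = field(default_factory=dict)       # store name -> {dataset: rows}

    def register(self, dataset, store, rows, allowed_roles):
        # Place the dataset in its purpose-built store and record the policy.
        self.stores.setdefault(store, {})[dataset] = rows
        self.locations[dataset] = store
        self.permissions[dataset] = set(allowed_roles)

    def read(self, dataset, role):
        # Governance check happens once, here, for every store.
        if role not in self.permissions.get(dataset, set()):
            raise PermissionError(f"role {role!r} may not read {dataset!r}")
        store = self.locations[dataset]
        return self.stores[store][dataset]


catalog = DataCatalog()

# Raw event data lands in the (assumed) data lake.
catalog.register(
    "clickstream", "data_lake",
    [{"user": 1, "page": "/home"}],
    allowed_roles={"analyst", "ml_engineer"},
)

# Structured transactional data lives in the (assumed) warehouse.
catalog.register(
    "orders", "warehouse",
    [{"order_id": 10, "total": 25.0}],
    allowed_roles={"analyst"},
)

# Different teams reach data through the same interface.
analyst_orders = catalog.read("orders", role="analyst")
ml_clicks = catalog.read("clickstream", role="ml_engineer")
```

In this sketch, adding a new purpose-built store only requires another `register` call, while callers and access rules stay unchanged. That is one way to read the modular approach Pathak describes.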