The Pull of Data Gravity

Feb 23, 2022

Why mass data infrastructure strategy must be built with this powerful force in mind.


By John Morris

More than ever, today’s mass data sets are on the move.

According to the 2021 IDC Cloud Data Storage & Infrastructure Trends Survey, 47% of enterprises use a centralized cloud storage architecture, a share respondents expect to fall to 22% within two years. Conversely, 25% of respondents currently have a hybrid storage architecture (a combination of centralized and edge locations), and that share is expected to rise to 47% in two years.

As a result, exponentially growing volumes of data must be processed in real time at the edge and then moved to the cloud, where computationally intensive tasks (such as training large machine-learning models) can extract more value from them.

Mass data matters

As the data spreads from cloud to edge, organizations must contend with a shift in data gravity.

A recent IDC report, Future-Proofing Storage: Modernizing Infrastructure for Data Growth Across Hybrid, Edge, and Cloud Ecosystems, finds that as the storage associated with massive data sets continues to grow, so will its gravitational force on other elements within the IT universe.

Just as stars form from scattered clouds of dust that collapse over time under their own gravitational attraction, concentrations of data exert a gravitational pull of their own.

Data gravity is the power of data to attract applications, services, and other data. “Workloads with the largest volumes of stored data exhibit the largest mass within their ‘universe,’ attracting applications, services, and other infrastructure resources into their orbit,” according to the IDC report.

Generally speaking, data gravity depends on two factors: the amount of data (its mass) and that data's level of activation. The greater the concentration of data, the greater its pull: a body of data with more mass exerts a stronger pull on the infrastructure surrounding it.

What does all this mean for data leaders?

What worked for terabytes doesn’t work for petabytes. As enterprises try to overcome the cost and complexity of storing and activating data at scale, they should look for better economics, less friction, and a simpler experience: a solution built for the distributed, data-driven enterprise.

To counter a disproportionate pull of data gravity, pay specific attention to the economics of data movement. For massive data sets, physical data shuttles and transfer services can often be more cost-effective and faster than moving data over the network, with the added benefit of a faster time to insight.
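
Some back-of-the-envelope arithmetic shows why shipping can beat the network at petabyte scale. The Python sketch below estimates how many days a network transfer takes and compares that with a physical shuttle. The 10 Gbps link, the 80% sustained utilization, and the seven-day shuttle turnaround are hypothetical assumptions chosen for illustration, not figures from the IDC report.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # decimal petabyte, in bytes

# Hypothetical end-to-end turnaround for a physical data shuttle
# (load, transport, ingest). Assumes the shuttle holds the whole data set;
# real figures vary by provider, region, and device capacity.
SHUTTLE_DAYS = 7.0


def network_transfer_days(data_bytes: float, link_gbps: float,
                          utilization: float = 0.8) -> float:
    """Days to move data_bytes over a link of link_gbps (decimal gigabits/s),
    assuming the link sustains `utilization` of its nominal rate throughout."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_bytes * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    return bits / effective_bits_per_second / SECONDS_PER_DAY


for size_pb in (0.1, 1, 10):
    days = network_transfer_days(size_pb * PETABYTE, link_gbps=10)
    faster = "network" if days < SHUTTLE_DAYS else "shuttle"
    print(f"{size_pb:>5} PB: {days:6.1f} days over the network -> {faster} is faster")
```

Under these assumptions, 100 TB moves in about 1.2 days over the network, while 1 PB takes about 11.6 days and 10 PB about 116 days, so the break-even point depends heavily on link speed, sustained utilization, and how long shipping really takes.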

Data-centric infrastructures

Organizations should also store data close to the applications that need low latency. Cloud-native designs make this possible: containerized applications can run close to users and can interact with, create, and store data near its point of origin. Containerizing applications also simplifies management and deployment.

Containerization also provides a clean separation of concerns. Developers can focus on their application logic and dependencies. IT operations teams can concentrate on deployment and management without having to track application details such as software versions and app-specific configurations. For the business, the benefits are agility and efficiency, and often better security and a lower total cost of ownership (TCO).

A data-centric architecture makes data accessible. It makes a data pipeline easier to use and to operate smoothly, and it can drive future business innovation: it improves the ability to generate metadata and new data sets, enables search and discovery of the data, and helps data scientists deploy the resulting machine-learning models.

Accessibility can also improve application performance, reduce latency, curb or eliminate egress charges, and make it easier to manage security and compliance.

A storage infrastructure designed with data gravity in mind brings many business benefits. They include an excellent customer experience, protection of data sets, policy-driven access, the lowest costs for retention, preservation of data for analysis, and, finally, management simplicity that ensures service resiliency.

Learn more about data gravity, hybrid architecture, overcoming network constraints, and the growing complexity of storage management in the recent Seagate-sponsored report from IDC, Future-Proofing Storage: Modernizing Infrastructure for Data Growth Across Hybrid, Edge, and Cloud Ecosystems.

John Morris is senior vice president and chief technology officer at Seagate Technology.