Future-proof your data management strategy by addressing these key features

Data management is a multi-billion-dollar industry with heavy competition and an often-confusing landscape. Although an expansion of the industry has given way to a period of contraction and consolidation, the ecosystem continues to shift rapidly. Mergers, acquisitions, and displacements all impact the tools and platforms used to manage information, and new hardware and software tools can quickly upend how data is managed.

For researchers, the process of collecting information to formulate a hypothesis, conduct experiments, or analyze and iterate on a research program can be a daunting task. The challenge compounds when advanced technologies and big data enter the picture, and it only grows under increased pressure from regulations and security constraints. To address these challenges, research-focused organizations need to take a strategic approach to data management. But what are best practices for data management in an ever-evolving landscape of approaches, tools, and threats?

Data management can be broken down into a set of interconnected components. Taken as a whole, these components provide a structure to help various stakeholders – data engineers, data scientists, IT operations personnel, data users – understand how the evolution of data management is shaping the way research is constructed and conducted, the skillsets users of the data need, and what may be on the horizon for the data management ecosystem. We’ve identified nine key pieces of this puzzle:

Data movement
Data locality
Metadata management
Data integration
Search capabilities
Data catalog(s)
Data pipeline(s)
Policy and governance
Intrinsic security and trust

Organizations must carefully consider how they address each of these components as part of their data management strategy to enable the research enterprise effectively, to generate efficiencies, and to protect all data as valuable assets. Read on for an overview of select components, or check out the full whitepaper, “Data Management for Research” by Adam Robyak and Dr. Jeffrey Lancaster.

Data movement. A few trends are likely to shape how data movement evolves over the coming years. First, organizations are adopting hybrid cloud environments in which data is stored both in on-premises infrastructure and with cloud providers, as well as on remote devices, in sensors, and at edge gateways. As researchers seek to use that data, it will need to be both accessible and secure, no matter where it is stored. Second, machine learning is increasingly being used to automate manual tasks that were previously the responsibility of IT professionals. As a result, those IT professionals can expect to spend less time on rote processes and more time monitoring resource allocation and troubleshooting at a distance.

Data locality. Whether data is generated and stored in the cloud, in a data center, on the edge, or somewhere in between, understanding where data lives is critical to any data management strategy. Edge computing is a newer consideration that has emerged in response to decentralized IT, Web 3.0, and disaggregated data; its computational advantage comes from pre-processing data so that only key, aggregate, or pre-analyzed data is transmitted from the edge back to a data center. In some cases, data doesn’t need to make a round-trip to a data center at all; it can be processed wholly at the edge. Edge computing can be employed for a range of applications, from AI and analytics to inference and localized learning, and edge systems can also aggregate data from multiple endpoints or act as relays or nodes in a distributed network.
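To make the idea of edge pre-processing concrete, here is a minimal sketch in Python of how an edge gateway might reduce raw sensor readings to summary statistics before forwarding them to a data center. The sensor fields, aggregation logic, and transmit function are illustrative assumptions, not anything prescribed by the whitepaper.

```python
from statistics import mean
from datetime import datetime, timezone

# Illustrative only: raw readings collected at an edge gateway.
# In practice these would stream in from local sensors or devices.
raw_readings = [
    {"sensor_id": "temp-01", "value": 21.4},
    {"sensor_id": "temp-01", "value": 21.9},
    {"sensor_id": "temp-02", "value": 19.7},
    {"sensor_id": "temp-02", "value": 20.1},
]

def aggregate_at_edge(readings):
    """Reduce raw readings to per-sensor summaries so that only key,
    aggregate data travels back to the data center."""
    by_sensor = {}
    for r in readings:
        by_sensor.setdefault(r["sensor_id"], []).append(r["value"])
    return [
        {
            "sensor_id": sensor_id,
            "count": len(values),
            "mean": round(mean(values), 2),
            "min": min(values),
            "max": max(values),
            "window_end": datetime.now(timezone.utc).isoformat(),
        }
        for sensor_id, values in by_sensor.items()
    ]

def transmit_to_data_center(payload):
    # Placeholder for whatever transport an organization actually uses
    # (message queue, HTTPS endpoint, object storage, etc.).
    print(f"Transmitting {len(payload)} summary records instead of "
          f"{len(raw_readings)} raw readings")

transmit_to_data_center(aggregate_at_edge(raw_readings))
```

The point of the sketch is the shape of the workload, not the specifics: computation happens where the data is generated, and only the reduced result makes the trip.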
Data pipeline(s). Data pipelines provide an organized and often efficient construct for delivering information from data source to destination. Pipelines should be automated whenever possible and can leverage machine learning and artificial intelligence to aid in sourcing as well as ingest. To make the best use of data pipelines, researchers should be able to clearly articulate where, when, and how data is collected. Researchers and organizations with a mature data management strategy are likely to employ multiple data pipelines. A minimal sketch of a single pipeline appears at the end of this article.

Policy and governance. Policy and governance have also led to the expectation that researchers must have a plan for data management. The National Science Foundation and the National Institutes of Health, along with other federal agencies in the United States, mandate the inclusion of a data management plan as part of grant applications. Universities and colleges thereby assume responsibility for the proper stewardship of the data generated by the research enterprise. The burden on institutions continues to grow as the amount of research data for which they are responsible increases exponentially.

Intrinsic security and trust. The trust gaps associated with current solutions present an opportunity for new and emerging technologies: the Internet of Things is being secured through a mix of edge and telemetry data collection and processing; data provenance solutions are ensuring the accuracy and legitimacy of data, even for physical items procured through complex supply chains; and data security across hybrid cloud models is protecting data in transit. Even SecDevOps – the practice of integrating security, development, and IT operations into a continuous and cohesive lifecycle management architecture – is a sign of the importance afforded to trust within data management.

By deconstructing the components of a data management strategy, researchers can ensure both that they are responsible stewards of the data and that they are employing best-in-class emerging technologies. Although the responsibility does not fall wholly on researchers – it must be shared by research administrators, students, and others – it is only through the cooperation of researchers, organizations, and IT operations that a data management strategy can be implemented optimally for research.

For a more in-depth look at each of the components of a successful data management strategy, see the Dell Technologies whitepaper “Data Management for Research” by Adam Robyak and Dr. Jeffrey Lancaster.
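As referenced in the data pipeline(s) section above, here is a minimal sketch in Python of a source-to-destination pipeline with explicit, automatable stages. The stage names, the validation rule, and the in-memory destination are illustrative assumptions rather than anything defined in the whitepaper.

```python
import csv
import io

# Illustrative only: a tiny source-to-destination pipeline.
# Real pipelines would read from instruments, files, or APIs and
# load into a database, data lake, or catalogued storage.
SOURCE_CSV = """sample_id,measurement
S-001,12.7
S-002,
S-003,13.4
"""

def extract(raw_csv):
    """Source stage: where, when, and how the data is collected."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows):
    """Cleaning stage: drop incomplete records and normalize types.
    This is also where ML-assisted checks could be plugged in."""
    cleaned = []
    for row in rows:
        if row["measurement"]:
            cleaned.append({"sample_id": row["sample_id"],
                            "measurement": float(row["measurement"])})
    return cleaned

def load(rows, destination):
    """Destination stage: hand the curated records to storage."""
    destination.extend(rows)
    return len(rows)

warehouse = []  # stand-in for a database or data lake
records = transform(extract(SOURCE_CSV))
print(f"Loaded {load(records, warehouse)} of 3 source records")
```

Even at this scale, the value is that each stage is explicit, repeatable, and a natural point for automation, validation, and logging as the pipeline matures.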