Data architecture definition
Data architecture describes the structure of an organization’s logical and physical data assets and data management resources, according to The Open Group Architecture Framework (TOGAF). It is an offshoot of enterprise architecture that comprises the models, policies, rules, and standards that govern the collection, storage, arrangement, integration, and use of data in organizations. An organization’s data architecture is the purview of data architects.
Data architecture goals
The goal of data architecture is to translate business needs into data and system requirements and to manage data and its flow through the enterprise. Many organizations today are looking to modernize their data architecture as a foundation to fully leverage AI and enable digital transformation. Consulting firm McKinsey Digital notes that many organizations fall short of their digital and AI transformation goals due to process complexity rather than technical complexity.
Data architecture principles
According to Joshua Klahr, vice president of product management, core products, at Splunk, and former vice president of product management at AtScale, six principles form the foundation of modern data architecture:
- Data is a shared asset. A modern data architecture needs to eliminate departmental data silos and give all stakeholders a complete view of the company.
- Users require adequate access to data. Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data using tools fit for their jobs.
- Security is essential. Modern data architectures must be designed for security and they must support data policies and access controls directly on the raw data.
- Common vocabularies ensure common understanding. Shared data assets, such as product catalogs, fiscal calendar dimensions, and KPI definitions, require a common vocabulary to help avoid disputes during analysis.
- Data should be curated. Invest in core functions that perform data curation (modeling important relationships, cleansing raw data, and curating key dimensions and measures).
- Data flows should be optimized for agility. Reduce the number of times data must be moved to reduce cost, increase data freshness, and optimize enterprise agility.
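Two of these principles, access controls applied directly to raw data and a common vocabulary for KPIs, can be illustrated with a small sketch. Everything named here (`AccessPolicy`, `apply_policy`, `KPI_DEFINITIONS`, the sample orders) is hypothetical and exists only for illustration; a real platform would likely enforce policies in the storage or query layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy: the columns a role may read from raw data."""
    role: str
    allowed_columns: frozenset


def apply_policy(rows, policy):
    """Return copies of the rows that keep only the allowed columns."""
    return [
        {key: value for key, value in row.items() if key in policy.allowed_columns}
        for row in rows
    ]


# Shared vocabulary (assumed): one agreed definition per KPI, so every
# team computes "average order value" the same way.
KPI_DEFINITIONS = {
    "average_order_value": lambda rows: sum(r["amount"] for r in rows) / len(rows),
}

# Illustrative raw data; the email column is treated as sensitive.
raw_orders = [
    {"order_id": 1, "customer_email": "a@example.com", "amount": 120.0},
    {"order_id": 2, "customer_email": "b@example.com", "amount": 80.0},
]

analyst = AccessPolicy(role="analyst", allowed_columns=frozenset({"order_id", "amount"}))
visible = apply_policy(raw_orders, analyst)
aov = KPI_DEFINITIONS["average_order_value"](visible)
```

The analyst still gets a usable view for analysis, while the sensitive column never leaves the raw store, and the KPI is computed from a single shared definition.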
Data architecture components
A modern data architecture consists of the following components, according to IT consulting firm BMC:
- Data pipelines. A data pipeline is the process by which data is collected, moved, and refined. It spans data collection, refinement, storage, analysis, and delivery.
- Cloud storage. Not all data architectures leverage cloud storage, but many modern data architectures use public, private, or hybrid clouds to provide agility.
- Cloud computing. In addition to using cloud for storage, many modern data architectures make use of cloud computing to analyze and manage data.
- APIs. Modern data architectures use APIs to make it easy to expose and share data.
- AI and ML models. AI and ML are used to automate tasks such as data collection and labeling. At the same time, modern data architectures can help organizations unlock the ability to leverage AI and ML at scale.
- Data streaming. Data streaming is flowing data continuously from a source to a destination for processing and analysis in real-time or near real-time.
- Container orchestration. A container orchestration system such as open-source Kubernetes is often used to automate software deployment, scaling, and management.
- Real-time analytics. The goal of many modern data architectures is to deliver real-time analytics, the ability to perform analytics on new data as it arrives in the environment.
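The pipeline stages listed above (collection, refinement, storage, analysis, and delivery) can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not how any particular product works: the sensor readings are hard-coded, the "storage" is an in-memory list, and all function names are invented for the example.

```python
from statistics import mean


def collect():
    """Collection: yield raw readings (a hard-coded stand-in for a real source)."""
    yield from [
        {"sensor": "a", "temp_c": "21.5"},
        {"sensor": "b", "temp_c": "bad"},
        {"sensor": "a", "temp_c": "22.5"},
    ]


def refine(records):
    """Refinement: parse values and drop records that fail to parse."""
    for record in records:
        try:
            yield {"sensor": record["sensor"], "temp_c": float(record["temp_c"])}
        except ValueError:
            continue


def store(records, table):
    """Storage: append to an in-memory list standing in for a database."""
    table.extend(records)
    return table


def analyze(table):
    """Analysis: average temperature per sensor."""
    by_sensor = {}
    for record in table:
        by_sensor.setdefault(record["sensor"], []).append(record["temp_c"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def deliver(summary):
    """Delivery: format the results for a downstream consumer."""
    return [f"{sensor}: {avg:.1f} C" for sensor, avg in sorted(summary.items())]


table = store(refine(collect()), [])
report = deliver(analyze(table))
```

Because each stage only depends on the shape of the records it receives, any one of them could be swapped for a cloud service or streaming system without rewriting the others.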
Data architecture vs. data modeling
According to the Data Management Body of Knowledge (DMBOK 2), data architecture defines the blueprint for managing data assets by aligning with organizational strategy to establish strategic data requirements and designs to meet those requirements. DMBOK 2 defines data modeling, on the other hand, as “the process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model.”
While both data architecture and data modeling seek to bridge the gap between business goals and technology, data architecture is about the macro view that seeks to understand and support the relationships between an organization’s functions, technology, and data types. Data modeling takes a more focused view of specific systems or business cases.
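To make the narrower scope of data modeling concrete, here is a sketch of a data model for one hypothetical business case, customers placing orders. The dataclasses stand in for a logical model and the SQLite DDL for a physical one; the entity names, columns, and sample row are assumptions made for illustration only.

```python
import sqlite3
from dataclasses import dataclass


# Logical model (assumed): entities and their relationships.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # references Customer.customer_id
    amount: float


# Physical model (assumed): the same entities as concrete SQLite tables.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

customer = Customer(customer_id=1, name="Acme")
order = Order(order_id=10, customer_id=customer.customer_id, amount=99.5)

conn.execute("INSERT INTO customer VALUES (?, ?)", (customer.customer_id, customer.name))
conn.execute(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    (order.order_id, order.customer_id, order.amount),
)

# Follow the modeled relationship from an order back to its customer.
row = conn.execute(
    "SELECT c.name, o.amount FROM customer_order o "
    "JOIN customer c ON o.customer_id = c.customer_id"
).fetchone()
```

A data architecture would sit above a model like this, deciding which systems own customer and order data and how it flows between them.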
Data architecture frameworks
Several enterprise architecture frameworks commonly serve as the foundation for building an organization’s data architecture framework:
- DAMA-DMBOK 2. DAMA International’s Data Management Body of Knowledge is a framework specifically for data management. It provides standard definitions for data management functions, deliverables, roles, and other terminology, and presents guiding principles for data management.
- Zachman Framework for Enterprise Architecture. The Zachman Framework is an enterprise ontology created by John Zachman at IBM in the 1980s. The “data” column of the Zachman Framework comprises multiple layers, including architectural standards important to the business, a semantic model or conceptual/enterprise data model, an enterprise/logical data model, a physical data model, and actual databases.
- The Open Group Architecture Framework (TOGAF). TOGAF is an enterprise architecture methodology that offers a high-level framework for enterprise software development. Phase C of TOGAF covers developing a data architecture and building a data architecture roadmap.
Modern data architecture best practices
Modern data architectures must be designed to take advantage of emerging technologies such as artificial intelligence (AI), automation, internet of things (IoT), and blockchain. Dan Sutherland, senior director, technology consulting, Protiviti, says modern data architectures should adhere to the following best practices:
- Cloud-native. Modern data architectures should be designed to support elastic scaling, high availability, end-to-end security for data in motion and data at rest, and cost and performance scalability.
- Scalable data pipelines. To take advantage of emerging technologies, data architectures should support real-time data streaming and micro-batch data bursts.
- Seamless data integration. Data architectures should integrate with legacy applications using standard API interfaces. They should also be optimized for sharing data across systems, geographies, and organizations.
- Real-time data enablement. Modern data architectures should support the ability to deploy automated and active data validation, classification, management, and governance.
- Decoupled and extensible. Modern data architectures should be designed to be loosely coupled, enabling services to perform minimal tasks independent of other services.
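Several of these practices, micro-batch processing, automated validation, and loosely coupled services, can be combined in a short sketch. The event shape, the `validate` rule, and the batch size are all assumptions chosen for illustration; a production system would typically rely on a streaming platform rather than an in-memory list.

```python
from itertools import islice


def micro_batches(stream, size):
    """Group any iterable of events into lists of at most `size` items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def validate(event):
    """Automated validation (assumed rule): amount must be a non-negative number."""
    amount = event.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


# Illustrative event stream containing two invalid events.
events = [
    {"amount": 5},
    {"amount": -1},
    {"amount": 7},
    {"amount": 2},
    {"amount": "x"},
]

# Each micro-batch is validated and totaled independently of the others.
valid_totals = [
    sum(event["amount"] for event in batch if validate(event))
    for batch in micro_batches(events, 2)
]
```

The batching function knows nothing about validation and the validator knows nothing about batching, which is the kind of loose coupling the last practice describes: either piece can be replaced without touching the other.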
Data architecture roles
Here are some of the most popular job titles related to data architecture and the average salary for each position, according to data from PayScale: