BrandPosts are written and edited by members of our sponsor community. BrandPosts create an opportunity for an individual sponsor to provide insight and commentary from their point-of-view directly to our audience. The editorial team does not participate in the writing or editing of BrandPosts.
By Ellen Friedman
Unlike Las Vegas, what happens at the edge doesn’t stay at the edge. And that’s half the challenge.
People commonly think of edge computing as a glorified form of data acquisition or local digital process control. In reality, edge is a lot more than either of those.
It’s true that edge involves many data sources, usually at geo-distributed locations. But keep in mind, it’s the aggregate of that data that holds the key to value and insights. Analysis of the combined data is carried out at core data centers, and actions guided by the resulting insights often need to be carried out at edge locations. Therefore, a surprising challenge of edge systems is efficient traffic not only from edge to core but also back again.
Scale is an edge issue as well. Incoming data at edge sources is often huge, and putting that together from a large number of edge locations can create truly enormous amounts of data.
A classic example of this is found in the automotive manufacturing industry with its autonomous car development. Car manufacturers need ready access to global data, working with many petabytes of data per day. They must also meet critical key performance indicators (KPIs), including measuring how long it takes to collect data from test cars, how long to process, and how long to deliver insights.
Of course, not all edge systems involve this extreme scale of data, but most edge situations do involve too much data to transfer verbatim from edge to core. This means the data must be reduced at the edge before it is sent to the core. All of this (data analysis, modeling, and data movement) needs to be efficiently coordinated at scale.
To better understand the challenges of edge systems and how they can be addressed, let’s dig into what happens at the edge, at the core, and in between.
What happens at the edge
Edge computing generally involves systems in multiple locations, each ingesting data, storing it temporarily, and running multiple applications for data reduction prior to transport to core data centers. These tasks are shown in the left half of Figure 1.
Analytics applications are used for pre-processing and data reduction. AI and machine learning models are also employed for data reduction, such as making decisions about what data is important and should be conveyed to core data centers. In addition, models allow intelligent action to take place at the edge. Another typical edge requirement is to inventory what steps have taken place and what data files have been created.
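To make the edge tasks above concrete, here is a minimal sketch in Python of one reduction step that also keeps an inventory of what it did. Everything in it is an assumption for illustration: the `EdgeReducer` class, the site and batch names, and the simple threshold rule, which stands in for whatever analytics or trained model a real edge system would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EdgeReducer:
    """Hypothetical edge-side reducer: summarizes readings and logs each step."""

    site: str
    threshold: float  # assumed cutoff for "important" readings
    inventory: list = field(default_factory=list)

    def reduce(self, batch_id: str, readings: list[float]) -> dict:
        # Keep only readings above the cutoff; a trained model could replace this rule.
        important = [r for r in readings if r > self.threshold]
        summary = {
            "site": self.site,
            "batch": batch_id,
            "count": len(readings),
            "mean": mean(readings),  # raises StatisticsError on an empty batch
            "important": important,
        }
        # Record the step so the core can later see what was processed and kept.
        self.inventory.append(
            {"batch": batch_id, "step": "reduce", "kept": len(important)}
        )
        return summary


reducer = EdgeReducer(site="plant-7", threshold=90.0)
summary = reducer.reduce("batch-001", [71.0, 95.5, 88.0, 102.5])
```

Only the summary, not the raw batch, would travel to the core, while the inventory entry stays available to answer the question of which steps ran and what they produced.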
All this must happen at many locations, none of which will have a lot of on-site administration, so edge hardware and software must be reliable and remotely managed. With these needs, self-healing software offers a huge advantage.
What happens at the core
The activities that happen at the core, seen on the right side of Figure 1, resemble edge processes but with a global perspective, using collective data from many edge locations. Here analytics can be more in-depth. This is where deep historical data is used to train AI models. As at edge locations, the core contains an inventory of actions taken and data created. The core is also where the connection is made to high-level business objectives that ultimately underlie the goal of edge systems.
Data infrastructure at the core is especially challenging since data from all of the edge systems converges there. Data from the edge (or data resulting from processing and modeling at the core) can be massive or can consist of a huge number of files. The infrastructure must be robust at large scale, both in the number of objects and in the quantity of data.
Of course, analysis and model development workflows are iterative. As an organization learns from the global aggregate of edge data, teams at the core produce new or updated AI models and analytics applications, and these must then be deployed at the edge. That brings us to the next topic, what needs to happen between edge and core.
Traffic between edge and core
Just as Figure 1 lists the key activities at the edge or in the core, it also shows the key interaction between the two: the movement of data or code. Obviously, the system needs to move ingested and reduced data from edge locations to the core for final analysis. But people sometimes overlook an unexpected journey: moving new AI and machine learning models or updated analytics programs that have been developed by teams at the core back to the edge.
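The return journey from core to edge can be pictured as a simple version comparison. The sketch below is only an illustration under stated assumptions: the function `plan_model_updates`, the model names, and the integer version numbers are all hypothetical, and a real deployment would move the model files themselves through the data infrastructure.

```python
from __future__ import annotations


def plan_model_updates(
    core_models: dict[str, int], edge_models: dict[str, int]
) -> dict[str, int]:
    """Return the models (name -> version) an edge site still needs from the core."""
    return {
        name: version
        for name, version in core_models.items()
        # A model missing at the edge counts as version 0, so it is always fetched.
        if edge_models.get(name, 0) < version
    }


# Hypothetical state: the core has retrained one model since the edge last synced.
core = {"defect-detector": 5, "frame-filter": 2}
edge = {"defect-detector": 4, "frame-filter": 2}
updates = plan_model_updates(core, edge)
```

Here only the retrained model appears in `updates`, so the edge site fetches one new version rather than everything the core holds.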
In addition, analysts, developers, and especially data scientists sometimes need to inspect raw data at one or more edge locations. Having direct access from the core to raw data at edge locations is very helpful.
Almost all large-scale data motion should be done using the data infrastructure, but it can be useful to have direct access to services running at the edge or in the core. Secure service meshes are useful for this, particularly if they use modern zero-trust workload authentication methods such as those defined by SPIFFE (Secure Production Identity Framework for Everyone).
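In SPIFFE, each workload is named by a SPIFFE ID, a URI of the form `spiffe://trust-domain/path`. As a rough sketch of what that identity looks like, the Python below parses and checks such an ID. It is a simplified check written for this article, not a full implementation of the SPIFFE specification, and the function names and the example trust domain `edge.example.org` are assumptions.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Simplified trust-domain pattern: lowercase letters, digits, dots, dashes, underscores.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")


def parse_spiffe_id(value: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, workload_path) or raise ValueError."""
    parts = urlsplit(value)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be 'spiffe'")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    # Rejecting ':' and '@' here also rules out ports and user info.
    if not _TRUST_DOMAIN.fullmatch(parts.netloc):
        raise ValueError("invalid trust domain")
    return parts.netloc, parts.path


def is_valid_spiffe_id(value: str) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        parse_spiffe_id(value)
    except ValueError:
        return False
    return True


# Hypothetical identity for a data-reduction service at one edge site.
trust_domain, path = parse_spiffe_id("spiffe://edge.example.org/site-7/reducer")
```

A service mesh that authenticates workloads this way can decide whether a caller at the core may reach a service at the edge based on who the workload is, rather than on which network it happens to sit in.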
Now that we’ve identified what happens at edge, core and in between, let’s look at what data infrastructure needs to do to make this possible.
HPE Ezmeral Data Fabric: from edge to core and back
HPE is known for its excellent hardware, including the Edgeline series (specifically designed for use at the edge). HPE also makes the hardware-agnostic HPE Ezmeral Data Fabric software, designed to stretch from edge to core, whether on-premises or in the cloud.
HPE Ezmeral Data Fabric lets you simplify system architectures and optimize resource usage and performance. Figure 2 shows how the capabilities of the data fabric are used to meet the challenges of edge computing.
Computation at the edge or core can be managed using Kubernetes to orchestrate containerized applications. HPE Ezmeral Data Fabric provides the data layer for such applications. And thanks to the data fabric's global namespace, teams working at the data center can remotely access data that is still at the edge.
Ellen Friedman is a principal technologist at HPE focused on large-scale data analytics and machine learning. Ellen worked at MapR Technologies for seven years prior to her current role at HPE, where she was a committer for the Apache Drill and Apache Mahout open source projects. She is a co-author of multiple books published by O’Reilly Media, including AI & Analytics in Production, Machine Learning Logistics, and the Practical Machine Learning series.