Is Artificial Intelligence Just Another High Performance Computing Workload?

BrandPost by Roland Kunz, Ph.D.
Aug 11, 2020


Artificial intelligence (AI) comes to mind whenever organizations seek new, innovative approaches to enhance and expand their businesses. Typical attributes of such workloads are “use case centric”, “data scientist specific”, and “innovative”. Examples include autonomous farming, self-driving cars, and interactive voice dialog systems.

On the other hand, high performance computing (HPC) is often seen as highly specialized and expensive, serving a wide range of custom-written applications from research engineers, or some of the major industry HPC software stacks such as fluid dynamics modelling or crash simulations.

For many years, the two disciplines have been treated separately, each developing its own ecosystem of specialized hardware, software stacks, and operational models. Viewed from the outside, however, the two workloads are not far apart in their basic requirements. Luckily, two recent developments in technology have made it possible for organizations to standardize on a common production environment:

  1. Common AI frameworks like TensorFlow have become more mature and established (as the brief sketch below illustrates)
  2. Advances in server infrastructure and networking technology deliver greater performance
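
To make the first point concrete, here is a minimal TensorFlow 2.x sketch; the data and model are purely illustrative, not taken from any specific Dell solution. The same few lines run unchanged on a laptop, a standard x86 server, or a GPU-equipped cluster node, which is what makes standardizing on one environment practical.

```python
# Minimal TensorFlow 2.x sketch -- the dataset and model are illustrative only.
import tensorflow as tf

# Hypothetical tabular dataset: 1,024 samples, 10 features, binary label.
x_train = tf.random.normal((1024, 10))
y_train = tf.cast(tf.random.uniform((1024, 1)) > 0.5, tf.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# TensorFlow uses whatever CPUs or accelerators the node exposes; no code changes required.
model.fit(x_train, y_train, epochs=3, batch_size=128)
```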

Given those two developments, a new and broader definition of HPC is possible which includes the following four areas:

  1. Traditional HPC – like weather forecasting and oil exploration
  2. Data Centric HPC – like financial modelling or genomics
  3. High Performance Data Analytics – like fraud detection or personalized medicine
  4. Artificial Intelligence – including deep and machine learning

Now that more organizations are running, or are planning to run, AI initiatives, almost every organization is a home for HPC. However, various studies, including one from 451 Research, have revealed several issues that interfere with the success of those initiatives. I want to point out two of the most relevant ones.

First, a lack of expertise and difficulty finding enough skilled resources is a challenge for nearly every organization, so the skilled workers it does have must be able to work as efficiently as possible. Second, limited budgets within IT organizations restrict the number of choices available for solving a given problem.

While there is always a best-of-breed solution available from a technology point of view, its cost structure, complexity, and reusability are far from optimal from a price/performance perspective. A “good-enough” approach yields a much more flexible, reusable solution, typically with much lower CAPEX and OPEX.

This all highlights the need to have a standardized HPC environment that ideally fits directly into the existing IT infrastructure.

To help organizations with this, Dell Technologies has created a set of easy-to-consume, workload-oriented architectures as Ready Solutions and Ready Architectures. The advantage of such an approach is that it offers flexibility of choice while building on standards.

The recipe is not too complicated:

  • Standard x86 servers with the latest Intel CPUs and optional accelerator cards (only if required)
  • High-speed Ethernet networking (25/100 Gbit); no need for Fibre Channel or InfiniBand in most cases
  • A centralized data lake, such as PowerScale, to serve data to the compute nodes (additional parallel filesystems are only required for certain use cases)
  • Automation software, such as Bright Cluster Manager, that delivers Docker and Kubernetes, handles workload scheduling, and can provide a supported stack of AI tools and frameworks (see the sketch after this list)
  • Services and support that cover the whole system from a single point of contact
  • A data scientist centric working environment, such as Jupyter notebooks
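
As a hypothetical illustration of how the automation layer ties Docker, Kubernetes, and workload scheduling together, the sketch below uses the official Kubernetes Python client to submit a containerized TensorFlow training run as a batch job. The image name, namespace, script path, and resource limits are assumptions for illustration, not part of any specific Dell or Bright Cluster Manager product.

```python
# Illustrative sketch only: submit a containerized training run as a Kubernetes Job.
# The image, namespace, script path, and resource limits are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="tf-train-demo"),
    spec=client.V1JobSpec(
        backoff_limit=2,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="trainer",
                        image="tensorflow/tensorflow:2.12.0",       # stock framework container
                        command=["python", "/workspace/train.py"],  # hypothetical training script
                        resources=client.V1ResourceRequirements(
                            limits={"cpu": "8", "memory": "32Gi"}   # accelerators only if required
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="ai-workloads", body=job)
```

The same submission path works for a traditional HPC batch job packaged as a container, which is exactly the point: one scheduling and operations model for both kinds of workload.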

With such a system, most of the HPC workloads described above can be run very efficiently in terms of IT operations, cost structure, and user acceptance.

In conclusion: only specialize your IT infrastructure when you really need to. Most demands can be satisfied with well-designed standard hardware components, an optimized software stack, and automation capabilities. So, from this angle, AI is indeed just another HPC workload.