In today’s organizations, CxOs are challenged to realize the value of artificial intelligence (AI) and machine learning (ML), and to achieve a positive return on the investments made in them. Business sponsors are struggling to reimagine their applications and services in an AI/ML world. At the same time, product development teams find themselves ill-equipped to consume and integrate new models into existing applications, while data scientists, constrained by unsuitable infrastructure, spend more time sorting out the data infrastructure than writing new models and delivering value back to their organizations.
As a result, no matter who you talk to in the enterprise about data science, AI and ML, you are likely to receive an overly optimistic response that masks some degree of frustration and disillusionment caused by missed expectations, high costs and contentious politics. It is nearly a perfect storm and is reflective of market maturity and enterprise readiness for AI/ML initiatives. But there is hope, namely AI for IT operations (AIOps).
AIOps is the collection of practices and processes applied to AI/ML, focused on data scientist productivity and model consumption. Many aspects of AIOps are directly linked to IT operations, but some address the core development practices of data scientists. Akin to DevOps, AIOps is the intersection of application/product development and IT operations.
In this post, I will introduce a few DevOps patterns that can be adopted to help solve AIOps challenges. We will specifically focus on the data scientist experience mentioned above. While all four roles (CxO, business sponsor, developer, data scientist) and their experiences are important, I suggest that solving for the data scientist first and enabling the developer second will accelerate enterprise time-to-value.
Enabling the data scientist
When data scientists spend more time building models that are consumable by product development teams, then the business will be able to innovate and experiment more frequently. Innovation and experimentation lead eventually to market success, which in turn drives ROI at the CxO level.
To better understand what platform requirements are needed – other than sheer horsepower – let’s take a closer look at the horse first. While we know that purchasing a cart-load of servers packaged with high-speed networking and some all-flash storage arrays would provide many of the technical building blocks needed to support the data scientist, there are a few other critical elements to the solution.
Data science is science. It is organized around a hypothesis that is proven through a series of experiments. Experiments by definition are unproven and, as such, often fail. While this insight may not be ground-breaking, it has the greatest impact on platform requirements. Since data scientists need to run experiments – many of which fail – they need a repeatable and consistent way to create, configure and load an environment with the needed tools and data so they can focus on creating, testing and publishing models, rather than building, configuring and loading environments. In lean-speak, building models is value creation whereas building environments is muda (aka. waste).
I am not suggesting that building environments is wasteful. However, I am saying that having a highly specialized data scientist build an environment is wasteful. You didn’t hire a data scientist for their Linux skills; you hired them to build ML models. Environment building is in the platform domain.
Every large enterprise is currently building or modernizing its cloud platform and delivering more and more capabilities-as-a-service. I argue that data and the associated analytics tools and capabilities should be no different. When extending your platform to include data analytics, AI, ML and more, don’t get hung up on tools and technologies; rather, focus on solving the following chief use cases and then build a flexible framework that gives data scientists choice of tooling.
Chief data scientist use cases
- Automated sandbox provisioning – Consistently and repeatably create a development environment (sandbox) and configure the required processors, accelerators and storage via ticketless automation so that experiments can run on-demand.
- Automated tool deployment and configuration – Configure and preload the sandbox with market-leading tools via ticketless automation so that experiments can run on-demand.
- Automated data population/load – Populate the sandbox with a copy of production data from the existing data lake, data warehouse or external source via a data marketplace so that experiments can run on-demand.
- Data version control – Manage experiments by creating traceable code, data sets, transformations and models so that an environment and data model can be recreated and tested on-demand.
- Publish models – Publish application programming interfaces (APIs) so that models are discoverable and consumable by product development teams.
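Taken together, these use cases amount to a single ticketless request that a data scientist submits and a platform fulfills. The sketch below illustrates what such a request might capture; the field names, tool names and dataset identifier are all hypothetical, not any particular product's API.

```python
from dataclasses import dataclass, field

@dataclass
class SandboxRequest:
    """Everything a data scientist specifies in one ticketless request (illustrative)."""
    owner: str
    cpus: int
    gpus: int
    storage_gb: int
    tools: list = field(default_factory=list)  # e.g. ["jupyter", "pytorch"]
    dataset: str = ""                          # a data-marketplace identifier

def render_request(req: SandboxRequest) -> dict:
    """Turn the request into the payload a provisioning pipeline would consume."""
    return {
        "owner": req.owner,
        "compute": {"cpus": req.cpus, "gpus": req.gpus},
        "storage_gb": req.storage_gb,
        "tools": list(req.tools),
        "dataset": req.dataset,
    }

req = SandboxRequest(owner="ds-team-a", cpus=8, gpus=1, storage_gb=500,
                     tools=["jupyter", "pytorch"],
                     dataset="sales/2019-snapshot")
payload = render_request(req)
```

The point is not the data structure itself but that the entire environment (compute, tools, data) is expressed as a single machine-readable request, which is what makes ticketless automation possible.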
Reviewing these use cases, every system operator and security or compliance guru out there is likely panicking because, if built with no constraints, these capabilities would quickly consume all available resources in the data center. The liability and risk would be untenable for an enterprise.
Defining the solution
In agile, we use acceptance criteria to “box in” the solution. For example, for automated provisioning of sandboxes, an operations team can define acceptable configuration ranges, resource quotas and performance thresholds to govern against sprawl and/or over-allocation. The acceptance criteria are added to the use case to further define the to-be solution.
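To make this concrete, acceptance criteria can be codified so the platform enforces them automatically rather than through tickets. The sketch below assumes hypothetical resource bounds and a team storage quota; the specific names and limits are illustrative.

```python
# Hypothetical acceptance criteria an operations team might publish for
# automated sandbox provisioning (names and limits are illustrative).
ACCEPTANCE_CRITERIA = {
    "cpus":       {"min": 1,  "max": 32},
    "gpus":       {"min": 0,  "max": 4},
    "storage_gb": {"min": 10, "max": 2000},
}
TEAM_QUOTA_GB = 5000  # total storage a team may hold across all sandboxes

def validate_request(request: dict, team_usage_gb: int) -> list:
    """Return a list of violations; an empty list means the request is approved."""
    violations = []
    for resource, bounds in ACCEPTANCE_CRITERIA.items():
        value = request.get(resource, 0)
        if not bounds["min"] <= value <= bounds["max"]:
            violations.append(f"{resource}={value} outside "
                              f"[{bounds['min']}, {bounds['max']}]")
    if team_usage_gb + request.get("storage_gb", 0) > TEAM_QUOTA_GB:
        violations.append("team storage quota exceeded")
    return violations

ok = validate_request({"cpus": 8, "gpus": 1, "storage_gb": 500}, team_usage_gb=1000)
bad = validate_request({"cpus": 64, "gpus": 1, "storage_gb": 500}, team_usage_gb=4800)
```

With checks like these encoded in the pipeline, requests inside the agreed box run on-demand, while requests outside it fail fast with a clear reason instead of consuming a reviewer's time.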
Fortunately for AIOps initiatives, many patterns similar to the aforementioned use cases have already been defined and validated through countless DevOps initiatives, technical forums, books and articles. When building and operating a data analytics platform, we can borrow from these practices. The most relevant patterns to data analytics as a service (DAaaS) are as follows:
- Automated provisioning and configuration of bare metal, virtual machines and containers – Hardware and software components are deployed via orchestrated, automated deployment pipelines. Each component is defined in human-readable code and data: the code is run by an automation platform that injects specific variables (the data) into the process, provisioning unique, policy-compliant environments. Different tool chains deliver infrastructure as a service (IaaS) virtual machines (VMs) and services (for example, VMware, Ansible and Chocolatey) versus containers as a service (CaaS) containers and services (for example, Kubernetes, ZooKeeper and Strimzi).
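The "code plus data" idea above can be sketched in a few lines: a shared, human-readable template (the code) is combined with per-environment variables (the data) to render a unique environment definition. The template fields and values here are illustrative, not a specific automation tool's format.

```python
# A minimal sketch of template-plus-variables provisioning. A real automation
# platform (Ansible, Kubernetes manifests, etc.) does the same substitution at
# much larger scale; the field names below are invented for illustration.
from string import Template

SANDBOX_TEMPLATE = Template(
    "name: $name\n"
    "image: $image\n"
    "cpus: $cpus\n"
    "gpus: $gpus\n"
)

def render_environment(variables: dict) -> str:
    """Inject environment-specific variables into the shared template."""
    return SANDBOX_TEMPLATE.substitute(variables)

spec = render_environment(
    {"name": "churn-experiment-01", "image": "datasci-base:1.4",
     "cpus": 8, "gpus": 1}
)
```

Because every environment is rendered from the same reviewed template, each one is consistent and policy-compliant by construction, and recreating an environment is just re-running the substitution with the same variables.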
- Version control systems – Source code control tools have been used for decades in software development to organize, track and manage code. These tools create a framework that enables collaboration between developers and teams, as well as a process for recreating a specific version of software. Although still early, data version control tools, such as DVC or Pachyderm, can provide data scientists with many of the same functional capabilities developers enjoy. By capturing the code and data needed to create or recreate a model, data scientists can organize, manage and track their experiments, recreate those experiments, and collaborate more effectively with other data scientists.
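The core idea behind data version control is worth spelling out: instead of committing large data files, the tool commits a small content fingerprint (a hash) of each file, so an experiment can later prove it is running against exactly the same data. The sketch below illustrates that principle only; it is a simplification, not how DVC or Pachyderm actually store state.

```python
# Simplified illustration of content-addressed data versioning: the small
# JSON snapshot is what would be committed to Git, not the data itself.
import hashlib
import json

def fingerprint(data: bytes) -> str:
    """Content hash that uniquely identifies one version of a dataset."""
    return hashlib.sha256(data).hexdigest()

def snapshot(datasets: dict) -> str:
    """Record {name: hash} for every dataset an experiment depends on."""
    return json.dumps(
        {name: fingerprint(blob) for name, blob in datasets.items()},
        sort_keys=True,
    )

def verify(snapshot_json: str, datasets: dict) -> bool:
    """Check that the current data matches the snapshot taken at training time."""
    return snapshot_json == snapshot(datasets)

v1 = {"train.csv": b"id,label\n1,0\n2,1\n"}
record = snapshot(v1)  # this small string is what goes into version control
changed = {"train.csv": b"id,label\n1,0\n2,0\n"}  # one label silently flipped
```

`verify(record, v1)` holds, while `verify(record, changed)` does not: even a one-byte change to the data is detected, which is what makes experiments reproducible and auditable.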
- Service catalogues and gateways – APIs are the modern work product of application, engineering and data development teams. As the number of cloud-native, DevOps and data science initiatives continues to grow, so does the quantity of APIs available in the enterprise. API gateways are mature technologies that help enterprises maximize the value of their programming interface assets, addressing the management, documentation, accessibility and governance of APIs. Coupled with service catalogues, enterprises can create an API marketplace enabling self-service, on-demand access to data, code, automation, test frameworks and more. These marketplaces facilitate developer consumption of AI/ML models and, when paired with platform automation, can accelerate data scientist model creation and time to market.
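The publish-and-discover loop at the heart of such a marketplace is simple to sketch: a data scientist registers a model API with metadata, and a product developer queries the catalogue by tag. This is a toy in-memory version under assumed names; real gateways layer authentication, rate limiting and generated documentation on top of this registration step.

```python
# Toy service catalogue: publish model APIs with metadata, discover by tag.
# All model names, endpoints and tags below are invented for illustration.
CATALOGUE = {}

def publish_model(name: str, endpoint: str, tags: list, version: str) -> None:
    """Register a model API so other teams can discover and consume it."""
    CATALOGUE[name] = {"endpoint": endpoint, "tags": set(tags), "version": version}

def discover(tag: str) -> list:
    """Return the names of all published models carrying a given tag."""
    return sorted(name for name, meta in CATALOGUE.items() if tag in meta["tags"])

publish_model("churn-predictor", "/api/v1/churn", ["ml", "customer"], "1.2.0")
publish_model("demand-forecast", "/api/v1/demand", ["ml", "supply-chain"], "0.9.1")
```

Here `discover("ml")` surfaces both models while `discover("customer")` returns only the churn predictor; the same lookup is what lets product teams find and integrate models without a ticket or a meeting.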
There are many other patterns that can be adopted from software engineering, IT operations and DevOps to help accelerate this journey. That said, data science – AI in particular – poses some unique challenges that will need to be worked out over time. For example, accelerator virtualization and pass-through capabilities are still nascent, and, unlike other workloads in the data center, data carries much stricter and more comprehensive governance and legal requirements.
Despite the lack of maturity, enterprises can offer a much more robust experience to data science teams by implementing the patterns above and continuing to innovate and experiment in the data center. Today, public clouds are the leading providers of these services. However, with some guidance and support, internal IT shops can deliver feature parity with public cloud. More importantly, they can create a more seamless and integrated experience for their data science teams.
To learn more
To explore innovative solutions for AI and machine learning, visit Dell EMC Ready Solutions for AI, Dell Technologies Big Data & IoT Analytics Implementation Services, and Dell EMC ProConsult Advisory Services.