Enterprise data scientists are frustrated by the Sisyphean struggle to get the technology assets they require to build data models. But that’s hardly the only hurdle: Because these projects slow-cook in silos, data science teams often duplicate efforts. It’s a maddening combination of requisitioning hell and redundancies.
No stranger to such challenges, defense contractor Lockheed Martin installed a software platform to make the development of machine learning (ML) and artificial intelligence (AI) models more efficient. The platform centralizes assets required to build data models, reducing the costs of the company’s ML and AI projects by $20 million a year, says Matt Seaman, Lockheed Martin’s chief data and analytics officer of enterprise operations.
The self-service capabilities are critical for the company’s approach to democratizing access to data, Seaman says. “We’re reducing the barriers to start and run new projects that will help us make better and faster decisions with data.”
Adoption of self-service technology is soaring, representing the next phase of a consumerization phenomenon that put mobile computers and applications into the hands of millions of workers more than a decade ago. But perhaps nowhere is the interest greater than in data science, in which the potential of advanced analytics that helps discover business insights has been constrained by the same clunky processes that have long held back companies from reaching their potential.
Clearing the provisioning hurdle
Lockheed Martin is neutralizing the problem with the help of Domino Data Lab, whose collaborative data science platform helps the company’s 300-plus data scientists both build data models more efficiently and lay a foundation for future data scientists coming into the company, says Seaman.
Before landing on Domino Data, Lockheed Martin’s data scientists spent an inordinate amount of time identifying computing resources they needed and requesting them from IT. These staffers waited for IT to build, install and configure the integrated development environment (IDE) and other programming tools on a server, which they logged into every time they needed to access their projects and resources. But many data scientists are working on multiple projects, often requiring multiple systems, servers and IDEs, creating a constant cycle of blocking and tackling infrastructure.
Data scientists who spent time procuring infrastructure or engaging in software engineering spent less time building the data models. Also, the work suffered as Lockheed Martin couldn’t identify the pain points of data scientists trying to do their work, let alone track project status, Seaman says.
“We didn’t have a lot of visibility into who were the players trying to drive innovation,” Seaman says, let alone into who needed to be enabled. “It’s about taking data out of its silo and getting it into the hands of people in a more efficient way.”
The data modeling ‘domino’ falls
Domino Data consolidates these capabilities into a browser-based graphical user interface (GUI) where users can access development resources through a menu of templates for software, machine learning libraries and infrastructure. They can pick programming languages (Python, R, SAS, etc.) and on-demand compute resources (CPU, GPU or Spark clusters) to build their models. Staffers can opt to work with private cloud or public cloud systems, avoiding resource lock-in.
In keeping with the company’s DevSecOps strategy, programming packages and their dependencies are automatically distributed, while tracking and audit capabilities for code, data and tools provide guardrails to ensure visibility and compliance. Because data scientists can access the tools and infrastructure they need, 90% of engineers previously assigned to supporting these workflows now support other business projects, Seaman says.
Data science teams are building new ML models to simulate the design of new aeronautics products. Other models help gain greater visibility into the capacity of production lines in factory operations, including tracking the flow of materials from assembly through fabrication, and detecting defects and maintenance issues. Other projects focus on deep learning models that mitigate supply chain risk.
Seaman says the software reduces the time to begin building data models and launching them into production from weeks to minutes, while yielding a tenfold increase in productivity from greater access to resources. Data science leaders gain greater visibility into projects, optimizing collaboration and knowledge sharing, while the IT teams can manage and govern infrastructure usage and costs. “The platform brings order to the chaos,” Seaman says.
Data science know-how still needed
The ease of use suggests Domino Data is a low-code solution for analytics consumed by business users, but this is far from the case. While many data science modeling tools enable staff with limited technical capability to click and move compute assets around with a cursor, Domino Data requires some coding knowledge — a detail about which the company remains unapologetic.
“Organizations that are doing meaningful data science work are going to rely on code-first data science to do that work,” says Joshua Poduska, Domino Data’s chief data scientist. Domino Data Lab shouldn’t be the first stop for so-called citizen data scientists looking to learn the ropes of the discipline.
To wit, Lockheed Martin’s Domino Data users are practicing ML and AI engineers, and Seaman expects adoption of the platform will grow.
Seaman acknowledges that Domino Data isn’t for everyone, noting that it’s critical to incorporate a data strategy that supports everything from low-level data modeling tools to more sophisticated algorithms and deep neural networks. Even so, he says, “there will always be a place for advanced innovation that comes from code-base solutions.”
Industry market watchers tend to agree. Global spending on AI technologies will grow from $50.1 billion in 2020 to more than $110 billion in 2024 as more enterprises look to cultivate business insights at scale, according to IDC research.