As companies look to artificial intelligence to drive their digital transformation, software development will change dramatically as well.
Companies expect that developers will have to get up to speed on machine learning algorithms and neural networks, and they are looking forward to seeing how AI will automate many development and testing functions.
But what many enterprises are missing is that the nature of software itself is changing.
Today, applications are deterministic. They are built around loops and decision trees. If an application fails to work correctly, developers analyze the code and use debugging tools to track the flow of logic, then rewrite code in order to fix those bugs.
That’s not how applications are developed when the systems are powered by AI and machine learning. Some companies still write new code for the algorithms themselves, but most of the work happens elsewhere: picking standard algorithms from open source libraries or choosing from the options available in their AI platforms.
These algorithms are then transformed into working systems by selecting the right training sets and telling the algorithms which data points — or features — are the most important and how much each should be weighted.
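As a minimal sketch of that idea (the feature names and weights here are hypothetical, chosen only for illustration): instead of writing branching logic, the developer decides which features matter and how heavily each one counts toward the prediction.

```python
# Hypothetical features and weights for a "is this a car?" scorer.
# The developer's work is choosing and weighting these, not writing logic.
FEATURE_WEIGHTS = {
    "brake_light_visible": 0.6,   # strongly suggests a car
    "object_height_m": 0.3,
    "lateral_speed_mps": 0.1,
}

def score(example: dict) -> float:
    """Weighted sum over the features the developer chose to emphasize."""
    return sum(w * example.get(name, 0.0) for name, w in FEATURE_WEIGHTS.items())

print(score({"brake_light_visible": 1.0, "object_height_m": 1.5, "lateral_speed_mps": 0.2}))
```

In a real system the weights would be learned from the training set rather than hand-assigned, but the developer's leverage is the same: which features go in, and how much they count.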
This shift toward data as the heart of developing software systems is causing leading-edge companies to rethink not only how they develop software but the kinds of tools and processes they will need to successfully navigate this paradigm shift.
Introducing ‘Software 2.0’
At the Spark+AI Summit last year, Tesla AI director Andrej Karpathy talked about how the self-driving car company is transitioning to this new way of developing code, which he called Software 2.0.
AI-powered optimization algorithms, such as neural networks, are pointed at a problem and try various solutions against evaluation criteria until they find the best one. For example, the system could look through millions of labeled images to learn to distinguish between cars and pedestrians.
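The "try solutions against evaluation criteria" loop can be sketched in a few lines. This toy example (data and threshold search are invented for illustration, and real systems optimize millions of parameters, not one) searches for a decision threshold that scores best on labeled examples:

```python
import random

def accuracy(threshold: float, labeled) -> float:
    """Evaluation criterion: fraction of examples classified correctly."""
    return sum((x > threshold) == label for x, label in labeled) / len(labeled)

# Toy labeled data: (detector response, is_car)
data = [(0.9, True), (0.8, True), (0.7, True), (0.3, False), (0.2, False)]

# Try many candidate solutions, keep the one the criterion scores highest.
random.seed(0)
best = max((random.random() for _ in range(1000)), key=lambda t: accuracy(t, data))
print(accuracy(best, data))
```

Nobody wrote an if/else rule for where the boundary sits; the search found it from the labels.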
“We’re designing less, and things work better,” he said.
But what happens when this approach doesn’t work? For example, when Tesla’s self-driving cars had trouble figuring out whether to turn on windshield wipers while driving through tunnels, the solution wasn’t to dive into the machine learning algorithms to find out where they fell short.
Instead, the company discovered that its training data didn’t have enough examples of cars driving through tunnels. The solution was to pull out more images from cars taken in tunnels and send them to humans to classify.
“As a PhD student, I spent a lot of time on the models and algorithms and how you actually train these networks,” said Karpathy. “But at Tesla, I spend most of my time massaging the data sets.”
But managing training data isn’t as simple as having humans look at a set of images and label them. First, developers need a deep understanding of the data itself. For example, a system looking at static images of cars changing lanes will have a hard time figuring out that a car’s turn signal is blinking. Solving that problem requires going back to the training images and labeling them differently.
But changing the way images are labeled now means that a lot of previously categorized images will now have to be relabeled.
Moreover, humans can make mistakes when labeling images, or disagree with one another, or the images themselves may be problematic. That means there must be a process of escalating issues and tracking them.
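A sketch of that escalation process (labels, vote counts, and the agreement threshold are all hypothetical): take the majority label when annotators agree, and flag the example for review when they don't.

```python
from collections import Counter

def resolve_label(votes, min_agreement=0.75):
    """Majority-vote a label; flag the example for escalation when the
    annotators disagree too much to trust the result."""
    label, count = Counter(votes).most_common(1)[0]
    needs_escalation = count / len(votes) < min_agreement
    return label, needs_escalation

print(resolve_label(["car", "car", "car", "truck"]))    # ('car', False)
print(resolve_label(["car", "truck", "car", "truck"]))  # ('car', True)
```

The escalated examples become a queue of their own, which is exactly the kind of tracking tooling the article says didn't exist when Tesla started.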
When Tesla started this work, the processes and tools for managing this new approach to creating software didn’t exist.
“In software 1.0, we have IDEs to help us write code,” said Karpathy. “But now instead of writing the code explicitly we’re accumulating and massaging the data sets, and they’re effectively the code. What are the IDEs for data sets?”
From code to data
Alex Spinelli, who headed up Alexa for Amazon before becoming CTO at LivePerson last year, has seen this transformation of the development process firsthand.
“Before, there were decision trees, paths, case statements,” he says, adding that now developers must know there’s enough data, with the right examples, to ensure an algorithm has the fuel it needs to keep working. “We are actually creating some novel algorithms for the industries we support.”
For more than 20 years, LivePerson has been helping companies such as Home Depot, Adobe, HSBC, and L’Oreal communicate with their customers. In 2016, it embarked on a paradigm shift by moving into AI-powered chatbots.
To develop its chatbots, the company begins with human-labeled examples of, say, customer questions. “I have 100,000 versions of ways people have said, ‘I want to pay my bill,’” he says. “That’s the beginning.”
Once there’s enough data, the next challenge is to figure out which attributes are important, he says. An automated system can pull out correlations but may not be able to determine causality, for example. Just because clocks often ring around sunrise doesn’t mean the alarms cause the sun to come up.
“Decisions are made in how to weight certain attributes or features of data,” he says. “You need experts who spend a lot of time thinking about these problems.”
Today, depending on the customer, LivePerson can understand between 65 and 90 percent of customer questions, and the company is continually trying to improve that percentage by using AI technologies like unsupervised learning and transfer learning, as well as human input.
Bias is the new bug
When AI-powered systems don’t work, there are three main approaches to solving the problem.
First, the problem could be in the algorithm itself. But that doesn’t mean developers need to dive into the code. Often, the issue is that the wrong algorithm was selected for the job.
“Someone has to make a decision that this algorithm is better than that one,” Spinelli says. “That is still a human challenge.”
Next is the tuning of the algorithm. Which features is the algorithm looking at, and how much weight does each one get? In situations where the algorithm comes up with its own features, this can be extremely complicated.
A system that predicts whether someone is a good credit risk may look at a fixed number of data points, and its reasoning process could be extracted and analyzed. But a system that, say, identifies cats in images may come up with a process that is completely unintelligible to humans. This can cause compliance problems for financial services firms, or may put people’s lives at risk in healthcare applications and self-driving cars.
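For the fixed-feature case, "extracting the reasoning" can be as direct as decomposing a score into per-feature contributions. This hypothetical linear credit model (the feature names and weights are invented, and real scoring models are far more elaborate) shows what an auditor could inspect:

```python
# Hypothetical linear credit model: because the features are fixed and the
# weights explicit, each decision decomposes into per-feature contributions.
WEIGHTS = {"income": 0.5, "debt_ratio": -0.8, "years_employed": 0.3}

def explain(applicant: dict):
    """Return the score plus how much each feature pushed it up or down."""
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    return sum(contributions.values()), contributions

score, reasons = explain({"income": 1.2, "debt_ratio": 0.4, "years_employed": 2.0})
print(round(score, 2), reasons)
```

A deep network that invents its own features offers no such decomposition, which is the compliance problem the paragraph describes.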
Then there are problems caused by the data itself. “Where are you collecting your data, what groups is it from — this is all stuff that can create bias,” Spinelli says. “It can be bias against ethnic groups or genders, or it could just be a bias that has a negative business outcome.”
Figuring out whether the problem is in the algorithm, the tuning, or the data can be very challenging, he says. “I don’t think we’ve truly solved the problem.”
The world is in a unique situation right now where technology is coming out of research labs and going straight into production, Spinelli adds.
“You see a lot of stuff coming from scientists who don’t have a lot of experience running mission-critical systems,” he says, adding that there are few standards and best practices. “They’re evolving, but it’s a big problem. It’s not mature at this point.”
Take, for example, the fact that most off-the-shelf algorithms can’t explain why a particular decision was made.
LivePerson uses Baidu’s ERNIE and Google’s BERT open source natural language processing algorithms. “They have decent audit and traceability capabilities,” Spinelli says. “But, on the large, it’s pretty light.”
When LivePerson builds its own algorithms, this kind of functionality is a requirement, he says. “We build our algorithms in a way that there’s traceability, so you can ask the algorithm, ‘Why did you make this answer?’ and it will tell you, ‘Here is what I saw, here is how I read it and how I scored it.’”
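A sketch in the spirit Spinelli describes (the intents and keyword lists are hypothetical, and a production system would use a learned model rather than keyword counts): the classifier returns its evidence alongside its answer, so "why did you make this answer?" has a concrete response.

```python
# Hypothetical intents and trigger words for a toy intent classifier.
INTENT_KEYWORDS = {
    "pay_bill": ["pay", "bill", "invoice"],
    "cancel_service": ["cancel", "close", "terminate"],
}

def classify(text: str) -> dict:
    """Return the chosen intent plus the scores that led to it."""
    words = text.lower().split()
    scores = {intent: sum(w in words for w in kws)
              for intent, kws in INTENT_KEYWORDS.items()}
    return {"answer": max(scores, key=scores.get), "evidence": scores}

print(classify("I want to pay my bill"))
```

The design point is that the evidence is recorded at decision time, not reconstructed afterward.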
Version control for AI is all about the data
Finding and fixing problems in AI systems is difficult enough. But fixes, ongoing improvements, and corrections for model drift all add up to frequent changes to the system.
Traditional software development processes have version control for keeping track of which lines of code have been changed and who made the changes. But what happens when the changes aren’t in the code, but in the data or the tuning? Or when the systems have built-in feedback loops for continuous learning?
“You can’t have training data change beneath you because you don’t have reproducible results,” says Ken Seier, chief architect for data and AI at Insight, a technology consulting and system integration firm.
Traditional DevOps tools fall short, he says. “You need to add additional steps into the pipeline for the data.”
Development teams building a new instance of an AI model need to be able to snapshot the data that was used and store it in a repository, he says. “Then go into a test environment where they would run it against known scenarios, including auditing scenarios and compliance scenarios, and against testing data sets to make sure they have a certain level of accuracy.”
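The two steps Seier describes, snapshotting the data and gating promotion on test accuracy, can be sketched as follows (the hashing scheme and the 90 percent threshold are illustrative assumptions, not anything Insight prescribes):

```python
import hashlib
import json

def snapshot_id(rows) -> str:
    """Content hash of a training set, stored alongside the model so any
    result can be reproduced from exactly the data that produced it."""
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()[:12]

def promotion_gate(accuracy: float, required: float = 0.90) -> bool:
    """The model leaves the test environment only if it clears the
    required accuracy on the known scenarios and testing data sets."""
    return accuracy >= required

rows = [{"text": "I want to pay my bill", "label": "billing"}]
print(snapshot_id(rows), promotion_gate(0.93))
```

The hash acts like a commit ID for the data: two training runs with the same snapshot ID trained on byte-identical data.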
Most companies are building these tools on their own, he says, adding that the major cloud AI platform vendors are putting a lot of this functionality in place, but are still missing key pieces.
Similarly, companies must develop automated processes for changing how models are tuned, and for testing various algorithms to see which performs better in particular situations.
Ideally, if an algorithm goes off track, an automated process could retrain the model so everything works again, he says. “If you can’t get the model into the green again, you need to have a series of fall-back options ahead of time.”
With traditional software development, this could be as simple as reverting to a previous working version of the software. But with AI that’s gone off track because the environment has changed, that may not be possible.
“What happens when the software doesn’t work and they can’t retrain it?” he asks. “Do you pull it out and rely on human operators? Do you have a business process that would allow humans to make those decisions? With self-driving cars, does it mean that they’ll turn off the car?”
Dealing with drift
Training data is typically a snapshot in time. When conditions change, the model becomes less effective. To deal with this drift, companies need to continually test their models against real data to ensure that the system still works.
“If they did a 30-day window to train the model, then every two weeks they should be grabbing a new 30-day window and identifying if a problem has emerged,” Seier says.
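That recurring check can be sketched directly (the tolerance of five percentage points and the toy data are assumptions for illustration): re-score the model on the freshest 30-day window and flag drift when accuracy falls too far below the level measured at training time.

```python
from datetime import date, timedelta

def accuracy(rows) -> float:
    """Fraction of recent predictions the model got right."""
    return sum(r["correct"] for r in rows) / len(rows)

def drift_emerged(rows, today, baseline, window_days=30, tolerance=0.05):
    """Grab the newest window of real data and compare against the
    accuracy the model had when it was trained."""
    start = today - timedelta(days=window_days)
    recent = [r for r in rows if start <= r["day"] <= today]
    return (baseline - accuracy(recent)) > tolerance

# Toy data: the model is now right only half the time.
today = date(2024, 6, 30)
rows = [{"day": today - timedelta(days=i), "correct": i % 2 == 0} for i in range(30)]
print(drift_emerged(rows, today, baseline=0.92))
```

Run on a schedule, every two weeks in Seier's example, this turns drift from a surprise into a monitored metric.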
This can get complicated very quickly when the AI system changes the behavior it is observing.
Say, for example, an AI system looks at historical data to see when factory equipment is most likely to break. If the factory then uses the predictions to change the repair schedule, then the predictions will no longer be valid — but retraining the model on new data will cause yet another set of problems because the machines will start to break again without the AI’s intervention.
“One of the challenges self-driving cars have is dealing with other self-driving cars,” Seier says. “They’re trained in an environment with human-operated cars, and self-driving cars behave differently.”
Krishna Gade, cofounder and CEO at Fiddler Labs, an explainable AI company, says he would like to see an integrated development environment for AI and machine learning systems that puts data at the center.
“We need an IDE that allows easy import and exploration of data, and cleaning and massaging of tables,” he says. “Jupyter notebooks are somewhat useful, but they have their own problems, including the lack of versioning and review tools.”
As more models get put into production, managing various versions becomes important, he says. “Git can be reused for models. However, it won’t scale for large data sets.”
The data security challenge
As companies move to AI-powered software development practices, they are also facing a new set of security challenges that many are unprepared for.
For example, when systems are created by data scientists instead of traditional software engineers, security can be an afterthought. Third-party and open source AI algorithms can have their own problems, including vulnerabilities and insecure dependencies.
“It’s vital that developers use the latest, most recently patched code,” says Michael Clauser, global head of data and trust at Access Partnership, a global public policy firm serving the tech sector.
Code from third-party vendors is often proprietary and impossible to analyze.
“It’s a safe bet that the larger, data-heavy Internet companies and other Blue Chips are sweating the cybersecurity small stuff in their own development and deployment of AI,” says Clauser. “That’s likely not the case for early stage startups strapped for resources who are more interested in showing what their AI can do and what problems it can solve than worrying about a hacker one day making their AI the problem itself.”
AI algorithms also have to interface with traditional systems, including databases and user interfaces. Mistakes are common, even likely, when security experts aren’t involved up front.
In addition, AI systems are often built on the new cloud AI platforms. The security risks here aren’t well known yet. But the big AI security challenge is the data. AI systems require access to operational data, as well as training data and testing data. Companies often forget to lock up those last two sets. In addition, data scientists prefer to build their AI models to use clear-text data, instead of working with encrypted or tokenized data. Once these systems are operationalized, the lack of encryption becomes a major vulnerability.
One company currently dealing with the potential security risks of its AI systems is online file-sharing vendor Box.
“We’re telling customers, give us your most precious content, content that drives your bread and butter,” says Lakshmi Hanspal, the company’s CISO.
Box is now working on using AI to extract metadata from that content to improve search, classification and other capabilities. For example, Box could automatically extract terms and prices from contracts, she says.
To build the new AI systems, Box is taking care not to bypass its traditional levels of security controls.
“For any offering, both AI and non-AI, we have a secure development process,” she says. “It’s aligned with ISO security standards. There are many pipelines within Box, but they all follow a similar process with security by design built in.”
That includes encryption, logging, monitoring, authentication, and access controls, she says.
Most companies, however, don’t build security into their AI development process, says David Linthicum, chief cloud strategy officer at Deloitte Consulting.
In fact, about 75 percent of organizations tackle security after the fact, he says. “And doing it after the fact is like trying to change truck tires when the truck is going down the street.”