by Josh Fruhlinger

6 tips to avoid automation disaster

Feb 24, 2020 | 12 mins
BPM Systems | DevOps | IT Leadership

If your automated DevOps pipeline or robotic process automation is flawed, it can spawn endless headaches, as these hard-learned real-world lessons attest.

In 2015, senior software engineer Benjamin Willenbring was excited when his employer, Autodesk, introduced automated software testing. That excitement didn’t last long. The small automation team didn’t communicate much with his division. And when the tests reached production, they weren’t what anyone hoped for.

“My teammates were talking about tests failing non-deterministically, and not really having a lot of confidence in the test,” Willenbring says. He found that “to get to actually run the test was very, very difficult. It wasn’t documented. You had to talk to someone. And there were an enormous amount of files and I didn’t really understand why.”

Automation was supposed to make Willenbring’s work easier. Instead, the problems it created came to dominate much of his energy for the next several years.

Willenbring’s experience isn’t uncommon. And with automation rapidly spreading through IT, cautionary tales provide valuable lessons.

From the automated workflows of DevOps to robotic process automation (RPA), automated processes aim to reduce scut work and free skilled employees for higher-level tasks. But flawed premises or botched rollouts can turn the dream of automation into a nightmare. We spoke to several IT pros about automation horror stories they’ve heard about or endured, and distilled six commandments to help your automation initiatives avoid such fates.

1. Automation is part of everyone’s job

Willenbring’s automated test hell had one key issue: the only people who understood the automated tests were those who built them, and they were based in a different city from his division.

“One of the difficulties of the test framework was that it didn’t really provide good feedback when there was a failure,” says Willenbring. “If something failed, the very first thing you did was get on Slack and contact the testing lead and ask, ‘Why did this fail?’ And then he would rerun the test manually — some special version of the framework so that he could see the results — and then would communicate to you what happened.”

The testing framework violated two cardinal rules that Robert Haas, global DevOps product manager at DXC Technology, says every automation regime should abide by. The first is that automation code must be documented. “Whether you use a modern approach like documentation-as-code or you annotate Visio diagrams, resolving problems will be easier if there is some documentation that describes what was done originally,” Haas says.

Without documentation, the automated tests were inscrutable to Willenbring’s team. As a result, they couldn’t understand the results and lost confidence in the tests and the team that created them. “Sometimes developers just said, ‘I don’t care. There’s no way that this is a real failure,’” says Willenbring. “Sometimes there were real failures that the test framework caught. But the essential problem was there was the lack of trust in the tests and the results.”

Another piece of advice from Haas: “Prioritize activities that need to be automated based on the business value they deliver.” But because Autodesk’s test team was so separated from the development team, Willenbring found that many of the areas of his codebase chosen for testing defied common sense. “You need to allow subject matter experts, people who understand what’s important, to select what gets tested. If you’re just going through a grab bag of bug tickets, there’s no guarantee that the sheer quantity of tests is any accurate reflection of anything meaningful.”

It took a new director of engineering to pull Willenbring and his team out of this vicious cycle. The new director mandated that “quality is everyone’s job,” including testing automation. Centralized testing was scuttled, and individual divisions were made responsible for writing tests for the code they worked on. Willenbring and his team could now tailor tests to their needs and integrate that process into their everyday lives. In the end, he says, “You have to establish a zero tolerance policy for a ‘that’s not my job’ attitude.”

(After our conversation, Willenbring was inspired to write a detailed description of his automation journey, which includes more in-depth material that you might find of interest, including details on the Selenium and Cypress test frameworks he dealt with.)

2. Be prepared for complexity

Security and compliance automation vendor Tripwire recently noted something strange with one of its large financial customers.

“You enable our solutions to be deployed in an automated fashion,” says Irfahn Khimji, country manager for Canada at Tripwire, “and for the longest time we were wondering what was taking so long. Why are we not seeing our license usage go up significantly? Because they’re supposed to be spinning up really fast.”

As it turned out, the ideal of a mostly automated CI/CD (continuous integration/continuous deployment) pipeline was running into the reality of diverse business units at the financial organization, each of which relied on its own customized set of software components.

“They’ve got 30-plus applications they were trying to onboard and they ran into some challenges with the commoditization of each of the variations of those applications — the modules that they need in various libraries and things like that,” Khimji says. “To adjust the pipeline to handle all those different variations really slowed down that automation process, because it wasn’t just a click and deploy — it was, okay, click, click, and make sure that each of those various technologies are all working with each other and able to deploy.”

There’s no magic bullet to solve this problem, but you do need to be aware that as the number of components in your automated process grows, the plumbing you need to connect those components grows far faster than the component count itself. That complexity will add time and resources as you transition to automated processes.
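
One rough way to see why the plumbing outpaces the component count: in the worst case, every pair of components needs its own integration to be checked. The Python sketch below counts those pairs. The application names are hypothetical, the figure of 30 simply echoes Khimji’s “30-plus applications,” and the all-pairs assumption is an upper bound for illustration, not a description of the customer’s actual pipeline.

```python
from itertools import combinations


def pairwise_links(components):
    """Return every unordered pair of components that might need an integration."""
    return list(combinations(components, 2))


# Hypothetical application names, for illustration only.
apps = [f"app-{i}" for i in range(1, 31)]  # 30 applications

links = pairwise_links(apps)
print(len(apps), "components ->", len(links), "possible pairwise integrations")
# prints: 30 components -> 435 possible pairwise integrations
```

Real pipelines rarely need every pair connected, but even a fraction of that total helps explain why onboarding was slower than “click and deploy.”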

Another factor adding to this complexity is not just how many components you’re connecting but where they come from. Most pipelines or RPA-driven environments include a heterogeneous mix of in-house and third-party components — the latter of which can be a real problem if something goes wrong.

“Ensure that all components of your CI/CD pipelines or automated processes have a maintenance contract with the software provider,” says DXC Technology’s Haas. “If there are open source components, perform a risk assessment to determine whether you should consider using a managed version of the product rather than relying on web-based support from the open source community.”

3. Beware the ‘black box’

Financial institutions have been among the most eager to deploy RPA and chatbots, and Muddu Sudhakar, CEO and co-founder of Aisera, warns of a scenario he’s seen a lot in these environments: Processes are conceived as a single, monolithic unit of functionality that becomes a “black box” whose internal operations are difficult to tease apart for troubleshooting.

“In a monolithic structure, the customer will check the status of his account, and if he wants to withdraw the money and move it, it will all happen in one step,” he says. “If something goes wrong, and there’s not audited monitoring of that step, the only way to get the money back in a catastrophic failure is to call customer service — maybe you’ll need to go to the bank in person to get it.”

To Sudhakar, this sort of design is often a marker of an organization’s early efforts with automation. A project like this can produce good results as long as everything goes according to plan. But when it doesn’t, organizations generally have to go back to the drawing board to split up those black boxes. Better would be to avoid them to begin with.

“Break down each process into building blocks,” he says, “where each building block is auditable and monitorable.”
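
As a minimal sketch of what that advice could look like in code, the Python below splits a transfer into three small steps and records the outcome of each in an audit log. Every name here (`AuditedPipeline`, `check_balance`, `withdraw`, `deposit`, the fields in `state`) is hypothetical and chosen for illustration; it is not how Aisera or any bank implements this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[dict], None]]


@dataclass
class AuditedPipeline:
    """Runs named steps in order and records the outcome of each one."""
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def run(self, steps: List[Step], state: dict) -> bool:
        for name, step in steps:
            try:
                step(state)
                self.audit_log.append({"step": name, "status": "ok"})
            except Exception as exc:
                # Stop at the first failure and record where and why it happened.
                self.audit_log.append(
                    {"step": name, "status": "failed", "detail": str(exc)}
                )
                return False
        return True


# Hypothetical building blocks for a money transfer.
def check_balance(state: dict) -> None:
    if state["balance"] < state["amount"]:
        raise ValueError("insufficient funds")


def withdraw(state: dict) -> None:
    state["balance"] -= state["amount"]


def deposit(state: dict) -> None:
    if state["destination"] is None:
        raise ValueError("no destination account")
    state["deposited"] = state["amount"]


steps = [("check_balance", check_balance), ("withdraw", withdraw), ("deposit", deposit)]
state = {"balance": 100, "amount": 40, "destination": None, "deposited": 0}

pipeline = AuditedPipeline()
ok = pipeline.run(steps, state)
for entry in pipeline.audit_log:
    print(entry)
```

In this run the log shows `check_balance` and `withdraw` succeeding and `deposit` failing, so an operator (or a compensating step) can tell that money left the account but never arrived, without opening up a black box.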

4. Build in checks and balances

Ari Meisel is the founder of Less Doing and a productivity coach with a special interest in automating drudgework. To put his money where his mouth is, he often builds automations for his own life. But he learned a valuable lesson when he tried to get out of some parking tickets.

Meisel owned a pickup truck (technically a commercial vehicle), and in New York City, where he lives, the owner of such a vehicle can easily dispute a ticket: “You can send in a letter and say, ‘I was making a delivery.’”

The brainstorm he had was, he admits, “not totally legal”: He created an automation that would randomly generate invoices and send them along with a letter to the Department of Finance disputing his tickets. And it worked, until “it got stuck and sent the exact same letter and invoice a hundred something times, so they figured it out. I had to get a lawyer and they were threatening to send me to jail,” he says. “It ended up costing me $36,000 because I had sort of set it and forget it.”

What his automated process needed, he realized, was a poka-yoke. This term, which means “error-proofing,” is borrowed from the Japanese; within the Toyota production system, it designated a single step in a process that had been broken into two, with the second dependent on the first. This increase in steps paradoxically improves efficiency because errors aren’t punted down the production line where effects can metastasize. Meisel says that in his ticket-dodging scheme, “I could have automatically run something that would have compared an invoice to previous ones to see if it was the same. That’s a very easy thing to do.”
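
The kind of check Meisel describes can be sketched generically as a gate placed in front of any step that sends documents out. The Python below is one hedged way to do it, assuming that “the same” means identical text after whitespace and case are normalized; the class name `DuplicateGuard` and the sample documents are hypothetical.

```python
import hashlib


class DuplicateGuard:
    """Refuses to pass along a document whose content has been seen before."""

    def __init__(self) -> None:
        self._seen = set()

    def fingerprint(self, text: str) -> str:
        # Collapse whitespace and case so trivial differences do not hide a repeat.
        normalized = " ".join(text.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def check_and_record(self, text: str) -> bool:
        """Return True if the text is new (and remember it), False if it repeats."""
        digest = self.fingerprint(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


guard = DuplicateGuard()
outgoing = [
    "Invoice 1001: 4 boxes",
    "Invoice 1002: 2 crates",
    "Invoice 1001:  4 boxes",  # same content as the first, with extra whitespace
]
for doc in outgoing:
    if guard.check_and_record(doc):
        print("send:", doc)
    else:
        print("halt: duplicate of an earlier document:", doc)
```

Because the sending step depends on the check passing, a stuck generator would be stopped at the first repeat instead of a hundred repeats later.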

This extends the advice from Aisera’s Sudhakar: Individual automated steps should not just be auditable, but constantly audited by other automated steps. This gets into the realm of AIOps, in which automated platforms take over the burden of IT management from human engineers. “I call it the NASA approach,” says Sudhakar. “NASA has to assume that a failure is going to happen. An AIOps solution with checks and balances is very important.”

But unless you’ve experienced an automation failure firsthand, it’s hard to see the value, says Meisel: “Ninety percent of the time, people have to have something go wrong. They say, ‘I’m not going to have my person spend three days creating this automation that I’ll never actually benefit from.’ And then find out that they need it.”

5. Don’t neglect security

Automated CI/CD pipelines have a dirty secret: Many were first rolled out as shadow IT to work around security mandates. “Developers want to develop,” says Tripwire’s Khimji. “They want to keep things moving and move on to the next iteration of [their code]. So, when a rigorous IT and security process gets in the way, they think, ‘I can spin up these images in the cloud and I can circumvent that.’”

This doesn’t mean the pipelines are inherently insecure, but you should investigate which security practices have been jettisoned for the sake of automated efficiency. Moreover, keep in mind that any automated process represents one more vector for an attack. Processes that operate autonomously may have elevated permissions, making them tempting targets. Aisera’s Sudhakar says he knows of at least one instance of what he calls a “black swan”: a hostile actor working inside a company using its DevOps pipeline to inject malware into the codebase. “That was propagated all the way through to the production environment, and the system was non-operational,” he says.

6. Don’t automate on the cheap

With hype come misunderstandings. Ajeet Dhaliwal, lead developer and founder at Tesults, has found that many organizations have a vision of “automated testing” that varies wildly from best practices. This is particularly true at smaller organizations where executives don’t have technical backgrounds. Since they understand manual testing, by their logic, automated tests should just be a version of their manual tests that can take place without human intervention.

As a result, says Dhaliwal, “They encourage traditional QA testers performing manual testing who had no background in software development to automate the tests.” Sometimes this means using tools that simply record manual UI tests for later repetition. “The robustness and flexibility of these approaches does not match what a developer can achieve with respect to automation,” he says.

“Automated test developers also need to be software engineers,” Dhaliwal adds. “It’s fine to have some junior developers involved, but these companies needed at least some experienced devs leading the work and they did not have that.”

And, as Autodesk’s Willenbring says, software engineers need to understand how to build automated processes as well. “It’s just going to be assumed that that’s one of the skills your developers will have,” he says. When you find yourself in a position to use these skills, hopefully these tips will help your automation projects find success.