Cloud coding pitfalls: Tips for avoiding big, bad bugs

Sometimes the freedom of loosely coupled cloud-based services makes for a serious bugfest.


According to this ACM article, the coding constructs that have been the most frequent source of bugs are function calls, assignments, conditions, pointers, uses of NULL, variable declarations, function declarations, and return statements. There are dozens of other conference presentations, books, and taxonomies that provide statistically valid guidance (or at least opinions) on coding practices to avoid.

But so far, I haven’t found anything like that for coding in the cloud.

And make no mistake about it, the distributed, multi-language environment inherent in the cloud presents some real coding challenges. But before we nerd out entirely, let’s do a bit of bug triage. There are three interesting categories of bugs:

  • Those created and found during initial coding and integration
  • Those found before deployment (i.e., during UAT, early release candidates, and final production testing)
  • Those found only after deployment, and usually fixed by someone other than the original developer

While all bugs are annoying, it’s been pretty widely documented that the first category is much less expensive to resolve … and those intramural bugs don’t lower the credibility of the development team or the system they are working on. The truly dangerous bugs are the ones in the last category, which might be two orders of magnitude more expensive to fix and which cause failures in full view of the users.

Let’s go a bit further. Bugs in the first category are typically logic errors that can be caught by tools and automated testing. Bugs in the last category are caused by human frailties: imprecise communication, incomplete documentation, memory failures, long learning curves, sketchy error handling, sloppiness under pressure and plain old sloth. There are no software tools that can fix that laundry list, so the key is to avoid coding issues that make your project more vulnerable to those human foibles.

So let’s inject some humorous counter-examples:

Tower of Babel

In the most ambitious cloud platforms, language choices proliferate, and for a tricky application your developers may have to work in six languages at once (yes, really). So tip #1: Don’t change languages frivolously, as cross-language debugging is painful. If there is some wonderful math library in Python you really do need to use, encapsulate it as a Web service and call it via REST.
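To make the "wrap it in a Web service" idea concrete, here is a minimal sketch using only the Python standard library. Everything named here is an assumption for illustration: `math.hypot` stands in for whatever library you actually need, and the `/hypot` endpoint, `compute_hypot`, and `call_hypot` are hypothetical names. A real service would also need authentication, logging, and a production-grade server.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def compute_hypot(query: dict) -> dict:
    """Stand-in for the library call you want to isolate (hypothetical)."""
    x = float(query["x"][0])
    y = float(query["y"][0])
    return {"result": math.hypot(x, y)}


class MathHandler(BaseHTTPRequestHandler):
    """Exposes the calculation at GET /hypot?x=..&y=.. and returns JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/hypot":
            self.send_error(404, "unknown endpoint")
            return
        try:
            body = json.dumps(compute_hypot(parse_qs(url.query))).encode()
        except (KeyError, ValueError):
            # Missing or non-numeric parameters are the caller's mistake.
            self.send_error(400, "x and y must be numbers")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for this sketch.
        pass


def call_hypot(base_url: str, x: float, y: float) -> float:
    """What any caller does, whatever its language: HTTP GET, parse JSON."""
    with urlopen(f"{base_url}/hypot?x={x}&y={y}", timeout=5) as resp:
        return json.load(resp)["result"]
```

To try it, run `HTTPServer(("127.0.0.1", 8000), MathHandler).serve_forever()` in one process and `call_hypot("http://127.0.0.1:8000", 3, 4)` in another. The point of the design is that callers never import the Python library, so nobody has to debug across the language boundary.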

Some languages are more prone to errors (or more likely to tempt developer slop) than others. Languages with strong variable typing and automatic memory management/garbage collection help avoid a wide range of errors. In contrast, endless articles have been written on VB and C++, but even if you encounter those in the cloud try to contain their evil within a Web service that hopefully you don’t need to work on at all.

In the cloud, your team is likely to work in JavaScript and its endless libraries, and that makes them vulnerable to that language’s weak typing and excessive case sensitivity. The road to hell is paved with the excluded middle. Get the best debuggers and static analysis tools you can. If you can move on to a framework like Ruby on Rails, a raft of problems disappears.

Know that the expensive bugs are those somebody else will fix

The worst of the bugs will be discovered and fixed by someone other than you at some point in the future when they’re tasked with extending the system’s functionality.

Here are some tips to make their job easier:

  • Simpler expressions and simpler methods always win. What you think of as pure elegance and sophistication is likely to be viewed by others as incomprehensible. Any method that’s too complicated to run in your head is too long.
  • Readability of code is more important than compactness. Liberal use of spaces and line feeds can make some bugs much easier to spot. Don’t do tricks like this one-line CSS (Content Scramble System) decryption algorithm.
  • Comments in code are a matter of huge debate, particularly among the Clean Code crowd. But one thing that’s not debatable: misleading comments are worse than nothing at all. Ideally, you should update the comments the same way you update your test code and test vectors — as a requirement before you check in your code. If you can’t bring yourself to maintain the comments in a module you’re changing, take them out.

In the rush to get a prototype done, developers may skip some basics, particularly as they may be working in several languages at once. The next tip is to always do the following basic data checks before performing any operations:

  • Null or undefined variables
  • Empty strings
  • Ridiculously long strings
  • “Illegal” characters in strings
  • Numeric ranges
  • Array out of bounds
  • Date vs. date-time
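As a rough illustration of that checklist, here is a sketch of an up-front validation routine in Python. The order record, the field names, the length limit, the quantity range, and the character whitelist are all assumptions made up for this example; substitute the rules your own data actually requires.

```python
import re
from datetime import date, datetime

MAX_NAME_LENGTH = 200                            # assumed limit
ALLOWED_NAME = re.compile(r"[A-Za-z0-9 .,'-]+")  # assumed whitelist


def check_order(name, quantity, items, item_index, ship_date):
    """Return a list of problems; an empty list means the input passed."""
    problems = []

    # Null, empty, ridiculously long, and illegal-character checks.
    if name is None:
        problems.append("name is missing")
    elif not isinstance(name, str) or name.strip() == "":
        problems.append("name is empty")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append("name is too long")
    elif not ALLOWED_NAME.fullmatch(name):
        problems.append("name contains illegal characters")

    # Numeric range (bool is a subclass of int, so exclude it explicitly).
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        problems.append("quantity is not an integer")
    elif not 1 <= quantity <= 1000:
        problems.append("quantity out of range")

    # Array bounds. A negative index would silently wrap around in Python,
    # so it is rejected here rather than left to raise (or not) later.
    if not isinstance(item_index, int) or not 0 <= item_index < len(items):
        problems.append("item index out of bounds")

    # Date vs. date-time: datetime is a subclass of date, so test it first.
    if isinstance(ship_date, datetime):
        problems.append("ship_date is a date-time, expected a plain date")
    elif not isinstance(ship_date, date):
        problems.append("ship_date is not a date")

    return problems
```

Returning the full list of problems, rather than raising on the first one, makes it easier to log everything that was wrong with a bad record in a single pass.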

A corollary of this tip is to watch out for (and root out) data overloading that some genius slid into a “quick fix” that has long since been forgotten.

Some calculations and logical operations seem to encourage bugs more often than average. Some of them can’t be avoided, so the next tip is to budget enough time to develop test code, scenarios and data to test these troublemakers for proper results (not just coverage):

  • Nested IF/THEN/ELSE statements (try to use CASE or SWITCH instead)
  • Compound Booleans (particularly involving XOR, NOT, LessThan or GreaterThan)
  • Regular expressions
  • Day/date calculations, particularly involving weekends and holidays (if the language you are working in doesn’t have primitives for this)
  • Time of day calculations, particularly involving “business hours” (ditto)
  • Calculated indexes for arrays and lists
  • Calculated branches (or, god forbid, GOTO)
  • Conditional roll-ups (particularly when you have delivery schedules or need roll-ups over time windows)
  • Multi-dimensional analytics (put this in the data warehouse, rather than hard coding it)
  • 3D coordinate transformations (hopefully, you’ll be able to use a library for this)
  • Geospatial calculations (ditto)
  • Browser-specific code (particularly for JavaScript and CSS … try to use a library to handle this)
  • Mobile-device-specific code (particularly for gestures, other interactors, and OS-specific handlers for maps, address books, etc.)
  • Don’t get me started on IoT code
  • Dynamic code
  • Timeout/retry methods
  • And the granddaddy of them all: error-handling code
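To show what testing for proper results (rather than coverage) can look like, here is a sketch for one troublemaker from the list: day/date calculations around weekends and holidays. The function `add_business_days` and the test table are hypothetical, written for this example. The only holiday assumed is US Thanksgiving 2016 (Thursday, Nov. 24); a real system would load its holiday calendar from somewhere authoritative.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int, holidays=frozenset()) -> date:
    """Return the date `days` business days after `start` (days >= 0)."""
    if days < 0:
        raise ValueError("days must be non-negative")
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0-4 for Monday-Friday, 5-6 for the weekend.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


THANKSGIVING = frozenset({date(2016, 11, 24)})

# Each row is (start, days, holidays, expected). The expected dates were
# worked out by hand from a calendar, not by running the code under test.
CASES = [
    (date(2016, 11, 1), 0, frozenset(), date(2016, 11, 1)),     # zero days
    (date(2016, 11, 1), 5, frozenset(), date(2016, 11, 8)),     # spans a weekend
    (date(2016, 11, 4), 1, frozenset(), date(2016, 11, 7)),     # Friday to Monday
    (date(2016, 11, 5), 1, frozenset(), date(2016, 11, 7)),     # starts on Saturday
    (date(2016, 11, 23), 1, THANKSGIVING, date(2016, 11, 25)),  # skips the holiday
    (date(2016, 11, 23), 2, THANKSGIVING, date(2016, 11, 28)),  # holiday plus weekend
]


def run_cases():
    """Return the rows whose actual result differs from the expected one."""
    failures = []
    for start, days, holidays, expected in CASES:
        actual = add_business_days(start, days, holidays)
        if actual != expected:
            failures.append((start, days, expected, actual))
    return failures
```

A single call with any input would give you 100 percent line coverage of `add_business_days`. The table is what checks that the answers are right at the edges, and it gives whoever inherits the code a place to add a row when a new edge case turns up.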

Spaghetti: Good for dinner, bad for code

Just because the cloud isolates code inside Web services doesn’t mean that it prevents spaghettification, particularly when modules are extended to do things that were never envisioned at initial construction time. The next tip: use profilers and network traffic sniffers to spot particularly “chatty” Web services that may indicate methods that need refactoring.
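If your profiler or sniffer can export call records, spotting the chatty pairs takes only a few lines. The sketch below assumes a made-up record format of `(request_id, caller, callee)` tuples and an arbitrary threshold of 10 calls per request; neither comes from any particular tool.

```python
from collections import Counter


def find_chatty_pairs(calls, threshold=10):
    """Flag caller/callee pairs that talk too much within one request.

    `calls` is an iterable of (request_id, caller, callee) tuples.
    Returns {(caller, callee): worst per-request call count} for every
    pair whose worst count exceeds `threshold`.
    """
    per_request = Counter(calls)  # counts identical tuples
    worst = {}
    for (_request_id, caller, callee), count in per_request.items():
        pair = (caller, callee)
        worst[pair] = max(worst.get(pair, 0), count)
    return {pair: count for pair, count in worst.items() if count > threshold}
```

For example, if one user request makes the hypothetical `cart` service call `pricing` 25 times, that pair is flagged, which usually points to a loop that should be a single batched call.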

The ultimate spaghetti (meta-ghetti?) comes when there’s an incomplete deployment of changes across several modules. Things seem to work, but there will be ephemeral bugs you’ve never seen before and can’t reproduce in your staging system. Pushing from staging to production in large cloud systems is painful at best, and I’ve never seen an automated system work with multiple cloud platforms (e.g., AWS and Salesforce) … even though theoretically it should work. So here’s the final series of tips:

  • Manage all the code, resources, and test artifacts for all Web service nodes in a single source code control system. If you have to use more than one control system, you need coordinating pointers to keep a deterministic state for everything required by each rev of your cloud app.
  • If you are dependent on Windows DLLs or JVMs for one or more of your nodes, keep that system under serious configuration control (yes, turn off auto-updates). Consider using VMs for all nodes in a hybrid cloud.
  • Develop a thorough checklist for all the steps of deployment (including “roll backs”) so that you can avoid this particular level of hell.
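One checklist step that is easy to automate is confirming that every node is actually running the revision the release calls for, which is exactly what an incomplete deployment gets wrong. The sketch below is an assumption-laden illustration: the manifest format, the node names, and the way you collect the deployed revisions (a version endpoint, a platform API, whatever you have) are all hypothetical.

```python
def find_deployment_drift(manifest, deployed):
    """Compare expected and actual revisions per node.

    `manifest` maps node name -> revision the release requires.
    `deployed` maps node name -> revision actually running.
    Returns a list of human-readable problems (empty if consistent).
    """
    problems = []
    for node, expected in sorted(manifest.items()):
        actual = deployed.get(node)
        if actual is None:
            problems.append(f"{node}: not deployed (expected {expected})")
        elif actual != expected:
            problems.append(f"{node}: running {actual}, expected {expected}")
    # Anything running that the release does not know about is also suspect.
    for node in sorted(set(deployed) - set(manifest)):
        problems.append(f"{node}: deployed but not in manifest")
    return problems
```

Running this check after every push, and again after every rollback, turns "things seem to work" into a yes-or-no answer before the ephemeral bugs show up.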

What’s old is new again

If all this seems like a recasting of issues from the last century, I can’t disagree. We’ve just managed to make ever more sophisticated ways to waste CPU and developer cycles.
