The cloud revolution has been around long enough that every IT leader knows the overarching value proposition: The cloud makes it easier to share resources, it adapts quickly to shifting loads and demands, and it can save time and money by eliminating the need to buy, install and maintain racks of your own hardware.
But there are downsides, and they aren’t mentioned as frequently. Maybe that’s because the idea of offloading your hassles to some Shangri-La of a server farm hidden in the sky is so seductive.
Still, the real worries of moving your workloads to the cloud aren’t the run-of-the-mill maladies. Yes, cloud machines suffer from many of the same issues that confound the boxes in your own data center. If there’s a backdoor in Ubuntu 18.04, it will let in hackers whether it’s installed on a cloud machine or a server in a closet down the hall. All computers are susceptible to power failures, hard disk crashes, alpha rays, malware and worse.
The interesting gotchas, however, are those peculiar to the cloud model. These issues don’t affect the machines in your server farm, or if they do, they don’t threaten catastrophe at the same scale. The trick, of course, is to watch out for and, where possible, solve these 10 cloud-specific issues before they come back to haunt you.
When the demands on your cloud architecture ramp up, the cloud can automagically spin up new machines to handle the load. But when it does, behind the curtain, the meter spins faster and faster.
This elastic response to load is supposed to be better than the meltdowns that overwhelmed the old server in the closet. The bits are being delivered where they are supposed to go and the job is getting done. But whereas an overburdened local server would simply slow everyone down or return some timeouts and 503 errors, in the cloud, your bill can spike suddenly, wiping out the monthly budget in seconds.
This problem has swamped many teams. Hardest hit may be developers using the cloud for side hustles: lightning strikes and they’re on the hook for a huge bill. In response, cloud providers have added controls that let you set budgets and ask for spending alerts. But that doesn’t fix the underlying architectural issue. All the time your team has put in to redesign your app to make good on the cloud’s promise of seemingly infinite scalability also buys you a potentially infinite bill. There’s no such thing as a free lunch.
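To make the arithmetic concrete, here is a minimal sketch of how a scaling spike translates into a bill, and how a fleet-size cap could be derived from a budget. The hourly rate, the budget and the 730-hour month are assumptions chosen for illustration, not figures from any provider's price list, and the function names are hypothetical.

```python
def projected_monthly_cost(instance_count, hourly_rate, hours_per_month=730):
    """Cost if the fleet stays at this size for a whole month (730 h assumed)."""
    return instance_count * hourly_rate * hours_per_month


def max_instances_within_budget(monthly_budget, hourly_rate, hours_per_month=730):
    """Largest fleet size whose full-month cost still fits the budget."""
    return int(monthly_budget // (hourly_rate * hours_per_month))


# Hypothetical numbers: $0.10 per instance-hour and a $1,000 monthly budget.
RATE = 0.10
BUDGET = 1000.0

baseline = projected_monthly_cost(4, RATE)    # the usual four machines
spike = projected_monthly_cost(200, RATE)     # autoscaling during a surge
cap = max_instances_within_budget(BUDGET, RATE)

print(f"baseline: ${baseline:,.2f}  spike: ${spike:,.2f}  cap: {cap} instances")
```

A cap like this trades the surprise bill for the old failure mode: once the limit is reached, extra load is slowed or dropped instead of billed.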
Storing backups is a good habit when it comes to safeguarding data. But if your company’s data is squirreled away on RAID arrays you own, the cost and volume of that data are contained. When your data is instead sitting in a nested collection of buckets somewhere in the cloud, it becomes much more difficult to know if there are crucial log files or bits buried deep inside.
Most organizations that deploy cloud services get in the habit of keeping everything. It just seems easier to keep every scrap of data around just in case, but the fractions of a penny keep adding up and no one wants to make the hard call to erase any of it. Sorting through the data squirreled away in the cloud to look for the crucial bits can take a massive amount of labor. Worse, with the rise of data privacy regulations and security breaches, the habit of stuffing every bit of customer data into limitless cloud stores “just in case” can really come back to haunt you.
When it’s easy to create a new storage bucket, it’s easy to create a hassle to sort through and secure in the future.
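One way to start making the hard call is to list what has not been touched in a long time and what it costs to keep. The sketch below works on a hand-written inventory. The object keys, dates, sizes, the one-year cutoff and the price per GB are all hypothetical, and a real inventory would have to come from your provider's own listing or reporting tools.

```python
from datetime import date, timedelta


def stale_objects(objects, today, max_age_days):
    """Return keys of objects last modified more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [key for key, last_modified, _size_gb in objects if last_modified < cutoff]


def monthly_storage_cost(objects, price_per_gb_month):
    """Total monthly cost of keeping every object in the inventory."""
    total_gb = sum(size_gb for _key, _modified, size_gb in objects)
    return total_gb * price_per_gb_month


# Hypothetical inventory: (key, last modified, size in GB).
inventory = [
    ("logs/2019/app.log.gz", date(2019, 3, 1), 120.0),
    ("exports/customers.csv", date(2021, 6, 15), 2.5),
    ("backups/db-latest.dump", date(2024, 1, 10), 40.0),
]

today = date(2024, 2, 1)
stale = stale_objects(inventory, today, max_age_days=365)
cost = monthly_storage_cost(inventory, price_per_gb_month=0.02)  # assumed price

print("candidates for review:", stale)
print(f"monthly cost of keeping everything: ${cost:.2f}")
```

The list is only a set of candidates for review. Whether an old export of customer data should be deleted, archived or kept is still a decision for people, not a script.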
If a machine is too small and doesn’t have enough RAM to function, you’ll know right away when the software slows to a crawl or crashes immediately. But if you have too much RAM, no one is going to complain. Because of this, cloud machines tend to ratchet up and become wasteful. Someone will bump up the RAM allocation after a big weekend and no one gets around to tightening the screws again, and now you’re paying for overhead you may never again need.
Some teams dedicate one person to watching the parameters, but this just expands the team. Is it cheaper to pay for a few overprovisioned machines or for a new team member to wrangle them?
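That question has a rough break-even point, which can be sketched with a few lines of arithmetic. The hourly rates, the salary figure and the 730-hour month below are assumptions for illustration only, and the function names are hypothetical.

```python
import math


def monthly_waste(machines, rate_provisioned, rate_needed, hours_per_month=730):
    """Money spent each month on capacity the machines do not need."""
    return machines * (rate_provisioned - rate_needed) * hours_per_month


def machines_to_justify_hire(monthly_salary, rate_provisioned, rate_needed,
                             hours_per_month=730):
    """Smallest number of oversized machines whose waste covers the salary."""
    waste_per_machine = (rate_provisioned - rate_needed) * hours_per_month
    return math.ceil(monthly_salary / waste_per_machine)


# Hypothetical numbers: paying $0.40/hour for machines that need only the
# $0.20/hour size, versus a $10,000/month fully loaded salary.
waste = monthly_waste(20, 0.40, 0.20)
break_even = machines_to_justify_hire(10_000, 0.40, 0.20)

print(f"20 oversized machines waste ${waste:,.2f} per month")
print(f"a dedicated hire pays off at about {break_even} oversized machines")
```

Under these assumed numbers, a small fleet of oversized machines is cheaper than the person who would right-size them, which is exactly why the waste tends to stay in place.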
Cloud provider dashboards make it super-easy for developers and business users alike to start up new computers with a few clicks. Plus, it’s only a few pennies an hour, which the company can surely afford, right? And we might as well test the code on a clean cluster of machines, don’t you think?
Keeping cloud costs low is a difficult challenge. Everyone understands the hassle of acquiring hardware. The purchase orders, the budget meetings, the shipping delays. But just as free food or candy disappears in seconds, a few quick clicks can double or triple your monthly cloud bill.
Light loads and sleeping machines leave cloud companies in a bind. They can give away the unused cycles to the other instances sharing the same hardware. After all, why let them go unused if they can make another customer happy?
But when these sleeping machines awaken, they’ll want their share of the hardware back and the others might start to miss the free computer cycles. The code that ran quickly yesterday starts running slowly. Sure, the high speed yesterday was a secret gift, but try telling that to the user whose job is poking along.
One of the most overlooked parts of every cloud agreement is the cost for data movement. We focus on the computers and forget about the flow of bits.
In most cases, we can forget. The average instance doesn’t cross over the threshold for data movement and so many developers don’t even think about the cost of delivering answers to queries. It’s all good until your website goes viral and then the surprise arrives on the bill a month later. If you’ve done a good job architecting the system, caches will answer the huge load and the machines won’t be bogged down. The cloud providers, however, will be counting the bytes flowing out of their system, and will be billing accordingly.
The slings and arrows of outrageous fortune are hard to anticipate. Smart developers might try to test for load by running many local testing bots that ping the machine relentlessly. That can test the quality of their code, but it won’t flag the high cost of egress.
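A load test won't surface the cost, but a back-of-the-envelope estimate can. The sketch below assumes a simple pricing model with a free monthly allowance followed by a flat per-GB rate. The 100 GB allowance, the $0.09 rate, the request counts and the response size are hypothetical, and real price lists are usually tiered.

```python
def gb_transferred(requests, avg_response_kb):
    """Outbound data in decimal GB for a number of responses of a given size."""
    return requests * avg_response_kb / 1_000_000


def egress_cost(gb_out, free_gb, price_per_gb):
    """Charge for outbound data beyond the free allowance."""
    billable_gb = max(0.0, gb_out - free_gb)
    return billable_gb * price_per_gb


# Hypothetical pricing: first 100 GB free, then $0.09 per GB.
FREE_GB = 100.0
PRICE = 0.09

normal_gb = gb_transferred(2_000_000, avg_response_kb=50)    # a quiet month
viral_gb = gb_transferred(200_000_000, avg_response_kb=50)   # the viral month

normal_bill = egress_cost(normal_gb, FREE_GB, PRICE)
viral_bill = egress_cost(viral_gb, FREE_GB, PRICE)

print(f"normal: {normal_gb:,.0f} GB -> ${normal_bill:,.2f}")
print(f"viral:  {viral_gb:,.0f} GB -> ${viral_bill:,.2f}")
```

Note that the cache that keeps the servers calm during the viral month does nothing to this number, because every cached response still leaves the provider's network.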
Some companies build their own data center down the hall. Others buy entire buildings. In either instance, everyone knows the physical location of their servers. Cloud machines, however, are rented out without much detail beyond the country and maybe the state in which the machine might be found. Programmers might not care as long as the network connection is fast, but lawyers have been known to argue for years over which political entity is in control. One company I know built a data center in a state without sales tax just to avoid that issue.
It’s easy to lose track of the location of your data and applications in the cloud. Most people don’t care, and cloud providers are often deliberately vague for security reasons. But if someone on your team cares about legal issues, they’ll want to make sure you’re booting up your instances in the right nexus of political control.
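If the legal team does hand down a list of acceptable jurisdictions, checking deployments against it can be automated. This is a minimal sketch that works on a hand-written inventory. The resource names and the allowed set are hypothetical, the region strings are only examples of the identifiers providers use, and a real check would pull the inventory from the provider's tooling.

```python
# Hypothetical policy: the only regions the legal team has approved.
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}


def out_of_policy(resources, allowed_regions):
    """Return the names of resources running outside the approved regions."""
    return sorted(
        name for name, region in resources.items()
        if region not in allowed_regions
    )


# Hypothetical inventory: resource name -> region it was launched in.
deployed = {
    "web-frontend": "eu-west-1",
    "billing-db": "eu-central-1",
    "test-cluster": "us-east-1",
    "analytics-batch": "ap-southeast-2",
}

violations = out_of_policy(deployed, ALLOWED_REGIONS)
print("outside approved regions:", violations)
```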
If you’ve got the key to your server room or the rack in the colo, the boxes are yours. You’re in charge. The instances in the cloud, however, belong to someone else, and the provider will make up its own mind about what happens to what is hosted on them. Oh sure, providers will generally defer to you when everything is going along smoothly, but the problems will come in stressful situations like a natural disaster or a fight. If government investigators subpoena your data, the provider may not even tell your lawyer.
Tilted terms of service
The terms of service are written by the cloud companies and unless you’re writing a big check, you won’t get to negotiate. Like all terms writers, the people who drafted the terms thought of themselves first.
Consider this part of the AWS Terms of Service: “31.3. Your mail domain and End Users’ accounts may be blocked, delayed, or prevented from being delivered by destination email servers and other reasons outside of our control. Your payment obligations continue regardless of whether delivery of your emails is prevented, delayed, or blocked.”
Of course this is no different from problems on an email server down the hall. You’ll still pay even if it’s not working. But there’s something galling about being sent a bill for a service that’s blocked.
All of the cloud services clean up the dusty, moldy projects and toss them away. They have to, because people are always experimenting with cloud instances and then forgetting about their experiments or where they live. The problem, though, is that cloud providers perform this cleanup on their schedule, not yours.
AWS, for instance, reserves the right in its service terms to delete Lambda functions that haven’t been run for more than three months, after giving notice. If your team builds out AWS Lambda functions and then puts them on the back burner for whatever reason, you’ve got to remember this deadline and watch for any emails warning you that someone is about to clean out the fridge.
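Rather than relying on memory, a team can keep its own record of when each function last ran and flag the ones drifting toward the deadline. The sketch below treats three months as 90 days and warns 14 days ahead. Both numbers, the function names and the dates are assumptions, and the last-run dates would have to come from your own logs or monitoring.

```python
from datetime import date


def at_risk_functions(last_invoked, today, idle_limit_days=90, warn_days=14):
    """Names of functions idle long enough to be within warn_days of the limit."""
    threshold = idle_limit_days - warn_days
    return sorted(
        name for name, last_run in last_invoked.items()
        if (today - last_run).days >= threshold
    )


# Hypothetical record of the last invocation date for each function.
last_invoked = {
    "resize-images": date(2024, 5, 20),
    "nightly-report": date(2024, 3, 10),
    "legacy-import": date(2023, 12, 1),
}

today = date(2024, 6, 1)
flagged = at_risk_functions(last_invoked, today)
print("needs attention:", flagged)
```

A flagged function can then be invoked on purpose, archived somewhere you control, or knowingly let go.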