Ultimate Cloud Speed Tests: Amazon vs. Google vs. Windows Azure

A diverse set of real-world Java benchmarks shows that Google is fastest, Azure is slowest, and Amazon is priciest.

If the cartoonists are right, heaven is located in a cloud where everyone wears white robes, every machine is lightning quick, everything you do works perfectly, and every action is accompanied by angels playing lyres. The current sales pitch for the enterprise cloud isn't much different, except for the robes and the music. The cloud providers have an infinite number of machines, and they're just waiting to run your code perfectly.

The sales pitch is seductive because the cloud offers many advantages. There are no utility bills to pay, no server room staff who want the night off, and no crazy tax issues for amortizing the cost of the machines over N years. You give them your credit card, and you get root on a machine, often within minutes.

To test out the options available to anyone looking for a server, I rented some machines on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure and took them out for a spin. The good news is that many of the promises have been fulfilled. If you click the right buttons and fill out the right Web forms, you can have root on a machine in a few minutes, sometimes even faster. All of them make it dead simple to get the basic goods: a Linux distro running what you need.

At first glance, the options seem close to identical. You can choose from many of the same distributions, and from a wide range of machine configuration options. But if you start poking around, you'll find differences -- including differences in performance and cost. The machines may seem like commodities, but they're not. This became more and more evident once the machines started churning through my benchmarks.

Fast cloud, slow cloud

I tested small, medium, and large machine instances on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure using the open source DaCapo benchmarks, a collection of 14 common Java programs bundled into one easy-to-start JAR. It's a diverse set of real-world applications that will exercise a machine in a variety of different ways. Some of the tests will stress CPU, others will stress RAM, and still others will stress both. Some of the tests will take advantage of multiple threads. No machine configuration will be ideal for all of them.

Some of the benchmarks in the collection will be very familiar to server users. The Tomcat test, for instance, starts up the popular Web server and asks it to assemble some Web pages. The Luindex and Lusearch tests will put Lucene, the common indexing and search tool, through its paces. Another test, Avrora, will simulate some microcontrollers. Although this task may be useful only for chip designers, it still tests the raw CPU capacity of the machine.

I ran the 14 DaCapo tests on three different Linux machine configurations on each cloud, using the default JVM. The instances aren't perfect "apples to apples" matches, but they are roughly comparable in terms of size and price. The configurations and cost per hour are broken out in the table below.

I gathered two sets of numbers for each machine. The first set shows the amount of time the instance took to run the benchmark from a dead stop. It fired up the JVM, loaded the code, and started to work. This isn't a bad simulation because many servers start up Java code from command lines in scripts.

To add another dimension, the second set reports the times using the "converge" option. This runs the benchmark repeatedly until consistent results appear. This sometimes happens after just a few runs, but in a few cases, the results failed to converge after 20 iterations. This option often resulted in dramatically faster times, but sometimes it only produced marginally faster times.
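
To make the "converge" idea concrete, here is a minimal Python sketch of a run-until-stable loop. It is not DaCapo's own code: the three-run window, the 3 percent tolerance, the coefficient-of-variation test, and the function names are all assumptions chosen for illustration. Only the cap of 20 iterations comes from the runs described above.

```python
import time
from statistics import mean, pstdev


def has_converged(timings, window=3, tolerance=0.03):
    """True if the last `window` timings vary by at most `tolerance`.

    Variation is the coefficient of variation (population standard
    deviation divided by the mean). This criterion is an assumption.
    """
    if len(timings) < window:
        return False
    recent = timings[-window:]
    return pstdev(recent) <= tolerance * mean(recent)


def run_until_converged(measure, max_iterations=20, window=3, tolerance=0.03):
    """Call measure() until timings settle or max_iterations is reached.

    measure() must return the elapsed seconds for one run.
    Returns (all timings, whether they converged).
    """
    timings = []
    for _ in range(max_iterations):
        timings.append(measure())
        if has_converged(timings, window, tolerance):
            return timings, True
    return timings, False


def timed(workload):
    """Wrap a zero-argument workload so each call returns its wall time."""
    def measure():
        start = time.perf_counter()
        workload()
        return time.perf_counter() - start
    return measure


if __name__ == "__main__":
    # Hypothetical stand-in workload; a real test would launch the benchmark.
    timings, ok = run_until_converged(timed(lambda: sum(range(1_000_000))))
    print(f"first run: {timings[0]:.4f}s")
    print(f"last run:  {timings[-1]:.4f}s after {len(timings)} runs (converged={ok})")
```

The first entry in the returned list plays the role of the standing-start number, and the last entry plays the role of the converged one. A run that never settles comes back flagged as not converged rather than being silently reported.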

The results (see charts and tables below) will look like a mind-numbing sea of numbers to anyone, but a few patterns stood out:

  • Google was the fastest overall. The three Google instances completed the benchmarks in a total of 575 seconds, compared with 719 seconds for Amazon and 834 seconds for Windows Azure. A Google machine had the fastest time in 13 of the 14 tests. A Windows Azure machine had the fastest time in only one of the benchmarks. Amazon was never the fastest.
  • Google was also the cheapest overall, though Windows Azure was close behind. Executing the DaCapo suite on the trio of machines cost 3.78 cents on Google, 3.8 cents on Windows Azure, and 5 cents on Amazon. A Google machine was the cheapest option in eight of the 14 tests. A Windows Azure instance was cheapest in five tests. An Amazon machine was the cheapest in only one of the tests.
  • The best option for misers was Windows Azure's Small VM (one CPU, 6 cents per hour), which completed the benchmarks at a cost of 0.67 cents. However, this was also one of the slowest options, taking 404 seconds to complete the suite. The next cheapest option, Google's n1-highcpu-2 instance (two CPUs, 13.1 cents per hour), completed the benchmarks in less than half the time (193 seconds) at a cost of 0.70 cents.
  • If you cared more about speed than money, Google's n1-standard-8 machine (eight CPUs, 82.9 cents per hour) was the best option. It turned in the fastest time in 11 of the 14 benchmarks, completing the entire DaCapo suite in 101 seconds at a cost of 2.32 cents. The closest rival, Amazon's m3.2xlarge instance (eight CPUs, 90 cents per hour), completed the suite in 118 seconds at a cost of 2.96 cents.
  • Amazon was rarely a bargain. Amazon's m1.medium (one CPU, 10.4 cents per hour) was both the slowest and the most expensive of the one-CPU instances. Amazon's m3.2xlarge (eight CPUs, 90 cents per hour) was the second fastest instance overall, but also the most expensive. However, Amazon's c3.large (two CPUs, 15 cents per hour) was truly competitive -- nearly as fast overall as Google's two-CPU instance, and faster and cheaper than Windows Azure's two-CPU machine.
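
The per-run costs quoted in these bullets appear to be the hourly rate pro-rated by the second. The short Python sketch below reproduces that arithmetic from the times and rates given above. The function name is made up for illustration, and per-second pro-rating is an assumption; a provider's actual billing granularity may differ.

```python
def run_cost_cents(seconds, cents_per_hour):
    """Cost in cents of occupying an instance for `seconds`.

    Assumes the hourly rate is pro-rated by the second.
    """
    return seconds / 3600 * cents_per_hour


# (instance, suite time in seconds, cents per hour), as quoted in the text.
quoted = [
    ("Windows Azure Small VM", 404, 6.0),
    ("Google n1-highcpu-2", 193, 13.1),
    ("Google n1-standard-8", 101, 82.9),
    ("Amazon m3.2xlarge", 118, 90.0),
]

for name, seconds, rate in quoted:
    print(f"{name}: {run_cost_cents(seconds, rate):.2f} cents")
```

For the two cheapest instances this gives the 0.67 and 0.70 cents cited above. The two eight-CPU results land within about a hundredth of a cent of the quoted costs, presumably because the quoted times are rounded to whole seconds.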

These general observations, which I drew from the "standing start" tests, are also borne out by the results of the "converged" runs. But a close look at the individual numbers will leave you wondering about consistency.

Some of this may be due to the randomness hidden in the cloud. While the companies make it seem like you're renting a real machine that sits in a box in some secret, undisclosed bunker, the reality is that you're probably getting assigned a thin slice of a box. You're sharing the machine, and that means the other users may or may not affect you. Or maybe it's the hypervisor that's behaving differently. It's hard to know. Your speed can change from minute to minute and from machine to machine, something that usually doesn't happen with the server boxes rolling off the assembly line.

So while there seem to be clear performance differences among the cloud machines, your results could vary. These patterns also emerged:

  • Bigger, more expensive machines can be slower. You can pay more and get worse performance. The three Windows Azure machines started with one, two, and eight CPUs and cost 6, 12, and 48 cents per hour, but the more expensive they were, the slower they ran the Avrora test. The same pattern appeared with Google's one-CPU and two-CPU machines.
  • Sometimes bigger pays off. The same Windows Azure machines that ran the Avrora jobs slower sped through the Eclipse benchmark. On the first runs, the eight-CPU machine was more than twice as fast as the one-CPU machine.
  • Comparisons can be troublesome. The results table has some holes produced when a particular test failed, some of which are easy to explain. The Windows Azure machines didn't have the right codec for the Batik tests because it didn't come installed with the default version of Java. I probably could have fixed it with a bit of work, but the machines from Amazon and Google didn't need it. (Note: Because Azure balked at the Batik test, the comparative times and costs cited above omit the Batik results for Amazon and Google.)
  • Other failures seemed odd. The Tradesoap routine would generate an exception occasionally. This was probably caused by some network failure deep in the OS layer. Or maybe it was something else. The same test would run successfully in different circumstances.
  • Adding more CPUs often isn't worth the cost. While Windows Azure's eight-CPU machine was often dramatically faster than its one-CPU machine, it was rarely eight times faster -- disappointing given that it costs eight times as much. This was true even on the tests that are able to recognize the multiple CPUs and set up multiple threads. In most of the tests the eight-CPU machine was just two to four times faster. The one test that stood out was the Sunflow raytracing test, which was able to use all of the compute power given to it.
  • The CPU numbers don't always tell the story. While the companies usually double the price when you get a machine with two CPUs and multiply by eight when you get eight CPUs, you can often save money if you don't increase the RAM too. But if you do, don't expect performance to still double. The Google two-CPU machine in these tests was a so-called "highcpu" machine with less RAM than the standard machine. It was often slower than the one-CPU machine. When it was faster, it was often only about 30 percent faster.
  • Thread count can also be misleading. While the performance of the Windows Azure machines on the Sunflow benchmark tracks the number of threads, the same can't be said for the Amazon and Google machines. Amazon's two-CPU instance often went more than twice as fast as the one-CPU machine. On one test, it was almost three times faster. Google's two-CPU machine, on the other hand, went only 20 to 25 percent faster on Sunflow.
  • The pricing table can be a good indicator of performance. Google's n1-highcpu-2 machine is about 30 percent more expensive than the n1-standard-1 machine even though it offers twice as much theoretical CPU power. Google probably used performance benchmarks to come up with the prices.
  • Burst effects can distort behavior. Some of the cloud machines will speed up for short "bursts." This is sort of a free gift of the extra cycles lying around. If the cloud providers can offer you a temporary speedup, they often do. But beware that the gift will appear and disappear in odd ways. Thus, some of these results may be faster because the machine was bursting.
  • The bursting behavior varies. On the Amazon and Google machines, the Eclipse benchmark would speed up by a factor of more than three when using the "converge" option of the benchmark. Windows Azure's eight-CPU machine, on the other hand, wouldn't even double.
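
The scaling complaint above comes down to simple ratios. As a rough Python sketch (the function names are invented for illustration), a machine that costs eight times as much per hour but finishes only two to four times faster makes each job two to four times more expensive:

```python
def relative_cost_per_job(price_ratio, speedup):
    """How much more one job costs on the bigger machine.

    price_ratio: bigger machine's hourly price / smaller machine's price.
    speedup: smaller machine's run time / bigger machine's run time.
    """
    return price_ratio / speedup


def parallel_efficiency(speedup, cpu_ratio):
    """Fraction of the extra CPUs that showed up as extra speed."""
    return speedup / cpu_ratio


# Windows Azure: eight CPUs at 48 cents per hour versus one CPU at 6 cents.
price_ratio = 48 / 6
for speedup in (2, 4, 8):
    cost = relative_cost_per_job(price_ratio, speedup)
    eff = parallel_efficiency(speedup, cpu_ratio=8)
    print(f"{speedup}x faster: job costs {cost:.1f}x as much, efficiency {eff:.0%}")
```

Only a benchmark that scales across all eight CPUs, as Sunflow did on Windows Azure, reaches the break-even point where the bigger machine costs the same per job.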

If all of these factors leave you confused, you're not alone. I tested only a small fraction of the configurations available from each cloud and found that performance was only partially related to the amount of compute power I was renting. The big differences in performance on the different benchmarks mean that the different platforms could run your code at radically different speeds. In the past, my tests have shown that cloud performance can vary at different times of day or days of the week.

This test matrix may be large, but it doesn't even come close to exploring the different variations that the different platforms can offer. All of the companies are offering multiple combinations of CPUs and RAM and storage. These can have subtle and not-so-subtle effects on performance. At best, these tests can only expose some of the ways that performance varies.

This means that if you're interested in getting the best performance for the lowest price, your only solution is to create your own benchmarks and test out the platforms. You'll need to decide which options are delivering the computation you need at the best price.

Calculating cloud costs

Working with the matrix of prices for the cloud machines is surprisingly complex given that one of the selling points of the clouds is the ease of purchase. You're not buying machines, real estate, air conditioners, and whatnot. You're just renting a machine by the hour. But even when you look at the price lists, you can't simply choose the cheapest machine and feel secure in your decision.

The tricky issue for the bean counters is that the performance observed in the benchmarks rarely increased in proportion to the price. If you're intent upon getting the most computation cycles for your dollar, you'll need to do the math yourself.
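
One way to do that math, sketched in Python under stated assumptions: record a run time and hourly price for each candidate instance, then pick the cheapest one that still finishes within the time you can tolerate. The class and function names are hypothetical, the figures are the suite totals quoted earlier in this article, and cost is assumed to be pro-rated by the second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    instance: str
    seconds: float         # measured run time for your workload
    cents_per_hour: float  # list price of the instance

    @property
    def cost_cents(self):
        # Assumes per-second pro-rating of the hourly price.
        return self.seconds / 3600 * self.cents_per_hour


def cheapest_within(measurements, max_seconds):
    """Cheapest measurement that finishes in max_seconds, or None."""
    eligible = [m for m in measurements if m.seconds <= max_seconds]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_cents)


results = [
    Measurement("Windows Azure Small VM", 404, 6.0),
    Measurement("Google n1-highcpu-2", 193, 13.1),
    Measurement("Google n1-standard-8", 101, 82.9),
    Measurement("Amazon m3.2xlarge", 118, 90.0),
]

for limit in (600, 200, 110):
    best = cheapest_within(results, limit)
    print(f"within {limit}s: {best.instance} at {best.cost_cents:.2f} cents")
```

Swapping in timings from your own workload matters more than the selection logic, since the rankings here shifted from one benchmark to the next.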
