The Skinny Straw: Cloud Computing's Bottleneck and How to Address It
Virtualization's bottleneck was server memory; new servers address this. With cloud computing, the bottleneck is bandwidth to and from the cloud provider. CIO.com's Bernard Golden discusses four practical ways cloud customers can address this issue.
Thu, August 06, 2009
CIO — We recently performed a TCO analysis for a client, evaluating whether it would make sense to migrate its application to a cloud provider. Interestingly, our analysis showed that most of the variability in the total cost was caused by assumptions about the amount of network traffic the application would use. This illustrates a key truth about computing: there's always a bottleneck, and solving one shifts the system bottleneck to another location. Virtualization implementers found that the key bottleneck to virtual machine density is memory capacity; now there's a whole new slew of servers coming out with much larger memory footprints, removing memory as a system bottleneck. Cloud computing negates that bottleneck by removing the issue of machine density from the equation; sorting that out becomes the responsibility of the cloud provider, freeing the cloud user from worrying about it.
For cloud computing, bandwidth to and from the cloud provider is a bottleneck. For some applications, the issue is sheer bandwidth capacity—these applications use or generate very large amounts of data, and the application user may find that there's just not sufficient bandwidth available to shove the data through, given the network bandwidth made available by appropriate carriers. A term often used for this is "skinny straw," inspired by the frustration one experiences when trying to suck an extra-thick milkshake through a common beverage straw. The TCO exercise illustrates a different skinny straw—an economic one. For some applications and some users, the bandwidth available may be technically sufficient, but economically unviable.
This problem is only going to get more difficult. The excellent UC Berkeley RAD Lab Report on Cloud Computing noted that price/performance of network capacity lags that of both compute and storage, indicating that this will be an issue well into the future. On the other hand, because this is a price/performance issue, another way to address it is to drop the price of transit bandwidth by making more of it available. As I noted in my discussion of the recent Structure 09 Conference, during a panel on the topic of bandwidth availability, the AT&T representative stated that the issue is not network capacity, but business case.
What's interesting about the recent Google Voice/iPhone App Store dustup is how it relates to the future role of the network. Those who believe AT&T was behind the Google Voice rejection describe the motivation as reflecting the carrier's fear of being relegated to a "dumb pipe," reduced to doing nothing more than ferrying other people's bits, rather than providing its own high-margin network applications. If that is truly AT&T's reasoning, it indicates the enormous opportunity the near future holds in being the solution to the skinny straw issue. A cascade, a torrent, a deluge of data is going to want to move around the network, and being the "dumb pipe" that carries it is going to be far more lucrative than trying to compete in figuring out what the next great network-intensive application is going to be. Simply put, cloud computing, in all its *aaS vehicles, is going to be the future of application delivery, with a complementary explosion of network traffic.
For cloud users, this fact (that network traffic is becoming a far larger part of application deployment) will affect cloud computing applications and architectures for the foreseeable future. This is going to be a tricky topic because, as noted earlier, as bottlenecks are addressed, they shift. With respect to cloud bandwidth, one can expect that the bottleneck will be gradually and incrementally relieved, meaning that assumptions about network cost and availability will need rethinking every six months or so; the application architecture that made sense six or 12 months ago might no longer make sense today.
How to Deal with the Skinny Straw
So, if you're interested in implementing a cloud computing application, what should you do to address the skinny straw issue?
Evaluate and price application data transfer needs: The foundation of dealing with the skinny straw is to evaluate how much data you're likely to be transferring. This is particularly important when considering an external cloud provider, because such providers typically charge a network traffic fee based on volume, unlike internal applications, which usually do not have a granular pricing mechanism in place. Furthermore, because application use changes over time, incorporate projections of data use into the evaluation. This is challenging; after all, one of the reasons cloud scalability is so desirable is that application providers find it nearly impossible to predict potential application growth. A Monte Carlo-like simulation will prove helpful here to illustrate the potential issues with regard to network traffic, from both a technical and an economic perspective.
Another important aspect to evaluate is the variability of data transfer. Some applications, particularly those associated with analytics, have a large load early in the life of the application, when ETL is performed; subsequently, there is little inbound data transfer as incremental updates are loaded. The download portion of an analytics application is typically reports or aggregated data structures, which may not be that expensive. Understanding the patterns of data transfer is important, therefore, as the variability can make it difficult to predict costs using a too-general assumption of traffic.
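The Monte Carlo-like simulation suggested above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing model: every number in it (the one-time ETL load, the monthly baseline, the growth range, the per-GB price) is a hypothetical placeholder, and the function names are invented for this example. It assumes monthly traffic grows by a random rate drawn uniformly from a range, which is only one of many plausible growth assumptions.

```python
import random
import statistics


def simulate_transfer_cost(months=24, trials=10_000, seed=42,
                           initial_load_gb=5_000.0,   # hypothetical one-time ETL load
                           base_monthly_gb=200.0,     # hypothetical starting monthly traffic
                           growth_low=0.00,           # assumed lowest monthly growth rate
                           growth_high=0.15,          # assumed highest monthly growth rate
                           price_per_gb=0.10):        # hypothetical transfer price
    """Return a sorted list of total transfer costs, one per simulated trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        monthly_gb = base_monthly_gb
        total_gb = initial_load_gb  # front-loaded ETL traffic
        for _ in range(months):
            total_gb += monthly_gb
            # Traffic for the next month grows by a random rate in the assumed range.
            monthly_gb *= 1.0 + rng.uniform(growth_low, growth_high)
        totals.append(total_gb * price_per_gb)
    totals.sort()
    return totals


def summarize(totals):
    """Pick out low, middle, and high outcomes from a sorted list of costs."""
    n = len(totals)
    return {
        "p10": totals[int(0.10 * (n - 1))],
        "median": statistics.median(totals),
        "p90": totals[int(0.90 * (n - 1))],
    }


if __name__ == "__main__":
    for name, cost in summarize(simulate_transfer_cost()).items():
        print(f"{name}: ${cost:,.0f}")
```

The spread between the low and high outcomes, rather than any single figure, is the useful result: it shows how much of the cost estimate depends on the growth assumption alone.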
Evaluate application architecture and consider application partitioning: An application may have sections that transfer lots of data and other sections that do not. It may make sense to partition the application so that data transfer-heavy portions reside where data transfer is cheap (e.g., an internal data center or a hosting provider), while other portions reside with a cloud provider. In a sense, this is a continuation and extension of the move to service-oriented applications, which are built by integrating independent components that communicate via well-established protocols. However, careful evaluation is important because one might run into the issue identified in the previous section: unexpected surges in data volume causing increased costs. What you want to avoid is ending up with an application where part of it resides in an external cloud and has high data traffic along with low latency requirements; that's a recipe for high costs and poor performance.
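To make the partitioning trade-off concrete, here is a rough Python sketch that tries every placement of a few components and totals the monthly cost of each. The component names, traffic volumes, compute fees, and per-GB price are all hypothetical. The model also makes simplifying assumptions that a real evaluation would need to revisit: internal transfer is treated as free, any traffic crossing the cloud boundary is billed at the cloud's per-GB rate, and latency is not modeled at all.

```python
from itertools import product


def placement_cost(placement, components, links, cloud_price_per_gb):
    """Monthly cost of one placement (a dict of component name -> location)."""
    cost = 0.0
    for name, spec in components.items():
        if placement[name] == "cloud":
            # Cloud components pay compute plus volume-based transfer to users.
            cost += spec["cloud_compute"] + spec["external_gb"] * cloud_price_per_gb
        else:
            # Assumption: internal transfer has no per-GB charge.
            cost += spec["internal_compute"]
    for (a, b), gb in links.items():
        # Traffic between components is billed only when it crosses the boundary.
        if placement[a] != placement[b]:
            cost += gb * cloud_price_per_gb
    return cost


def best_placement(components, links, cloud_price_per_gb):
    """Brute-force search over all internal/cloud assignments."""
    names = sorted(components)
    best = None
    for choice in product(("internal", "cloud"), repeat=len(names)):
        placement = dict(zip(names, choice))
        cost = placement_cost(placement, components, links, cloud_price_per_gb)
        if best is None or cost < best[1]:
            best = (placement, cost)
    return best


# Hypothetical application: one transfer-heavy component and two light ones.
COMPONENTS = {
    "web":     {"external_gb": 50,   "cloud_compute": 100, "internal_compute": 300},
    "media":   {"external_gb": 8000, "cloud_compute": 150, "internal_compute": 400},
    "reports": {"external_gb": 20,   "cloud_compute": 80,  "internal_compute": 250},
}
LINKS = {("web", "media"): 10, ("web", "reports"): 5}  # GB/month between components

if __name__ == "__main__":
    placement, cost = best_placement(COMPONENTS, LINKS, cloud_price_per_gb=0.10)
    print(placement, f"${cost:,.2f}")
```

Brute force is workable only for a handful of components, since the number of placements doubles with each one added. With these made-up numbers the transfer-heavy component stays internal while the light ones move to the cloud, which is the pattern described above.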
Broaden the assessment to an application portfolio: Some applications, due to data transfer needs, just don't belong in an external cloud environment. Instead of trying to figure out some way to make them work, recognize the fact. Application partitioning is a good strategy, but can be challenging to manage. Moreover, many applications are not architected such that partitioning can be implemented; unfortunately, the move to well-structured, service-oriented applications is not universal. A better approach is to examine the portfolio of current and future applications and identify which ones have the right architecture and data transfer needs to work in a cloud environment. If one were really dedicated (and clever), the portfolio could be evaluated to see how common functions could be factored out and implemented as stand-alone services; however, that is the premise of the SOA revolution, which has ended up more of a whimper than a bang, so aiming for this kind of outcome may be overreaching.
Recognize the importance of this issue and don't get caught in the hype: The issue of bandwidth and data location is critical and won't go away. I'm not a big fan of the current catchphrase of "cloudbursting" because I feel it overstates what cloud computing can achieve. Virtualization does imply that systems can be migrated, and once you migrate a system to a separate server inside the data center, why not to a different data center that is a cloud provider? However, migrating a system does not migrate the data it operates upon, assuming the application executes in a shared storage environment, which most virtualized environments do (eventually, anyway, even if they start with DASD).
One still faces the issue of where the application's persistent data resides, and if the application executable has been "burst" to a cloud provider, it implies that it will operate against persistent data continuing to reside in the original data center, which poses issues of bandwidth and economics. Of course, one could replicate data to the cloud location, prestaging (so to speak) data for when the application executable migrates to the cloud, but that is a fairly complicated application topology posing issues of its own.
For very high traffic applications (the kind that, presumably, would be most likely to be "cloudburst"), application architectures typically incorporate performance mechanisms like memcached to support load; these solutions are not designed to operate across data centers, hampering the opportunity to take advantage of application bursting.
How does this kind of thing work out in the real world? At the beginning of this post, I noted a recent TCO analysis we performed. The original analysis indicated that the amount of traffic that the application currently supported would make a cloud implementation economically infeasible. However, the application shared the hosting environment with several other applications, and it turned out that the vast majority of the traffic was generated by the other applications.
Furthermore, the original analysis evaluated the cost of the cloud provider, including its data transfer costs, but assigned no data transfer costs to the hosting provider, because it offered "unlimited" network traffic as part of its total package. When one really probed the "unlimited" capability, however, it turned out not to be unlimited at all: at the traffic levels at which the financial analysis was being performed, the hosting provider was going to be just as expensive as the cloud option.
After a TCO calculation that included the entire range of costs, the cloud option turned out to be less expensive. The interesting thing about the exercise is the fact that so much of the economics turned on the network traffic—certainly unintuitive for most of the participants in the analysis.
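The effect of a not-really-unlimited traffic allowance can be illustrated with a small Python calculation. The figures below are invented for illustration and are not the numbers from the client analysis; the sketch assumes the hosting plan charges a flat fee that covers traffic up to some allowance and a per-GB overage beyond it, while the cloud provider charges a compute fee plus a per-GB rate on all traffic.

```python
def hosting_cost(traffic_gb, flat_fee, included_gb, overage_per_gb):
    """Flat monthly fee plus a per-GB charge for traffic beyond the allowance."""
    overage_gb = max(0.0, traffic_gb - included_gb)
    return flat_fee + overage_gb * overage_per_gb


def cloud_cost(traffic_gb, compute_fee, price_per_gb):
    """Compute fee plus volume-based transfer pricing on all traffic."""
    return compute_fee + traffic_gb * price_per_gb


if __name__ == "__main__":
    # All prices and allowances are hypothetical.
    for traffic_gb in (1_000, 5_000, 20_000):
        # "Naive" takes the unlimited claim at face value: no allowance, no overage.
        naive = hosting_cost(traffic_gb, 1_500.0, float("inf"), 0.0)
        # "Probed" applies the allowance and overage found in the fine print.
        probed = hosting_cost(traffic_gb, 1_500.0, 3_000.0, 0.12)
        cloud = cloud_cost(traffic_gb, 600.0, 0.10)
        print(f"{traffic_gb:>6} GB  naive hosting ${naive:,.0f}  "
              f"probed hosting ${probed:,.0f}  cloud ${cloud:,.0f}")
```

Comparing the naive and probed columns at each traffic level shows how far a comparison can shift once the hosting provider's real traffic terms are included.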
Bernard Golden is CEO of consulting firm HyperStratus, which specializes in virtualization, cloud computing and related issues. He is also the author of "Virtualization for Dummies," the best-selling book on virtualization to date.