How the New York Times Covered the Election with Amazon
The New York Times cobbled together a Ruby application running on Amazon's cloud to report live election results
Tue, December 21, 2010
IDG News Service -- Anticipating an immense number of visitors seeking live data, The New York Times turned to Amazon, as well as to a set of open source software tools, to deliver nearly instantaneous results of the U.S. elections held last November.
With its setup, the Times was able to handle bursts of traffic up to "thousands of requests per second" with a total cost of only "a few hundred dollars," explained Times developer Ben Koski in a blog post Monday afternoon.
The Times maintains an interactive news department to serve website readers with graphs, charts and other enhanced content. Over time, the department has built up a cloud software stack to serve these projects, Koski said. Typically, the department's projects are written using the Ruby on Rails framework, with the data stored in a MySQL database running on Amazon's RDS (Relational Database Service). The Web apps themselves run on Amazon's EC2 (Elastic Compute Cloud).
Election night generates more traffic than the usual news story, however, so the team had to rethink its approach slightly.
"Publishing live election results requires a carefully tuned system: the setup must be able to withstand some of the most intense traffic levels seen all year ... but at the same time, it needs to get information to our readers quickly," Koski said.
For the coverage, the site had posted 184 separate pages covering election results, each of which needed to be updated every few minutes from an Associated Press news feed.
To ensure uptime, the team decided to forgo its use of Varnish, an open source Web accelerator application that holds dynamically created pages in memory so they don't have to be recreated for new users. While the software had posed no problems in the past, the team "decided it was too risky to lean entirely on an ephemeral cache for an evening where seconds of downtime matter," Koski explained.
Instead, the team resorted to a tried-and-true method of delivering Web pages quickly -- serving them as flat files. The team set up a pool of four application servers behind a bank of Amazon servers running Apache that assembled pages from the latest data and periodically uploaded them.
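
The flat-file approach described above can be illustrated with a minimal sketch. This is not the Times' code, which was written in Ruby and is not reproduced in Koski's post as quoted here. It is written in Python for illustration only, and the template, the function names bake_page and bake_all, and the shape of the feed data are all hypothetical. The idea is simply that each page is rendered once per update cycle and written to disk, so the Web server only ever hands out finished files.

```python
import os
import tempfile
from string import Template

# Hypothetical page template; real results pages would be far richer.
PAGE = Template(
    "<html><body><h1>$race</h1>"
    "<p>$leader leads with $pct% of precincts reporting</p>"
    "</body></html>"
)


def bake_page(out_dir, slug, results):
    """Render one results page and publish it as a flat HTML file."""
    html = PAGE.substitute(results)
    os.makedirs(out_dir, exist_ok=True)
    final_path = os.path.join(out_dir, slug + ".html")

    # Write to a temporary file in the same directory, then swap it into
    # place. os.replace is atomic on the same filesystem, so a reader
    # never sees a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp_path, final_path)
    return final_path


def bake_all(out_dir, feed):
    """Rebuild every page from the latest feed data (assumed: slug -> dict)."""
    return [bake_page(out_dir, slug, results) for slug, results in feed.items()]
```

In a setup along these lines, a scheduled job would call bake_all every few minutes with freshly parsed feed data, and the output directory would be served directly as static content. If a rebuild fails, the previous files stay in place, which is one reason flat files can be attractive on a night when seconds of downtime matter.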