Open source helps Facebook achieve massive app scalability

Loved by millions, Facebook has risen from a small-time university social networking service to become the biggest phenomenon on the Internet. But in Facebook's case popularity doesn’t come easily. With some 400 million unique home pages, Facebook is pushing the boundaries of traditional Web application scalability -- and the company is not shy about admitting that much of this success has been achieved by leveraging open source software.


Late last year Facebook announced it had surpassed the 300 million user mark. That is a significant (and growing) number in its own right, but what makes Facebook different is that users do not simply request a page for a search query -- they actively upload content and interact with other subscribers. If Web analytics company Alexa is accurate, Facebook users spend more than 30 minutes per day using the service -- about three times more than Google. All that activity makes Facebook's data processing requirements, and hence its scalability challenges, even more daunting.

Facebook’s core service is built on top of the venerable LAMP stack. The combination of Linux, Apache, MySQL and PHP is used by millions of Web sites across the Internet to serve dynamically generated pages. Facebook’s rapid rise in popularity in recent years has seen it grow to the point where it now operates the largest single-domain LAMP stack in the world.

Traditional methods won’t cut it for social networks

A popular way to scale a Web application is to continuously add Web and database servers to a cluster in order to distribute the transaction processing demands. Facebook needed to rethink this approach.

David Recordon, senior open programs manager at Facebook, says when scaling a traditional Web site you are able to “break up” the information and can share databases among your users.

“As you have more users you have more databases, and you scale from that perspective,” Recordon says. “On Facebook everyone is connecting to other people all over the world. You can’t scale based on where your users are. The challenge is: how do we serve all this information quickly with only having a small number of data centres?”

On average, a Facebook user has about 150 friends, and more than 70 per cent of users now come from outside the US. Thanks to its custom translation system, Facebook was translated into French in under 24 hours, for instance.

“When we render a page on Facebook we are pulling data from many different places. To render a page like that we are talking to many different pieces of our infrastructure without being able to separate them apart by user.”

Recordon, who leads open source and open standards initiatives at Facebook, recently spoke at FOSDEM, the Free and Open Source Software Developers' European Meeting, in Belgium about how the social networking giant uses open source software to meet its enormous user demands.

“And every page is different, not just per person, but at what time they saw it,” he says. “We think of relationships as a graph. You have people which are nodes, and edges to represent the relationships between them. But we’ve also seen Facebook grow to be more than just relationships between people.”

Recordon says the fact that Facebook users can become a fan of a company or person presents “a very different scaling challenge”, one that involves moving from connecting one person to a few hundred others to serving, say, Michael Jackson’s fan page, which has more than 10 million people connected to it.
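The graph model Recordon describes can be illustrated with a toy adjacency-list structure. This is only a sketch of the concept, not Facebook's actual graph store, and all the names in it are made up.

```python
from collections import defaultdict

# Toy model of the social graph: people and pages are nodes, friendships and
# "fan of" links are edges. Illustrative only -- not Facebook's implementation.
friends = defaultdict(set)   # symmetric edges: person <-> person
fans = defaultdict(set)      # asymmetric edges: page -> set of fans

def add_friendship(a, b):
    friends[a].add(b)
    friends[b].add(a)

def become_fan(person, page):
    fans[page].add(person)

add_friendship("alice", "bob")
become_fan("alice", "michael_jackson_page")

# A typical user has ~150 friend edges; a popular fan page can have
# 10 million+ fan edges, which is why the two cases scale so differently.
print(len(friends["alice"]), len(fans["michael_jackson_page"]))
```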

Recordon’s co-presenter, Facebook open source developer advocate Scott MacVicar, also detailed some of the technologies the site uses to achieve the necessary scale.

"The scaling challenges come from what people do on Facebook," MacVicar says. Some 8 billion minutes are spent by people on the site every day and 3.5 billion pieces of content are shared every week, be text or multimedia elements like photos and videos. With more than 2.5 billion photos alone added every month, Facebook may also be the biggest photo sharing site on Web as well.

“There’s more than just the Web site, there is the API and the platform people use to build applications with Facebook. There are a million users of that,” MacVicar says, adding there are now around 400 million unique home pages.

“If we take a standard page on Facebook -- say a news feed -- to construct that we need to take data from 150 friends and that’s split across multiple servers and it has to be done in milliseconds. It’s not just your direct friends. If other friends have commented on your photos then that has to be pulled in as well. So we’re pulling from potentially thousands of different sources all just to render that one single page.”
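The fan-out MacVicar describes can be sketched as a set of parallel fetches that are merged into one feed. The sketch below assumes a hypothetical fetch_recent_activity() call standing in for the requests Facebook makes to its caching and storage tiers; it is an illustration of the pattern, not the site's code.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_recent_activity(friend_id):
    """Hypothetical stand-in: would query a cache or database shard."""
    return []

def build_news_feed(friend_ids, limit=50):
    # Fetch activity for all friends in parallel, then merge and rank,
    # because the data for one page is spread across many servers.
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = pool.map(fetch_recent_activity, friend_ids)
    stories = [story for batch in results for story in batch]
    stories.sort(key=lambda s: s.get("time", 0), reverse=True)
    return stories[:limit]
```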

A familiar technology architecture

At a high level, Facebook's architecture consists of a load balancer at the top, with requests spread among a pool of Web servers. These Web servers then use different services to fetch data.

“It will fall into memcache, which is our in-memory fast database access [tool],” MacVicar says.

Recordon says Facebook's architecture looks like that of any Web application built today and claims that, for the most part, the site isn’t architected any differently from other sites. Where things start to differ is with Facebook’s use of PHP, which MacVicar says is popular among the site's developers because it is simple to learn and its C-like syntax makes it easy for new developers to pick up.

“It’s an interpreted language, so if you want to make changes you can see them live. And because PHP is a templating language it's good for Facebook because we move fast and can’t spend time waiting for other languages.”

However, MacVicar also highlighted how PHP is problematic for Facebook because of its high CPU usage. “Because we do a lot of data assembly and build the page in the app server we use a lot of CPU for that, as well as memory. We’d also like to reuse more PHP logic.”


To solve these performance issues, Facebook developed a tool called HipHop that transforms PHP into optimised C++ code. HipHop is designed to give the best of both worlds -- the development speed of an interpreted language and the runtime performance of a compiled language like C++.

“What makes HipHop work really well is we can take the majority of our code base and greatly speed it up,” Recordon says.

Facebook has developed software in C++, as well as Python and Erlang, and has found that writing PHP extensions can be difficult because the macros are not well documented.

When asked why the company did not develop applications in C++, MacVicar says the reason is due to the fact that Facebook’s code base is about 4 million lines of PHP, so "the first problem would be how to translate all of that to C++ without holding up development on the site".

With HipHop, Facebook's PHP code goes through a 7-step transformation process involving code optimisation and then compiling.

“We want to minimise the differences between PHP and HipHop so you can use either [and] support for Apache is on the roadmap,” MacVicar says.

Interestingly, Facebook’s adoption of HipHop, with its own embedded Web server, is now pushing Apache out of the stack.

“We’ve generally been using Apache with PHP, but HipHop has its own embedded Web server, which is a really simple Web server built on top of libevent. So now we have been moving to using that,” Recordon says.

HipHop only compiles on Linux, but MacVicar says someone from Microsoft has contacted Facebook and “hopefully they will contribute the Windows port”.

“Hopefully someone in the community will contribute the Mac port, or I will do it myself,” he says.

While Facebook makes the most of the components it has developed in a variety of languages -- C++, Erlang (used for the chat service), Java and Python -- the company's philosophy is not to choose a single language when building infrastructure.

“The entire Web server infrastructure is PHP, but we use many different languages, from a backend infrastructure perspective, depending on the service,” Recordon says.

“Philosophically, we think about ‘when do we need a service’. Is it something that needs to be really quick? Is it something that currently has a big overhead in terms of our application layer deployment and maintenance? Do we want another failure point in our network? We need to balance those.”

Standard tools for common tasks

Facebook's extensive use of open source software has also fostered a culture of giving back to the community. HipHop is open source, as are many of the system administration and data integration tools that Facebook uses.

“We use Thrift to communicate between all [the] services we have, something we open sourced,” MacVicar says. “It is essentially an RPC server and can generate code for you in a number of languages.”
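To make the RPC idea concrete, the sketch below shows how a Python client typically talks to a Thrift service: the Thrift compiler generates a client class from an interface definition, and calls on it look like ordinary method calls but execute over the network. The service name here is hypothetical; only the transport and protocol wiring is standard Apache Thrift usage.

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
# from profile_service import ProfileService  # hypothetical Thrift-generated module

def call_profile_service(user_id):
    # A buffered socket transport plus a binary protocol is the usual way a
    # Python client connects to a Thrift service.
    transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 9090))
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # client = ProfileService.Client(protocol)  # class generated by the Thrift compiler
    transport.open()
    try:
        pass  # return client.get_profile(user_id)  # looks local, runs as an RPC
    finally:
        transport.close()
```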

According to Recordon, Facebook looks at a variety of open source and commercial tools to manage its information, not just the free stuff.

With about 400 billion page views a month, Facebook logs a staggering 25TB of data every day. “We used Syslog for logging, and the logging server sort of exploded because we were logging so much information from all these page views,” Recordon says. “So we went and created Scribe and we are able to break up this funnel from a logging perspective. The logging information from servers will get routed into Scribe servers."

Once routed to the Scribe servers, Facebook log data is then condensed and stored in the Hadoop and Hive cluster to be used for future data analysis.
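The fan-in pattern Scribe implements can be illustrated with a toy forwarder: each Web server buffers log lines locally and ships them, by category, to a central aggregation tier in batches. Scribe itself is a Thrift service; send_batch() below is a hypothetical stand-in for that call, and the whole snippet is a sketch of the pattern rather than Scribe's code.

```python
import time

def send_batch(category, entries):
    pass  # placeholder: would hand the batch to a Scribe-like aggregation server

class LogForwarder:
    def __init__(self, category, flush_every=1000):
        self.category = category
        self.flush_every = flush_every
        self.buffer = []

    def log(self, message):
        # Buffer locally so the central logging tier receives large batches
        # instead of one network call per log line.
        self.buffer.append((time.time(), message))
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if self.buffer:
            send_batch(self.category, self.buffer)
            self.buffer = []
```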

“Logging is a common problem people have so we also open sourced Scribe and it’s now being used by Twitter and they are contributing back to the project,” Recordon says.

Facebook’s other scaling challenge revolves around photo sharing which, like almost all aspects of the site, has ballooned to a massive scale. There are an estimated 40 billion photos hosted by Facebook, each stored in four resolutions for a total of 160 billion image files. In all, about 1.2 million photos are served every second.

“The first thing we did was NFS using a commercial solution, because you have to choose the battles you are going to fight," MacVicar says. “But unfortunately it just didn’t scale. It wasn’t that the commercial solution was bad, but the I/O was so high it simply didn’t work.”

To overcome this challenge, Facebook first optimised its file serving capability and then took a deeper look at how files are kept on a file system.

“We developed a system called Haystack, which allows us to serve photos with one physical read on the disk,” Recordon says. “So it doesn’t matter from a random data access perspective, it’s always one physical read to serve a file. It takes about 300MB of RAM to run this for every terabyte of photos we have. We went from 10 I/O operations per photo to one.”
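The idea Recordon describes can be sketched as an append-only photo file paired with a small in-memory index: because the index maps every photo ID straight to an offset and size, serving any photo costs a single seek-and-read. This is an illustration of the concept only, not Facebook's Haystack implementation.

```python
import os

class PhotoStore:
    """Toy Haystack-style store: big append-only file plus an in-RAM index."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # photo_id -> (offset, size), kept entirely in memory

    def write(self, photo_id, data):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def read(self, photo_id):
        offset, size = self.index[photo_id]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)  # one physical read per photo
```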

Haystack is not an open source project yet, but Facebook is working on making it open source because “we really think it’s useful for a lot of sites both large and small,” Recordon says.

Data storage and analysis goes big time

Facebook’s infrastructure relies on memcache for faster database access and this acts as a “middle tier” between the Web servers and the databases.

Memcache was originally developed for the LiveJournal blogging service to improve the performance of its database-driven Web sites and is now used by many of the largest sites on the Internet, from Facebook to Wikipedia.

“It’s great, but our engineers need to use it in a smart manner,” Recordon says. “We currently serve about 120 million memcached requests per second, so we’re incredibly reliant on memcache to make Facebook work.”
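The role memcache plays between the Web tier and MySQL is the classic cache-aside pattern: read from the in-memory cache first and fall back to the database only on a miss. The sketch below uses the open source pymemcache client and made-up function and key names; it assumes a memcached instance on localhost and is not Facebook's actual code.

```python
import json
from pymemcache.client.base import Client  # one of several open source memcached clients

cache = Client(("localhost", 11211))  # assumes a local memcached instance

def load_profile_from_db(user_id):
    """Hypothetical stand-in: would query MySQL."""
    return {"id": user_id}

def get_profile(user_id, ttl=300):
    key = "profile:%d" % user_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no database work
    profile = load_profile_from_db(user_id)
    cache.set(key, json.dumps(profile), expire=ttl)  # populate for later reads
    return profile
```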

The system was stress-tested by the launch of Facebook's personalised username capability in June 2009. Making good on its promise to give its millions of users an easier way to share their profiles, Facebook let people choose an alias for their online profiles on a first-come, first-served basis. Members responded in droves, registering new user names at a rate of more than 550 a second. Within the first seven minutes, 345,000 people had claimed user names; within 15 minutes, 500,000 users had grabbed a name.

“We had about 200 million users at the time, so we asked 200 million users to access Facebook at the same time,” Recordon says. “This is like a denial of service attack that you bring upon yourself, but for us it was really a product launch. And a million usernames were assigned in the first hour.”
