by Esther Schindler

Google App Engine: Getting Data Out Ain’t Simple. Yet.

Feature
Sep 03, 2008

Data checks in... but it won't check out! No, not really; data management is just more of a pain than you might expect. To use Google App Engine today, you need to use a Python API to export data from its proprietary data store. But soon, Google says, the situation will get a lot easier.

Developers who adopt the Google App Engine as their cloud computing platform today may fear data lock-in, since the only way to import or export data is through a Python-based API. Google is working on a data exchange tool to improve data portability.

The Google App Engine is intended to help developers build and scale applications to run on Google’s infrastructure, says Peter Koomen, Google’s product manager for App Engine. “It’s still difficult to build Web apps,” he notes.

Here’s why in a nutshell: a Web developer who has a bright idea has a steep ramp to climb before she can really get underway. She has to set up an infrastructure stack—renting or buying servers, setting up a Web server, configuring a database engine—burning up both time and money. That’s before a single line of code is written. Then, when her website gets popular (see, it was a bright idea), it can’t handle the load. The site needs to scale, and to do it quickly. So the developer has to burn even more time and money (under pressure) to rent or provision more infrastructure, while figuring out how to split access across multiple Web servers. It isn’t an easy job.

It’s also a common issue. “You see this a lot with the newer social apps coming out,” says Koomen. Certainly we can point to several such examples, including Twitter’s scalability problems, but the need is similar for any cloud-based enterprise application. Business developers, too, need to worry about getting the right infrastructure in place and making it secure and scalable.

Google App Engine, says Koomen, fixes these problems. The application developer can focus on the application, because the App Engine takes care of the infrastructure. To quote Web technologist Niall Kennedy’s blog post last April explaining the then just-announced runtime environment: “Google App Engine lets any Python developer execute CGI-driven Web applications, store its results and serve static content from a fault-tolerant geo-distributed computing grid built exclusively for modern Web applications.”

Sounds cool. So what’s the problem?

Data Lock-In—or Nervous Nellie?

The Google App Engine uses a data store that is… different. It's not precisely SQL, says Koomen, because the data store is built to run across multiple servers. Most developers who are familiar with SQL databases (from Microsoft SQL Server to MySQL) won't have a problem using the data store, but some familiar operations, such as joins across tables, aren't technically possible. "These restrictions aren't as terrible as you'd think," adds Koomen.

Using Google App Engine today also requires Python, which might present a problem to developers who are more familiar with other languages—whether dynamic languages like PHP and Perl, or traditional languages like Java or C#. (Personally, I don’t think Python is a significant turn-off to most Web developers, but programming language preferences are passionate.)

For more on Python, see "You Used Python to Write What?!" by Martin Aspeli, and "Python Upgrades Readied for 2008."

The bigger question, a real one from a developer friend and the one that inspired me to call Google, is whether that data store creates lock-in that prevents data-based applications from being portable. Start-ups might worry that their companies are less attractive to investors if they've tied themselves to a single vendor's data store. Enterprise computing professionals would worry that, on top of their concerns about any corporate data living in the cloud, they'd have information they could not easily retrieve or would have trouble migrating to another database. Or, probably more important over the long term, they might worry that database interactions involving Google App Engine would require fancy custom programming to interoperate with in-house applications, Web services components, a service-oriented architecture or other situations in which the leg bone data must connect to the thigh bone data.

In its current "preview" state, the Google App Engine requires that data be stored and retrieved through its Python data store API, with queries written in GQL, a query language that Koomen says is as similar as possible to SQL. You can get data out of your data store only programmatically, not by copying a SQL dump file from one server to another.

However, Koomen says Google recognizes this limitation and has been actively soliciting input on how best to address it. In fact, that's one of the reasons Google makes preview versions available, says Koomen. "We wanted to get it out early and see what developers thought."

As a result, Koomen says, Google will release a tool that lets developers get data out of the data store without writing code. Google isn't ready to go into specifics, other than to say the tool will be available "within the next two quarters." According to Koomen, the new tool promises to address the data portability issue, to provide a "home brew backup" and to be "completely open."

This isn't the first time user input has affected the development of Google App Engine. For example, Google had to remove the Python Imaging Library from App Engine for security reasons. (Google App Engine supports 100% of the Python language, Koomen says, and 90% of the Python libraries.) However, developers made it clear that they needed to manipulate images in the data store, such as to scale or rotate images or to create thumbnails. "So that ability is in there, now," says Koomen.

In the meantime, developers who use App Engine don't have their data truly locked in. Getting data out of the App Engine data store is awkward, so you should build "Darnit, I have to write that from scratch" time into the project schedule, but these limitations are temporary. That's worth knowing, because every developer adopting a technology that's new to her wants to know where the bodies are buried and where her assumptions are incorrect.
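What does that from-scratch export code look like? Here is a minimal sketch in Python, the language App Engine requires. It is not App Engine's API: `fetch_page` is a hypothetical stand-in for a real data store query, and the key-ordered paging and CSV output are illustrative choices, not Google's method. Each request resumes after the last key it saw instead of re-reading from an offset, so a large store can be walked one batch at a time.

```python
import csv
import io
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: given the last key seen (or None) and a page size,
# return up to `limit` (key, record) pairs whose keys sort after `after_key`,
# in ascending key order. A real export would wrap a data store query here.
FetchPage = Callable[[Optional[str], int], List[Tuple[str, dict]]]


def iter_all(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Tuple[str, dict]]:
    """Walk the whole store one page at a time, resuming after the last key seen."""
    last_key: Optional[str] = None
    while True:
        page = fetch_page(last_key, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        last_key = page[-1][0]


def export_csv(fetch_page: FetchPage, fields: List[str], page_size: int = 100) -> str:
    """Serialize every record to CSV text, with the record key as the first column."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["key"] + fields, extrasaction="ignore")
    writer.writeheader()
    for key, record in iter_all(fetch_page, page_size):
        writer.writerow({"key": key, **record})
    return out.getvalue()


if __name__ == "__main__":
    # An in-memory dict plays the role of the data store for this demo.
    store = {f"k{i:03d}": {"name": f"user{i}", "score": i * 10} for i in range(5)}

    def fake_fetch(after_key, limit):
        keys = sorted(k for k in store if after_key is None or k > after_key)
        return [(k, store[k]) for k in keys[:limit]]

    print(export_csv(fake_fetch, ["name", "score"], page_size=2))
```

Writing CSV keeps the output readable by nearly any database's import tool, which is the portability the article's developers are after; the tradeoff is that nested or typed values come out as plain strings.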

Within six months, however, the data lock-in concerns in Google App Engine, along with the need to write an export hack, should go away. "We still have some ways to go," says Koomen.