On (Mis)Trusting Google Desktop

Highly usable software, such as Google Desktop, can seem revolutionary, but the web-meets-desktop search capabilities are seductively porous and raise huge privacy concerns.

Without scrutiny, highly usable software that neglects security can seem heroic and revolutionary. Such may be the case for Google Desktop. Most users see the web-meets-desktop search capabilities and don't consider the security implications of making the boundary between google.com and the desktop so seductively porous.

Particularly troubling is the potential for an attacker to access information, documents, and possibly executables through Google Desktop via flaws (XSS in particular) in Google's website. In February, Yair Amit et al. found a vulnerability that could give remote attackers access to data and functionality through Google Desktop. RSnake has also pointed out several existing Google XSS vulnerabilities on his blog at ha.ckers.org.

Also consider that Google Desktop keeps a fairly sizable index and cache for rapid search, unencrypted by default. This index contains an astonishing amount of historical data: previous versions of files, web-based email communications, browsing history, and more. The problem is that this data persists even after the average user makes reasonable efforts to delete it from the file system. Tools that purge files on deletion (overwriting them several times), popular within corporations and government agencies, have no effect on Google's index and cache of those files. This is a sizable risk: Google Desktop may completely obviate some corporate and governmental procedures for purging data.
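The persistence problem can be illustrated with a toy model (all names here are hypothetical, for illustration only): even if a file is overwritten several times and then removed, any copy a desktop search tool made for its own index is untouched, because the index lives elsewhere on disk.

```python
import os

def secure_delete(path, passes=3):
    """Overwrite a file's contents several times, then remove it --
    a simplified model of corporate file-purging tools."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)

# A desktop search tool keeps its own copy of file contents in an index.
index = {}

def index_file(path):
    with open(path, "rb") as f:
        index[path] = f.read()

# Demo: the indexed copy survives the "secure" deletion of the original.
with open("secret.txt", "wb") as f:
    f.write(b"merger target: ACME Corp")
index_file("secret.txt")
secure_delete("secret.txt")
assert not os.path.exists("secret.txt")       # original is gone
assert b"ACME Corp" in index["secret.txt"]    # data persists in the index
```

The purge tool and the indexer never coordinate, which is exactly the gap between corporate deletion procedures and Google Desktop's cache.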

The information on searches housed by Google Inc. is also concerning.  The Google Desktop privacy policy states: "Enabling Advanced Features also allows Google Desktop to collect a limited amount of non-personal information from your computer and send it to Google. This includes summary information, such as the number of searches you do and the time it takes for you to see your results, and application reports we'll use to make the program better."

The broader Google Inc. Privacy Policy states: "We may share with third parties certain pieces of aggregated, non-personal information, such as the number of users who searched for a particular term." Overall, Google's privacy policy seems to focus on disassociating you from the information you give, but not necessarily on the privacy of the information itself. This point was recently addressed in a San Jose Mercury News interview with Google Deputy Counsel Nicole Wong, who said: "When you launch a search at Google we do record that a search has been asked for and the delivery of the result. That is not personally identifying data. It is identified by the IP address and the cookie only."

The problem is that search data itself might be sensitive even when it is completely disassociated from the person who submitted it. For example, Google Suggest, a feature included in the Google Toolbar, attempts to associate words to assist in searches by analyzing data about the overall popularity of various queries. By searching to see whether two pieces of information are associated on the Internet, you may actually disclose the fact that they are. I call this the search disclosure principle. It is a close cousin of the observer effect: the act of searching for information contributes to the information available. The mere fact that two pieces of information are being associated by people can be a potential privacy violation, national security risk, or corporate exposure even if the query is completely disassociated (or "sanitized") from the person who submitted it.
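A minimal sketch of the search disclosure principle, assuming a hypothetical suggestion engine that counts term co-occurrence across anonymous queries (nothing here reflects Google's actual implementation): the engine stores no identity at all, yet the query itself plants the very association the searcher was checking for.

```python
from collections import Counter
from itertools import combinations

class SuggestionEngine:
    """Toy model: counts how often terms co-occur in submitted
    queries, with no record of who asked."""
    def __init__(self):
        self.pair_counts = Counter()

    def search(self, query):
        terms = sorted(set(query.lower().split()))
        for pair in combinations(terms, 2):
            self.pair_counts[pair] += 1   # the query itself becomes data

    def associated(self, a, b):
        return self.pair_counts[tuple(sorted((a.lower(), b.lower())))] > 0

engine = SuggestionEngine()
# Before anyone searches, the two terms are unlinked in the engine.
assert not engine.associated("plame", "cia")
# The act of checking for a linkage creates one.
engine.search("Valerie Plame CIA")
assert engine.associated("plame", "cia")
```

Sanitizing away the searcher's identity does nothing here; the leak is in the aggregated query data itself.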

Imagine the CIA itself searching for the name "Valerie Plame" together with "CIA" to see if an operative is exposed (on a blog, etc.). That search may end up associating those pieces of information, provide a first breadcrumb to follow, and contribute to blowing that person's cover. This is of particular concern in high-sensitivity scenarios like government operations, medical trials, and confidential corporate information, where one doesn't need to know who connected the dots to turn the public, competitors, the press, or enemies on to a possible linkage.

The privacy issues listed above assume that everything works as intended. A greater concern may be the aggregation point of sensitive data created on Google's servers. Consumers have seen so many breaches at data warehouses over the last few years (CardSystems, TJX, etc.) that one wonders how soon financially driven attackers will turn their sights on Google. One can only imagine that the combined Google/DoubleClick data pool would contain enough "big brother" data to have made George Orwell salivate. Perhaps most interesting, much of the data housed by Google (such as search history) isn't covered by many disclosure laws (such as California Senate Bill 1386). This means that, depending on the breach, Google may be under no obligation to inform the public. Privacy International, a UK consumer protection group, came down particularly hard on the company in its privacy assessment, ranking it 23rd out of 23 companies studied. The group went so far as to say: "While a number of companies share some of these negative elements, none comes close to achieving status as an endemic threat to privacy."

All of this means that Google has a big security burden to bear, one that grows more cumbersome with its success. Google has a track record of building cool products that people love to use, but it also has an ethical responsibility to match its ambition in features with security. If ethics don't win out, change may come at a higher price through regulation and shaken consumer confidence. Here are a few open questions that need answers: How is information provided to Google pushed out to partners, advertisers, and the public? What security mechanisms protect aggregated data on servers from vulnerabilities (this goes beyond masking the identity of the person whose behavior is tracked and speaks to the behavior data itself)? What is Google's policy for disclosing a breach of any search/behavior data that isn't covered by current (and narrow) breach disclosure legislation? I look forward to hearing your thoughts and opinions, either as public comments on this blog or privately at hthompson@peoplesecurity.com.

Hugh Thompson is chief security strategist at People Security and author of the upcoming Protecting the Business: Software Security Compliance (John Wiley & Sons, 2007).

This story, "On (Mis)Trusting Google Desktop" was originally published by CSO.


Copyright © 2007 IDG Communications, Inc.
