Microsoft Researchers Say Anonymized Data Isn't So Anonymous

By Tim Greene
Thu, February 02, 2012

Network World — Data routinely gathered in Web logs - IP address, cookie ID, operating system, browser type, user-agent strings - can threaten online privacy because they can be used to identify the activity of individual machines, Microsoft researchers say.

At the same time, analysis of such data when anonymized can help detect malicious activity and so improve overall Internet security, they add.

The researchers found that 62% of the time, HTTP user-agent information alone can accurately tag a host. Combine that same information with the IP address, and the accuracy jumps to 80.6%. If the user-agent information is combined with just the IP prefix the accuracy is still 79.3%, they say.
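To make the idea of "tagging a host" concrete, here is a minimal Python sketch. It is not the researchers' methodology, which the article does not describe in detail. It assumes a toy log in which the true host behind each record is known, and it measures how often a chosen identifier combination points to exactly one host. The record values and the function name `fraction_uniquely_tagged` are invented for illustration.

```python
from collections import defaultdict

# Toy log records: (true_host, ip, user_agent). All values are made up.
RECORDS = [
    ("hostA", "192.0.2.10", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
    ("hostB", "192.0.2.11", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
    ("hostC", "198.51.100.7", "Mozilla/5.0 (Macintosh) Safari/533"),
    ("hostD", "198.51.100.7", "Opera/9.80 (X11; Linux)"),
]


def fraction_uniquely_tagged(records, key_fn):
    """Return the share of records whose identifier key maps to one host only."""
    hosts_per_key = defaultdict(set)
    for host, ip, ua in records:
        hosts_per_key[key_fn(ip, ua)].add(host)
    unique = sum(
        1 for host, ip, ua in records
        if len(hosts_per_key[key_fn(ip, ua)]) == 1
    )
    return unique / len(records)


# User-agent string alone: two hosts share the same Firefox string.
ua_only = fraction_uniquely_tagged(RECORDS, lambda ip, ua: ua)
# User-agent string plus IP address: every record becomes distinguishable.
ua_and_ip = fraction_uniquely_tagged(RECORDS, lambda ip, ua: (ua, ip))

print(ua_only, ua_and_ip)
```

In this toy data, adding the IP address separates the two hosts that share a user-agent string, which is the same direction of effect the researchers report, although the numbers here say nothing about real traffic.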

The highest accuracy, 92.8%, came when more than one user ID was linked to a single host, as would be the case when a family shares one computer. In such cases, the set of IDs taken together accurately represents that one host.

The analysis of this seemingly benign information was based on a month - August 2010 - of anonymized Hotmail and Bing data on hundreds of millions of users. The researchers say they tried to find out whether a single piece of log data can uniquely reveal a particular host.

They found that even anonymized data can leak information. For example, replacing an IP address with its IP prefix still leaves enough information to be revealing when combined with other commonly logged factors. "[C]oarse grained IP prefixes achieve similar host-tracking accuracy to that of precise IP address information when they are combined with hashed [user-agent] strings," the researchers say.
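The following Python sketch shows what such anonymization might look like and why the result can still act as an identifier. The /24 prefix length, the use of SHA-256, and the function names are assumptions made for illustration; the article does not say which prefix length or hash the researchers' data used.

```python
import hashlib
import ipaddress


def ip_prefix(ip, prefix_len=24):
    """Coarsen an IPv4 address to its network prefix (assumed /24 here)."""
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


def hash_ua(user_agent):
    """Replace a user-agent string with a truncated SHA-256 digest."""
    return hashlib.sha256(user_agent.encode("utf-8")).hexdigest()[:16]


def anonymized_key(ip, user_agent):
    """Combine the two anonymized fields into one tracking key."""
    return (ip_prefix(ip), hash_ua(user_agent))


UA = "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"
# Two requests from different addresses in the same prefix, same browser.
key_first = anonymized_key("192.0.2.77", UA)
key_second = anonymized_key("192.0.2.200", UA)
print(key_first == key_second)
```

Neither field reveals the raw address or the raw user-agent string, yet the pair stays stable across requests from the same machine, so it can still serve as a tracking key.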

They looked at data gathered from application-layer events directed at Web servers within the Hotmail and Bing networks.

From Hotmail, they gleaned coarse data about the OS and browser types, source IP address, time of login and anonymized user IDs. From Bing, they gathered anonymized HTTP user-agent strings, source IP addresses of queries, times of queries, anonymized cookie IDs issued by Bing and creation dates of the cookies.

The researchers set out to detail how much identifying information gets revealed by common identifiers. They weren't trying to discover specific individuals' activities, but to understand the patterns of aggregated activities and explore their implications.

The researchers say their use of the data falls within Microsoft's privacy policies, and that under those policies the data can't be made available to outside researchers.

They found that service providers can recognize 88% of devices that receive a cookie, clear the cookie, then return to the site, if the providers examine other identifying factors gathered during the initial connection. Even users who browse in private browsing mode, which is designed to protect user identity, can still be identified, the researchers say.
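As a rough illustration of how a returning device might be recognized after clearing its cookie, the Python sketch below links a newly issued cookie to an earlier one when both arrive with the same IP prefix and user-agent string. This is a simplified assumption about the "other identifying factors", not the researchers' actual technique, and the event data, the /24 prefix length, and the function names are hypothetical.

```python
import ipaddress


def fingerprint(ip, user_agent, prefix_len=24):
    """Build a coarse fingerprint from the IP prefix and user-agent string."""
    prefix = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return (str(prefix), user_agent)


def link_new_cookies(events):
    """Map each later cookie to the first cookie seen with the same fingerprint."""
    first_cookie = {}
    links = {}
    for cookie_id, ip, user_agent in events:
        fp = fingerprint(ip, user_agent)
        original = first_cookie.setdefault(fp, cookie_id)
        if original != cookie_id:
            links[cookie_id] = original
    return links


# Made-up events: (cookie_id, source_ip, user_agent), in time order.
EVENTS = [
    ("cookie-1", "203.0.113.5", "Mozilla/5.0 (X11; Linux) Firefox/3.6"),
    # Same machine returns after clearing cookies and gets a fresh cookie.
    ("cookie-2", "203.0.113.9", "Mozilla/5.0 (X11; Linux) Firefox/3.6"),
    ("cookie-3", "198.51.100.20", "Opera/9.80 (Windows NT 6.1)"),
]

LINKS = link_new_cookies(EVENTS)
print(LINKS)
```

A real system would have to cope with many hosts sharing a prefix and a popular user-agent string, which is why this kind of matching is probabilistic rather than exact.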

"Our analysis suggests that users who do not wish to be tracked should do much more than clear cookies," the researchers say, and note that in some circumstances clearing cookies can help identify a particular host. "Uncommon behaviors such as clearing cookies for each request may instead distinguish a host from others who do not do so."

The researchers did offer some tips for maintaining anonymity:

* Use a browser whose default user agent string is popular, making that string less useful for identifying your machine in particular.

* Even when using anonymous routing like Tor, use tools such as Torbutton to manage identity information.

* Consider using proxies.
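The first tip can be illustrated with a small Python sketch that computes what share of observed requests carries each user-agent string. The sample data and the function name `ua_share` are invented for illustration; the point is only that a string shared by many machines narrows the field far less than a rare one.

```python
from collections import Counter


def ua_share(observed_user_agents):
    """Return the fraction of observations carrying each user-agent string."""
    counts = Counter(observed_user_agents)
    total = len(observed_user_agents)
    return {ua: n / total for ua, n in counts.items()}


# Made-up sample: eight requests from a common browser, two from a rare one.
SAMPLE = ["common-browser/1.0"] * 8 + ["rare-browser/0.3"] * 2
SHARES = ua_share(SAMPLE)
print(SHARES)
```

A machine sending the rare string stands out among far fewer peers, so the string does more of the identifying work.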

Originally published on www.networkworld.com.