New Software Detects Bots Scraping Website Data

Web sites such as job boards face a persistent problem: their data is constantly pilfered by automated bots.

By Jeremy Kirk
Wed, November 04, 2009

IDG News Service — Web sites such as job boards face a persistent problem: their data is constantly pilfered by automated bots.

The data ends up on other competing job boards, which have stolen the content. It's a problem that plagues any Web site whose intellectual property must be publicly posted for free, or even those with subscription models.

But an Atlanta-based security company that specializes in detecting bots has developed software that can detect those screen-scraping and data-mining bots.

Pramana's main product, HumanPresent, detects automated bots that, for example, enter spam into Web-based forms or register for free e-mail accounts to be used for spam.

Pramana has now developed a module called "data mining and screen scraping prevention" for HumanPresent. It works on many of the same principles as its main product but has been modified for data-mining scenarios, said David Crowder, Pramana's CEO.

HumanPresent can detect bots by noticing differences in the way a human would normally interact with a Web page and contrasting that with how bots behave. It looks at more than 30 metrics, such as keyboard strokes, mouse clicks and the timing of those actions.

HumanPresent looks at single transactions, but the data-mining module has been modified to look at a timed period when either a bot or human is on the site, Crowder said.

Data-mining bots tend to entirely circumvent a browser's user interface. For example, a bot may request a Web page with lots and lots of data, but never scrolls or clicks on a page. If a series of pages are opened and viewed in that manner, it could mean a data-mining bot has arrived.

Pramana assigns a unique ID to the visitor, and after analyzing the visitor's behavior, can make a decision whether to label the visitor a bot or not. There are several different ways a Web site operator can then choose to deal with the situation.

The IP (Internet Protocol) address of the bot's computer can be block permanently. One car auction Web site that is testing Pramana's data mining module decided to move suspected bots into a "sandbox" where it is served completely false data.

"They're indeed data mining -- it's just dead wrong," Crowder said.

Other options include prompting the Web site visitor with a challenge or task, which some bots aren't capable of completing.

Data mining costs companies dearly. Companies that sell premium data will find that their competitors will buy a subscription and then use automated bots to steal the data for their own sites. In one example, a Web site that has gigabytes of data on used car prices found their data had been scraped and was for sale on eBay.

Continue Reading

This white paper describes the business challenges and opportunities that are driving interest in Identity Governance while discussing considerations your organization should make to help achieve project success.
This paper explores the concept of content-aware IAM, describes the integrated architecture for this new approach, and highlights the benefits that this approach provides.
Without policies, awareness and supported alternatives for sharing files securely, end-users will often overlook security and compliance in favor of getting the job done. Read this whitepaper to determine if your enterprise has a "Dropbox Problem" and ways successful organizations address this problem.
Content provided by Google

Find out about how Google creates a security-based platform for Google Apps, covering topics like information security, physical security, and operational security.
This document is aimed at those looking at data center builds, upgrades, or consolidation. It provides an introduction to some of the new security challenges of such environments and provides recommendations for implementing security in next-generation data centers.
This editorial brief addresses the disconnect between security and operations teams and the need for IT operations teams to address security and risk management.
Learn how Gartner's criteria for next generation IPS helps organizations achieve effective threat prevention despite changes in network communications, new applications, and changes in the threat landscape.
3 minute Flash video - overview of the need for and value of Configuration Control.
Cloud deployments are playing a critical role in propelling innovation for many companies. At the same time security has become the #1 one of the top concerns for IT and business leaders as they migrate into the cloud. In this webinar, learn from Accenture discusses how to recast the cloud as a "fresh chance to rethink your approach to security."
Have you been looking to hear about customer's experiences with the new VMware vCenter Site Recovery Manager product? View this webcast to learn about VMware customer, Navicure, and their experiences testing and evaluating the recovery manager, their progress in implementing it in their environment and their advice other customers considering using vCenter.
Many enterprises have discovered that the use of virtualization to support desktop workloads creates a range of significant benefits. These benefits include price efficiencies, improved IT management and greater agility and choice for end users.

This VMware sponsored webcast with IDC will provide both quantitative measurement of the business value -- defined as the expected ROI -- and qualitative analysis associated with the use of VMware View™. IDC will also provide an analysis of the View Composer and ThinApp™ features of VMware View, including the business value of these solutions and an overview of how they work.

Attend this webcast to learn about:
- Challenges and barriers that might impede the adoption of desktop virtualization
- Navigating roadblocks to facilitate a strategic implementation
- Optimizing qualitative and quantitative benefits to IT and your business
VMware recently announced VMware vFabric™ Data Director, a new database deployment and operations platform that enables enterprise IT organizations to offer database as a private cloud service. Built on top of VMware vSphere 5, vFabric Data Director enables IT organizations to ontrol database sprawl through automation and consistent policy enforcement and accelerate application development cycles with self-service database management. Attend this webcast to learn how vFabric Data Director can help you build database-as-a-service in your datacenter.
Newsletter Sign-Up »

Receive the latest news test, reviews and trends on your favorite technology topics

Choose a newsletter
  1. View all Newsletters | Privacy Policy
Resource Center