New Software Detects Bots Scraping Website Data


By Jeremy Kirk

Wed, November 04, 2009 — IDG News Service — Web sites such as job boards face a persistent problem: their data is constantly pilfered by automated bots.

The data ends up on competing job boards that have stolen the content. It's a problem that plagues any Web site that must post its intellectual property publicly for free, and even sites that use subscription models.

But an Atlanta-based security company that specializes in detecting bots has developed software that can spot those screen-scraping and data-mining bots.

Pramana's main product, HumanPresent, detects automated bots that, for example, enter spam into Web-based forms or register for free e-mail accounts to be used for spam.

Pramana has now developed a module called "data mining and screen scraping prevention" for HumanPresent. It works on many of the same principles as its main product but has been modified for data-mining scenarios, said David Crowder, Pramana's CEO.

HumanPresent detects bots by contrasting the way a human normally interacts with a Web page with the way bots behave. It examines more than 30 metrics, such as keystrokes, mouse clicks and the timing of those actions.
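Pramana has not published how HumanPresent weighs its metrics, but the idea of using action timing as a signal can be shown with a minimal sketch. It assumes a page script reports sorted event timestamps in milliseconds, and it flags sequences whose gaps are implausibly fast or too regular for a person. The function names and both thresholds (`min_avg_ms`, `min_cv`) are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def timing_features(timestamps_ms):
    """Return (mean gap, coefficient of variation) between consecutive events."""
    if len(timestamps_ms) < 3:
        raise ValueError("need at least three events")
    ts = sorted(timestamps_ms)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    # Coefficient of variation: how irregular the rhythm is (people are irregular).
    cv = pstdev(gaps) / avg if avg > 0 else 0.0
    return avg, cv


def looks_scripted(timestamps_ms, min_avg_ms=30.0, min_cv=0.1):
    """Hypothetical heuristic: too fast on average, or too evenly spaced, suggests a bot."""
    avg, cv = timing_features(timestamps_ms)
    return avg < min_avg_ms or cv < min_cv
```

For example, five clicks exactly 100 ms apart are flagged because their spacing never varies, while a human-like sequence such as `[0, 180, 260, 510, 640]` is not. A real product would combine a signal like this with many others rather than rely on it alone.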

HumanPresent analyzes single transactions, but the data-mining module has been modified to analyze behavior over the period that a visitor, whether bot or human, spends on the site, Crowder said.

Data-mining bots tend to bypass a browser's user interface entirely. For example, a bot may request a Web page containing large amounts of data but never scroll or click on it. If a series of pages is opened and viewed that way, a data-mining bot may have arrived.

Pramana assigns a unique ID to each visitor and, after analyzing the visitor's behavior, decides whether to label that visitor a bot. Web site operators can then choose among several ways to deal with the situation.
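The session-level pattern described above, with many pages requested but almost no scrolling or clicking, can be expressed as a simple per-visitor tally. This is a minimal sketch and not Pramana's method. It assumes a page script reports events named `page_view`, `scroll`, `click` and `keypress` (hypothetical names), gives each visitor a random ID, and uses invented thresholds (`min_pages`, `max_ratio`).

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical event names a page-side script might report.
INTERACTION_EVENTS = {"scroll", "click", "keypress"}


@dataclass
class VisitorSession:
    # Unique ID assigned to each visitor so behavior can be tracked over time.
    visitor_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    page_views: int = 0
    interactions: int = 0

    def record(self, event: str) -> None:
        """Count page loads separately from user-interface interactions."""
        if event == "page_view":
            self.page_views += 1
        elif event in INTERACTION_EVENTS:
            self.interactions += 1


def is_probable_scraper(s: VisitorSession, min_pages: int = 5,
                        max_ratio: float = 0.1) -> bool:
    """Flag visitors who load many pages while barely touching the interface."""
    if s.page_views < min_pages:
        return False  # too little evidence to decide yet
    return s.interactions / s.page_views <= max_ratio
```

Waiting for `min_pages` before deciding reflects the article's point that the module watches a visitor over a period of time rather than judging one request in isolation.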

The IP (Internet Protocol) address of the bot's computer can be blocked permanently. One car auction Web site that is testing Pramana's data-mining module decided to move suspected bots into a "sandbox," where they are served completely false data.

"They're indeed data mining -- it's just dead wrong," Crowder said.

Other options include prompting the Web site visitor with a challenge or task, which some bots aren't capable of completing.
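The responses the article lists (blocking, sandboxing with false data, or issuing a challenge) amount to a small policy decision once a visitor has been scored. The sketch below is hypothetical: the 0 to 1 `bot_score`, the 0.5 and 0.8 cutoffs, and the price-perturbation scheme are assumptions for illustration, not details of Pramana's product. Seeding the generator with the visitor ID keeps the decoy value stable across reloads, so the false data looks consistent to the scraper.

```python
import random
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"  # a task some bots cannot complete
    SANDBOX = "sandbox"      # keep serving pages, but with false data
    BLOCK = "block"          # refuse requests from the visitor's IP address


def choose_action(bot_score: float, operator_policy: Action) -> Action:
    """Pick a response from a hypothetical 0..1 confidence that the visitor is a bot."""
    if bot_score < 0.5:
        return Action.ALLOW
    if bot_score < 0.8:
        return Action.CHALLENGE  # uncertain: let a human prove themselves
    return operator_policy       # confident: apply the operator's chosen response


def price_for(visitor_id: str, real_price: int, action: Action) -> int:
    """Return the price to display; sandboxed visitors get a stable but false value."""
    if action in (Action.BLOCK, Action.CHALLENGE):
        raise ValueError("blocked or challenged visitors should not be served data yet")
    if action is Action.SANDBOX:
        rng = random.Random(visitor_id)  # same visitor, same decoy
        sign = rng.choice((-1, 1))
        factor = 1 + sign * rng.uniform(0.1, 0.4)  # 10% to 40% off the real value
        return int(real_price * factor)
    return real_price
```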

Data mining costs companies dearly. Companies that sell premium data find that competitors buy a subscription and then use automated bots to copy the data for their own sites. In one example, a Web site with gigabytes of data on used-car prices found that its data had been scraped and was for sale on eBay.
