Is scraping data part of your game plan?

It was my colleague’s birthday last Monday. Her closest friends knew. The rest of us in the office knew, once she bought us all free coffees. And Google.

She wasn’t surprised that Google knew, but found it unnerving that Google chose to let her know that it knew by displaying a personalised version of its logo with birthday-like icons and a Happy Birthday Sandra label. Did she provide that information to Google at some point? Probably. Did they scrape it from another online cache? No idea. How would she know?

As advisors to technology companies, we are occasionally asked about the legal risks associated with harvesting data via scraping and similar means. If this is a major part of your game plan, here are the starters you should be thinking about.

Consider the sites you are extracting information from – and look at their website terms of use. If the terms of use specifically prohibit access to that site via any means other than a designated browser or use of the information for your commercial purposes, then this should be a red flag. In a legal world that is still getting to grips with whether scraping should or shouldn’t be allowed, this is an easy mechanism for the website owner to demonstrate that you are in the wrong.

Consider the information you are extracting – are you taking information that is personally sensitive (such as personal contact details) or commercially sensitive (such as brand names)? The legal rules around taking personal information, and using another’s brand for one’s own commercial purposes, are both well-developed areas of law and highly protectionist. Are you taking images or compilations of information that are likely to be seen as proprietary, either because they are highly original or because they would have been labour-intensive to collate (e.g. images, or product catalogues)?

Typically, harvested data includes product descriptions and pictures reproduced from other sites. As soon as you are reproducing another person’s text or images you raise legal issues of potential copyright infringement. These risks are lessened if (a) the images and text are not reproduced in whole, (b) the text is not reproduced verbatim but, as your school teacher would say, restated in your own words (taking care not to mislead or misstate any aspect of the goods or services, however), (c) the images are unoriginal, do not reproduce trade marks, are sourced from a different place to the product description, or otherwise are less likely to be the subject of copyright held by the same owner as the other extracted information.

What is often a surprise to would-be scrapers is that they must also consider the collective effect of a swarm of scrapers (a muster of miners? a horde of harvesters?).

Legal cases have focused on the potential for scraping activity by multiple persons to diminish the owner’s available bandwidth or server capacity. Interfering with a computer system owner’s data usage right (distinct from the use of the data itself) has been recognised as theft in New Zealand (Davies v Police); and depriving an owner of bandwidth and server capacity has been held to constitute the old-fashioned tort of trespass to chattels in the US (eBay v Bidder’s Edge).
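For readers who build scrapers, the bandwidth point can be made concrete. What follows is a minimal sketch, not a recommended or legally sufficient approach: it spaces out requests to the same host so that a single scraper places less load on the owner's server. The class name `HostThrottle`, the five-second default and the example URLs are all illustrative assumptions, and throttling does nothing to make unauthorised scraping authorised.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum gap between successive requests to the same host.

    Hypothetical helper for illustration only; the interval is an assumed
    value, not a figure drawn from any case or statute.
    """

    def __init__(self, min_interval_s=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable so the logic can be tested
        self._sleep = sleep
        self._last_request = {}      # host -> time the last request was allowed

    def wait(self, url):
        """Pause until the host in `url` may be contacted again.

        Returns the number of seconds waited.
        """
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval_s - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually released to the network.
        self._last_request[host] = now + delay
        return delay
```

A scraper would call `throttle.wait(url)` immediately before each request. The gap is tracked per host, so slowing down requests to one site does not hold up requests to another.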

New Zealand’s Crimes Act sets out a number of computer-related crimes including dishonestly accessing computer systems without authorisation (s 252), and accessing any computer system dishonestly, to obtain (or even merely intending to obtain) any property, privilege, service, pecuniary advantage, benefit or valuable consideration (s 249).

The first of these (s 252) expressly does not include accessing a computer system as a permitted user and using it for a non-permitted purpose. Arguably accessing a website which is intended for public use, and scraping for data, even though not permitted, would merely be using that computer system for a non-permitted purpose and would not fall foul of this section.

The scope of the second of these (s 249) is relatively untested; however, all indications are that the threshold is not high. Accessing a computer system can be as simple as sending an email (as occurred in Burt v Police) and would certainly include harvesting data by automated scraping. Dishonesty is no more than an absence of any belief that there was any express or implied consent from the relevant person as to the act carried out (s 217).

If the terms of the website expressly or impliedly prohibit scraping, dishonesty seems hard to argue against. Conversely, if the website terms are silent, dishonesty will be harder to establish. A pecuniary advantage, benefit, etc. is simply anything that enhances the accused’s financial position, according to New Zealand’s Supreme Court (Hayes v R). This would almost certainly cover obtaining data (at little or no cost) so as to increase the scraper’s potential for commercial sales.

So, what can you do? Well, all of the legal issues mentioned here can be overcome if the scraping is authorised by the relevant website or product owner. Can you build a relationship with the website or product owner by demonstrating you can add value to their business in some way?

If, like many of our clients, you don’t want to be bothered with carefully analysing the legal rights and wrongs of your particular process, then the take-home message should be this: although the law is always a step behind technological developments, the sentiment of the legal cases to date in a number of jurisdictions is that the website owner should be able to prevent scrapers from harvesting information without authorisation.

Courts appear willing to mould existing laws to find a legal wrong committed by the scraper. Extracting information from well-resourced companies, extracting sensitive or proprietary information, or using the information in a way which adversely impacts on the company’s bottom line, interferes with its bandwidth usage or weakens its control over its brand and marketing will all put you squarely in the firing line.

The legal rules in this area will only get firmer and tighter. Monitor legal developments. Put in place a Plan B and a rapid-response action plan to adopt Plan B as and when the time comes!

Averill Dickson is a senior lawyer at Simmonds Stewart, a boutique technology law firm providing corporate and commercial legal services focused on the New Zealand technology sector. With almost 20 years in the technology sector, Averill has extensive experience advising on the corporate and commercial aspects of technology businesses and transactions. Find out more at Simmonds Stewart, or follow Averill on Twitter @averilldickson. Simmonds Stewart has free online templates for tech companies, and a blog on IT legal issues.

Copyright © 2015 IDG Communications, Inc.