Scrapers, Robots and Spiders: The Battle Over Internet Data Mining

by Lee Gesmer

When American Airlines sued Farechase, Inc. in federal district court in Texas earlier this year, claiming that Farechase’s “screen-scraping” of AA’s flight information from AA.com was illegal, it was only the most recent in a series of cases challenging unauthorized data collection from Internet web sites. What practices are encompassed by “screen-scraping”? Is “scraping” really illegal? What does this line of cases mean for your business?

What is “Screen-Scraping”? Despite its pejorative name, screen-scraping software simply gathers and aggregates data from other Internet web sites for use by the gathering party. Usually, the purpose is to reformat the data and display it for the benefit of the gathering party’s customers. Examples of data aggregation range from sites that collect prices from retail sites to companies that aggregate personal financial data from mutual fund and banking web sites, permitting registered users to access information about multiple accounts on a single web site.

The software that performs this function, often referred to as a robot (or “bot”), “spider” or “crawler”, automatically searches Internet web sites for specific information. The Farechase case provides one example of how this technology works. Farechase’s customers are travel agencies. When a customer uses Farechase to research a particular airline, hotel or auto rental fare, Farechase’s software will search different airline sites and collect the “webfares” offered. A popular site such as AA.com might be searched thousands of times a day in response to queries initiated by Farechase customers. Farechase’s real time search technology is an advance on more traditional data mining, in which companies search sites on a regular basis and maintain a separate database that may be queried by users. In the case of sites selling books or music, a real time search may not be essential, as long as the database is updated frequently. Farechase took this concept one step further by permitting its users to search for fares offered at the very moment the search is conducted, thus guaranteeing that the results would be current.
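
The difference between the two approaches described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the `FakeSite` stand-in for an airline web site, and the route and fare values are hypothetical and are not taken from Farechase’s actual software.

```python
from typing import Callable, Dict, List

# Hypothetical lookup function: given a route, return the fare one site quotes.
FareSource = Callable[[str], float]


class FakeSite:
    """Stand-in for a target web site; counts how often it is queried."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self.prices = dict(prices)
        self.hits = 0

    def fetch(self, route: str) -> float:
        self.hits += 1  # every scrape adds load on the target site
        return self.prices[route]


class CachedAggregator:
    """Traditional data mining: crawl periodically, answer from a local database."""

    def __init__(self, sources: Dict[str, FareSource]) -> None:
        self.sources = sources
        self.database: Dict[str, Dict[str, float]] = {}

    def refresh(self, routes: List[str]) -> None:
        # Visit every source once per route and store the results locally.
        for route in routes:
            self.database[route] = {
                name: fetch(route) for name, fetch in self.sources.items()
            }

    def search(self, route: str) -> Dict[str, float]:
        # No traffic to the target site, but the answer may be stale.
        return dict(self.database.get(route, {}))


class RealTimeAggregator:
    """Real-time search: query every source at the moment the user asks."""

    def __init__(self, sources: Dict[str, FareSource]) -> None:
        self.sources = sources

    def search(self, route: str) -> Dict[str, float]:
        # Always current, but each user query hits every target site.
        return {name: fetch(route) for name, fetch in self.sources.items()}
```

In this sketch, a fare that changes after `refresh` is called would still appear at its old value in `CachedAggregator.search`, while `RealTimeAggregator.search` would return the new value at the cost of one more request to the target site per query.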

Needless to say, one’s views on this type of data mining depend largely on whether one is the scraper or the scrapee. The targets of this practice, such as American Airlines, complain that the constant traffic resulting from scraping puts an extra burden on their Internet servers, slowing response times for legitimate users. In the Farechase case, American Airlines claimed that, if left unchecked, Farechase would be performing over 200,000 daily searches by the end of 2003. Moreover, it argued that because Farechase lets customers reach web fares by going directly to the American booking pages at AA.com, American is unable to establish the relationship with its customers that would develop if they were required to navigate through AA.com’s preliminary pages, and that this costs American customer goodwill. On the other hand, companies like Farechase argue that their service encourages comparison shopping, and that companies that resist it are afraid of the competition (and the lower prices) that result.

Technological Defenses. Before discussing the legalities of screen-scraping, it is worth pointing out that companies that are targeted by this practice and object to it often undertake a measure of “self-help” before authorizing their lawyers to file suit. Such self-help sometimes leads to a technological battle worthy of a William Gibson novel. The defenders attempt to identify and block the Internet Protocol (IP) addresses of the attackers. The attackers respond by hiding or disguising their scrapers’ identities behind fake IP addresses, thereby evading the blocking firewalls. Not easily discouraged, the attackers seem to have a limitless supply of disguises, perpetuating this high-tech cat-and-mouse game. In several cases the attackers have prevailed, and as a result several of these disputes have ended up in the courts.
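
The cat-and-mouse dynamic can be illustrated with a minimal Python sketch of a per-address blocker. It is an illustration only: the `IPBlocker` class, its request threshold, and the sample addresses (drawn from ranges reserved for documentation) are hypothetical, and real firewalls are considerably more sophisticated.

```python
import ipaddress
from collections import Counter


class IPBlocker:
    """Hypothetical defender: block any address that exceeds a request limit."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.counts: Counter = Counter()
        self.blocked: set = set()

    def allow(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        if addr in self.blocked:
            return False
        self.counts[addr] += 1
        if self.counts[addr] > self.max_requests:
            # Too many requests from this address: block it from now on.
            self.blocked.add(addr)
            return False
        return True


def simulate(blocker: IPBlocker, addresses: list) -> list:
    """Return, for each request in order, whether the blocker let it through."""
    return [blocker.allow(ip) for ip in addresses]
```

Under these assumptions, a scraper that sends every request from one address is cut off once it passes the limit, while a scraper that rotates through a different address for each request never trips it. That asymmetry is one way to picture why blocking alone has often failed.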

Legal Defenses. When technical defenses fail, screen-scraper targets such as American Airlines have two primary legal weapons to deploy in their defense. The first is to claim breach of a click-wrap or browsewrap online license. The second is to allege a “tort” (or legal wrong), most commonly “trespass to chattels.”

In its case against Farechase, American Airlines attempted to fire both barrels at its opponent, but its opening salvo was weak. First, American claimed that Farechase violated American’s “browsewrap” agreement. By its use of the term “browsewrap,” American was referring to an online agreement that appears on the site (usually under the terms and conditions link) but does not require the user to click on or otherwise express consent to the agreement before proceeding to use the site. By contrast, the better known (and far more effective) “click-wrap” agreement requires the first-time user to click on a word or symbol to express acceptance of a site’s licensing terms before gaining access to the site. While a site owner that has properly implemented a click-wrap agreement can expect it to be enforced, no court has yet enforced a browsewrap agreement, and the only two courts that have considered the issue at all have expressed doubts as to the enforceability of such an agreement.

However, the ability to protect a web site with nothing more than an explicit statement on the site restricting access received a potential boost in a recent decision by the First Circuit Court of Appeals in Boston. That court suggested that screen-scraping may violate the Computer Fraud and Abuse Act (the “CFAA”), and that a restrictive warning of the sort used in browsewrap agreements may be enough to invoke the CFAA.

The second barrel of American’s gun was loaded with more powerful munitions, in the form of its claim that Farechase had violated the law of “trespass to chattels” (i.e., goods). While the English law of trespass as applied to chattels can be traced back hundreds of years, it has shown a surprising ability to adapt itself to the Internet. Most courts that have considered the applicability of trespass law to data scrapers have ruled in favor of the complaining party. The best known of these cases, eBay, Inc. v. Bidder’s Edge, Inc., resulted in an injunction ordering Bidder’s Edge to stop data mining from the eBay web site. Moreover, in several of these cases the courts have not required proof that the scrapers caused any measurable harm or specific injury to the sites they were data mining.

Not surprisingly, based on the above record, American Airlines was successful in obtaining an injunction against Farechase. While Farechase is still in business, its searches no longer include American web fares.

TLB Comment: Based on this state of the law, can data miners expect to build a business based on unauthorized screen-scraping? Somewhat surprisingly, the outlook may be better than it appears. First, many companies do utilize this form of data mining without objection from the owners of the sites they are crawling. The reasons are economic, not legal. In some industries screen-scraping has become an accepted way of doing business. Further, the vast majority of companies are willing to provide access to their sites when they are approached cooperatively. The fact that some percentage of their capacity is being used by a scraper is not a deterrent, as long as the scraper’s customers are ultimately referred to the vendor’s site to make the purchase.

Second, while the law thus far has favored original content providers, the law on electronic trespass to chattels is far from settled. Just before this article went to press, the California Supreme Court issued a decision in Intel v. Hamidi, rejecting Intel’s attempt to prevent a former employee from sending mass e-mails to Intel employees. In that case the court held that electronic trespass to chattels is not actionable under California law unless it involves “actual or threatened injury to the personal property or the possessor’s legally protected interest in the personal property.” Since Hamidi’s e-mails to Intel employees (numbering in the hundreds of thousands) caused no such harm, the court refused to order Hamidi to cease communications. Although this was not a screen-scraping case, the issues implicated are essentially the same (Intel relied heavily on the scraper cases), and therefore Hamidi may be an important defensive tool for scrapers to use in the future.

Back to Technology Law Bulletin Summer 2003