One of the mysteries of Internet law is the legality of “web scraping.”
Here are two examples: (1) Clearview AI has scraped billions of publicly available images from social media platforms and compiled them into a facial recognition database that it’s made available to law-enforcement and private industry. (2) hiQ Labs has scraped publicly available information from profiles on LinkedIn and used it to analyze and predict employees’ likelihood of seeking other employment.
Both of these companies engaged in web scraping without permission. Did either of them violate any laws? This post will examine how the courts have treated data scraping in the hiQ/LinkedIn case.
Web scraping is machine-automated browsing that accesses and records the same information that a human visitor to the site could collect manually. Typically this is performed by an Internet bot, or simply “bot,” a software program that runs automated scripts over the Internet.
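For readers curious about what a bot actually does, here is a minimal sketch in Python. It is an illustration only, not a description of how hiQ or Clearview AI built their tools. The sample page, the class names ("name," "title," "employer") and the function names are all hypothetical, and the page is read from a string rather than downloaded, so the example runs without touching any real website.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a public profile page. A real bot would download
# this markup over HTTP instead of reading it from a string.
SAMPLE_PAGE = """
<html><body>
  <h1 class="name">Jane Doe</h1>
  <p class="title">Data Analyst</p>
  <p class="employer">Example Corp</p>
</body></html>
"""

# Assumed class names that mark the fields worth recording.
FIELDS = {"name", "title", "employer"}


class ProfileScraper(HTMLParser):
    """Records the text of elements whose class attribute is in FIELDS."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in FIELDS:
            self._current = css_class

    def handle_data(self, data):
        # Keep only non-blank text found inside a field of interest.
        if self._current and data.strip():
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def scrape(page):
    """Return the fields of interest found in one page of HTML."""
    scraper = ProfileScraper()
    scraper.feed(page)
    return scraper.record


print(scrape(SAMPLE_PAGE))
```

The script reads the same text a human visitor would see and stores it in structured form. Run in a loop over many pages, that is essentially what a scraper does at scale.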
Internet scraping is often done without the permission of the people and companies that post information on websites. One law that has been used to challenge scraping, often with success, is the Computer Fraud and Abuse Act, 18 U.S. Code §1030, or the “CFAA.”
The Computer Fraud and Abuse Act
The CFAA – the federal anti-hacking law – imposes civil and criminal liability for certain acts of computer trespass. The hiQ/LinkedIn case focused on the CFAA’s “without authorization” provision. This section of the law imposes liability on “[w]hoever … intentionally accesses a computer without authorization … and thereby obtains … information ….”
The CFAA applies to any computer connected to the Internet. Therefore, the CFAA may be violated when someone accesses a website “without authorization.” However, the words “without authorization” are undefined, leaving it to the courts to decide how they should be applied.
Courts have held that if a scraper has agreed to contractual terms and conditions that bar scraping, it may have acted without authorization and therefore violated the CFAA.
But what if the website is public facing – that is, it makes information available without the use of a password – and the site owner demands that the scraper stop? Are the scraper’s actions now “without authorization”? That was the issue the Ninth Circuit recently decided in hiQ Labs, Inc. v. LinkedIn Corp. (9th Cir. April 18, 2022).
hiQ Labs v. LinkedIn Corp.
hiQ, a corporate data analytics company, uses scraping to collect information that LinkedIn users share on their public profiles. LinkedIn.com is a public facing website whose users own the information they provide to LinkedIn. LinkedIn demanded that hiQ stop scraping its site, asserting that the scraping violated the CFAA. After receiving this demand, hiQ filed suit, asserting tortious interference and seeking a declaratory judgment that LinkedIn could not lawfully invoke the CFAA to stop it from scraping.
hiQ won before the district court – the court issued a preliminary injunction ordering LinkedIn to withdraw its cease-and-desist letter.
LinkedIn appealed, leading to a decision by the Ninth Circuit, a Supreme Court appeal and remand (remanded in light of Van Buren, below, without opinion), and a second decision by the Ninth Circuit. At all times the central question was whether, once LinkedIn demanded that hiQ cease scraping the site, any further scraping of LinkedIn’s data was “without authorization” in violation of the CFAA.
The Ninth Circuit upheld the preliminary injunction in hiQ’s favor, stating –
the CFAA’s prohibition on accessing a computer “without authorization” is violated when a person circumvents a computer’s generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA. The data hiQ seeks to access … has not been demarcated by LinkedIn as private using … an authorization system. hiQ has therefore raised serious questions about whether LinkedIn may invoke the CFAA to preempt hiQ’s possibly meritorious tortious interference claim.
The court referenced the “gates-up-or-down” inquiry that the Supreme Court established in Van Buren v. United States (USSC 2021), which involved the “exceeds authorized access” prong of the CFAA (not at issue in LinkedIn):
In other words, applying the ‘gates’ analogy to a computer hosting publicly available webpages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of ‘without authorization’ does not apply to public websites.
Based on this reasoning the Ninth Circuit refused to dissolve the preliminary injunction against LinkedIn, sending the case back to the district court for further proceedings.
What does this decision mean for data aggregators who use bots to “scrape” information from public facing websites? With some qualifications it is a win for both non-profit researchers and for-profit companies like hiQ and Clearview AI, who seek to scrape and exploit data commercially. Under this decision a public facing, “gates up” website cannot use the CFAA to demand that a scraper stop.
However, there are limits. For example, aggregators need to be careful not to copy expression that may be protected by copyright.
The ruling creates an incentive for websites to shield information behind a log-in page and terms and conditions barring data scraping.
State law causes of action, particularly common law trespass to chattels, are an undeveloped but potentially viable avenue for websites seeking to block scraping.
Lastly, other circuits may disagree with the conclusion reached in this decision. The day may come when the Supreme Court decides the legality of web scraping of public facing data under the CFAA. hiQ v. LinkedIn is ongoing, and perhaps this very case will end up back before the Supreme Court.