Is Web Scraping Legal?

published on 26 December 2022

Learn everything there is to know about the legality of web scraping, including a few precedent legal cases.

Assuming you're here on this page, it's safe to say that you've heard of web scraping and are interested in finding out whether or not web scraping is legal. If so, you've come to the right place!

In this blog post, we'll take a close look at the legality of web scraping and the key considerations to weigh when deciding whether or not to perform web scraping.

We'll also highlight a few precedent legal cases on web scraping.

Whether you're just starting out with web scraping or you're an experienced web scraping practitioner looking for additional insight into the legality of web scraping, keep reading to learn everything there is to know on whether or not web scraping is legal.

What is Web Scraping?

Web scraping is a technique used to automatically extract data from websites.

A web scraper makes requests to a website's server and downloads the page's HTML (HyperText Markup Language), the standard markup format in which web pages are written.

You can then extract data from the HTML using several techniques, such as XPath (XML Path Language), CSS (Cascading Style Sheets) selectors, regular expressions (RegEx) or specialized libraries.
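To make the extraction step more concrete, here is a minimal sketch in Python that uses only the standard library. The HTML snippet, the class names (name, price) and the helper names are made up for illustration; a real page would have its own structure, and in practice many people reach for a specialized library instead.

```python
import re
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page you have already downloaded.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li class="product"><span class="name">Widget</span> <span class="price">$9.99</span></li>
    <li class="product"><span class="name">Gadget</span> <span class="price">$24.50</span></li>
  </ul>
</body></html>
"""


class SpanTextParser(HTMLParser):
    """Collect the text of every <span> whose class matches wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self._capturing = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == self.wanted_class:
            self._capturing = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.values[-1] += data


def extract_names(html):
    """Parser-based extraction: walk the HTML and keep the product names."""
    parser = SpanTextParser("name")
    parser.feed(html)
    return [value.strip() for value in parser.values]


def extract_prices(html):
    """Regex-based extraction: pull the dollar amounts out of the price spans."""
    return [float(m) for m in re.findall(r'class="price">\$([0-9]+\.[0-9]{2})<', html)]


print(extract_names(SAMPLE_HTML))
print(extract_prices(SAMPLE_HTML))
```

The parser-based function tends to survive small formatting changes better than the regular expression, which is tied to the exact text of the markup.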

Web scraping can be a powerful method for collecting large amounts of data that would otherwise be time-consuming or difficult to obtain manually.

Web scraping is often used by researchers, businesses and individuals to gather data from multiple sources for various purposes, such as market research, price tracking and data analysis.

However, web scraping can be complex and time-consuming.

Some websites attempt to prevent web scraping by blocking IP addresses that make too many requests.
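One common way to avoid sending too many requests is to space them out. The sketch below shows a simple throttle in Python; the class name Throttle and the two-second interval are assumptions for illustration, and the right pace depends on the website and its policies.

```python
import time


class Throttle:
    """Enforce a minimum pause between consecutive requests."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Pause if the previous call was too recent. Returns the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# Usage: call wait() before each request you send.
throttle = Throttle(min_interval_seconds=2.0)
throttle.wait()  # the first call returns immediately
```

Calling wait() before every request keeps the scraper at or below one request per interval, no matter how fast the rest of the code runs.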

Also, lots of websites use CAPTCHA tests to determine whether a human or bot is making the web requests.

Additionally, websites can change their structure and layout, which can break the web scraping code used to extract data.
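To illustrate how a layout change can break extraction code, and one way to soften the impact, here is a small Python sketch. Both HTML layouts and the function name extract_price are hypothetical; the idea is simply to try several known patterns and report a miss instead of crashing.

```python
import re

# Patterns for two hypothetical page layouts, newest first.
PRICE_PATTERNS = [
    re.compile(r'<span class="price">\s*\$([0-9]+(?:\.[0-9]+)?)\s*</span>'),
    re.compile(r'<div data-price="([0-9]+(?:\.[0-9]+)?)"'),
]


def extract_price(html):
    """Return the first price found by any known pattern, or None if none match."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            return float(match.group(1))
    # None signals that the layout has probably changed and the patterns need updating.
    return None


print(extract_price('<span class="price">$12.00</span>'))
print(extract_price('<div data-price="7.5"></div>'))
print(extract_price('<p>Price on request</p>'))
```

Returning None rather than raising an error lets you log the failure and review the page, which is often how a structural change is first noticed.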

Using a web scraping service can be more efficient than building and maintaining your own web scraping solution.

A web scraping service can handle the technical details of making HTTP requests, downloading HTML and extracting data, enabling you to focus on growing your business.

A web scraping service can also address issues arising from structural changes to the source websites or IP blocking.

A top-notch web scraping service will perform data cleansing, standardization, enrichment and other data transformations, to ensure the data that you extract is in the optimal state for your use.

A great web scraping service can also apply big data engineering to process large volumes of data quickly and efficiently, making it a cost-efficient solution for extracting and using large amounts of data.

As more data moves to the cloud, the best web scraping services will also deliver your data to any cloud platform, such as AWS, Google Cloud, Azure, Snowflake or Databricks.

Web scraping can be a valuable method for harvesting data from the web, but the use of web scraping requires careful planning and execution.

Reasons to Perform Web Scraping

Web scraping is a powerful approach for gathering data from websites.

There are many reasons why people and companies choose to perform web scraping, such as:

  1. To gather large amounts of data from multiple sources: Web scraping can be used to automate the data extraction from multiple websites quickly and efficiently.
  2. To track prices and monitor competitors: Web scraping can be useful for businesses looking to track prices and monitor competitors in the e-commerce and travel industries, for example. Web scraping can enable businesses to stay up-to-date on market trends and identify opportunities for differentiation.
  3. To extract data from social media platforms: Web scraping can be used to extract data from social media platforms for social media analytics. Web scraping can be useful for businesses looking to track and analyze social media sentiment and other activity, in order to more effectively convert prospects and satisfy customers.
  4. To gather leads for sales and marketing: Web scraping can be used to gather contact information and other data on potential leads for sales and marketing purposes. Web scraping can enable businesses to identify and generate data on potential customers and clients.

See Web Scraping Use Cases for more examples of how you can use web scraping to grow your business.

Is scraping data from the web legal?

Is Web Scraping Legal?

The short answer is yes. Web scraping is legal.

However, there are some limitations and potential legal issues to be aware of.

It is generally acceptable to perform web scraping on publicly-available data, as long as you adhere to the specific terms of use and policies of the source websites.

Web scraping is typically only illegal if you perform web scraping to gain unauthorized access to someone else's data or to engage in malicious activity.

As long as you are only web scraping publicly-available data and following the policies of the websites from which you are scraping, web scraping is considered to be legal.

Publicly-Available Data & Web Scraping

Publicly-available data is data that is open for use by the general public and is typically not protected by intellectual property laws.

A great example of such publicly-available data is government databases that are published on government websites.

It is generally considered legal to access and use publicly-available data, as long as you are not using the data for malicious or illegal purposes.

However, it is important to respect any terms of use or policies that may be in place for accessing and using such publicly-available data, as well as any applicable laws and regulations.
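One machine-readable signal of a website's crawling policy is its robots.txt file. This post does not cover robots.txt, and checking it is not a substitute for reading a website's terms of use, but the sketch below shows how a scraper could consult it using Python's standard library. The robots.txt content, the example.com URLs and the user agent name are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might publish at /robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "example-research-bot"  # assumed name for your scraper


def build_policy(robots_txt):
    """Parse robots.txt text into an object that can answer 'may I fetch this?'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(EXAMPLE_ROBOTS_TXT)

# Check individual URLs before requesting them.
print(policy.can_fetch(USER_AGENT, "https://example.com/public/report"))
print(policy.can_fetch(USER_AGENT, "https://example.com/private/accounts"))

# Some sites also state a preferred delay between requests, in seconds.
print(policy.crawl_delay(USER_AGENT))
```

In a real scraper you would load the file from the website itself instead of a hard-coded string, and skip any URL for which can_fetch returns False.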

Web Scraping and the Computer Fraud and Abuse Act (CFAA)

The Computer Fraud and Abuse Act (CFAA) is a US federal law that prohibits unauthorized access to computer systems and the misuse of information obtained from computer systems.

The CFAA applies to both individuals and organizations, and the act provides for criminal and civil penalties for violations.

The CFAA covers a wide range of activities, including hacking, identity theft and the unauthorized access of computer systems or data. The act also prohibits the use of computer systems to commit fraud or other crimes.

The CFAA has been used in a number of high-profile cases involving computer-related crimes and has been the subject of debate and legal challenges regarding its scope and application.

Web Scraping Legal Cases: Five Examples

Let's take a quick look at five legal cases on web scraping to provide insight into how the courts have interpreted the laws related to web scraping.

It is important to note that the cases do not represent a complete list of legal cases on web scraping, and that the legal landscape surrounding web scraping is constantly evolving.

In 2017, LinkedIn sued HiQ Labs for web scraping publicly-available data.

#1. LinkedIn vs HiQ Labs

The case of LinkedIn vs HiQ Labs was a legal dispute between LinkedIn and HiQ Labs, a company that provides data analytics services to businesses.

In 2017, LinkedIn filed a lawsuit against HiQ Labs. In the lawsuit, LinkedIn argued that HiQ Labs' use of web scraping to gather data from LinkedIn's public profiles was a violation of the Computer Fraud and Abuse Act (CFAA) and LinkedIn sought to block HiQ Labs from accessing LinkedIn's data.

HiQ Labs argued that its use of web scraping was legal and that LinkedIn's efforts to block it were anti-competitive.

The US District Court for the Northern District of California granted a preliminary injunction in favor of HiQ Labs in 2017, preventing LinkedIn from denying HiQ Labs access to publicly-available LinkedIn data.

In September 2019, the US Court of Appeals for the Ninth Circuit affirmed the lower District Court's decision.

The US Court of Appeals ruled in favor of HiQ Labs, stating that LinkedIn's efforts to block HiQ Labs' access to its data were not justified under the CFAA and that HiQ Labs' use of web scraping did not violate the CFAA.

On April 18, 2022, the US Court of Appeals for the Ninth Circuit reaffirmed its original decision that web scraping data that is publicly-accessible is legal and does not violate the CFAA.

In November 2022, the US District Court for the Northern District of California ruled that HiQ had breached LinkedIn's User Agreement.

Consequently, LinkedIn and HiQ Labs reached a settlement agreement.

Assessment

In the LinkedIn vs HiQ Labs case, both the lower US District Court for the Northern District of California and the US Court of Appeals for the Ninth Circuit ruled that web scraping publicly-accessible data does not violate the Computer Fraud and Abuse Act (CFAA).

The court decisions reaffirm that scraping publicly-available data cannot be considered unauthorized under the CFAA.

However, HiQ was found to have breached LinkedIn's User Agreement.

Because that ruling rested on the User Agreement rather than the CFAA, it may set a precedent for social media companies and other online platforms to bring similar cases based on the legally binding agreements that users accept when they sign up to a platform.

In October 2020, Facebook sued both BrandTotal and Unimania for web scraping data from Facebook and Instagram.

#2. Facebook vs BrandTotal and Unimania

On October 1, 2020, Facebook sued Israel-based BrandTotal and Delaware-incorporated Unimania for allegedly using web scraping to access and collect data from its platform without authorization.

Facebook claimed that the companies had violated its Terms of Service and engaged in unauthorized access to its servers.

In June 2022, the district court overseeing the case ruled that BrandTotal did not violate the CFAA.

On September 30, 2022, the case was ultimately settled out of court, with the companies agreeing to pay damages and to stop accessing Facebook's data through web scraping or other means.

Assessment

As with the LinkedIn vs HiQ Labs case, the court ruled that BrandTotal did not violate the CFAA by web scraping data from Facebook.

However, because Facebook and the two companies settled out of court, it's unclear what primarily drove the settlement.

Meta and Bright Data sued each other in January 2023 in cases that may set precedents for the future of web scraping.

#3. Meta vs Bright Data and Bright Data vs Meta

"The collection of data from websites can serve legitimate integrity and commercial purposes, if done lawfully and in accordance with those websites' terms."

Andy Stone, Meta Spokesman

The legal battle between Meta Platforms, the owner of Facebook and Instagram, and Israel-based data collection company Bright Data revolves around the right of Bright Data to scrape data from Facebook and Instagram.

On January 6, 2023, Meta sued Bright Data in California, alleging that the data collection company scraped data from its websites, allowed others to do so and tried to sell the information, violating Meta's Terms of Service.

On January 20, 2023, Bright Data countered with a lawsuit against Meta in Delaware, claiming that the social media giant should not be able to restrict access to public data.

Bright Data emphasized the importance of public data for market competition and transparency, and vowed to defend everyone's right to access such public data.

In its lawsuit, Bright Data noted its compliance with US and EU regulations, and emphasized that it only collects public information that is not login-protected.

On February 2, 2023, Bloomberg published a story titled "Meta Was Scraping Sites for Years While Fighting the Practice". In an ironic twist, the Bloomberg story revealed that Meta had paid Bright Data to scrape data from websites.

According to the story, email correspondence showed that Meta had a long-standing professional relationship with Bright Data, while Meta was publicly condemning web scraping and suing companies that scraped data from Facebook and Instagram.

Meta ended its relationship with Bright Data, supposedly after learning that its arrangement with Bright Data violated Meta's company terms prohibiting the automated collection and selling of data.

Meta conceded that "The collection of data from websites can serve legitimate integrity and commercial purposes, if done lawfully and in accordance with those websites' terms," in a statement from Meta Spokesman, Andy Stone.

Assessment

The cases raise several complex questions about data ownership, the legality of scraping public data and the interpretation of laws such as the Computer Fraud and Abuse Act (CFAA).

The outcome of these lawsuits may set precedents for the future of web scraping, the accessibility of public data and how far social media companies can go enforcing their Terms of Service.

Twitter filed a lawsuit against four anonymous entities in July 2023 for allegedly scraping data from Twitter, in violation of the company's Terms of Service.

#4. Twitter vs Four Anonymous Entities

On July 6, 2023, Twitter filed a lawsuit in Texas against four anonymous entities over alleged web scraping activity, in violation of Twitter's Terms of Service.

Twitter has been "unable to ascertain the identity" of the entities, according to the filing. The filing identifies the web scraping entities only by their IP addresses.

Twitter alleges that the entities worked with data processing facilities in Dallas County, Texas. The alleged location of the facilities in Dallas is likely the reason that Twitter filed the case in Texas.

Twitter is seeking monetary damages of more than $1 million.

Twitter's lawsuit follows changes that the company's Executive Chairman and CTO, Elon Musk, announced earlier in July 2023. Musk said that the changes are intended to prevent web scrapers from accessing Twitter's data.

The changes include limits on the number of tweets that users can view per day.

For verified accounts, the limit was initially 6,000 tweets per day, which later increased to 8,000 and then finally to 10,000 tweets per day.

For unverified accounts, the limits were initially 600, then 800 and then finally 1,000 tweets per day.

For new unverified accounts, the initial limit was 300 tweets per day, which later increased to 400 and then finally increased to 500 tweets per day.

Furthermore, the company announced that it has limited access to tweets for users who are not signed in to Twitter.

"Several entities tried to scrape every tweet ever made in a short period of time," Musk tweeted. "That is why we had to put rate limits in place."

Assessment

In April 2022, the US Court of Appeals for the Ninth Circuit reaffirmed its original decision that web scraping of publicly-accessible data is not a violation of the Computer Fraud and Abuse Act (CFAA), the act that governs what kind of activity constitutes hacking.

However, it appears that Twitter is not hinging its case on the CFAA, given that Twitter's case appears to focus on the "unjust enrichment" of the web scraping entities.

Twitter might be relying on the 2020 ruling by the US Court of Appeals for the Fifth Circuit that data scraping could be considered "unjust enrichment".

Additionally, as with the LinkedIn vs HiQ case, Twitter might be looking to build its case on the violation of Twitter's Terms of Service.

In July 2023, X Corp, formerly Twitter, filed a lawsuit against Bright Data for alleged web scraping activity, in violation of Twitter's Terms of Service.

#5. X Corp, the company formerly known as Twitter, vs Bright Data

On July 26, 2023, X Corp, formerly Twitter, filed a lawsuit against data collection company Bright Data for allegedly scraping and selling Twitter content and user data without authorization.

Israel-based Bright Data builds web scraping tools that can automatically extract large volumes of public data, including data from publicly-available Twitter posts.

According to X Corp, Bright Data violates Twitter's Terms of Service by selling data scraped from Twitter to third parties.

X Corp also claims Bright Data induces other users to break Twitter's rules by providing scraping software that targets Twitter data specifically.

X Corp argues that Bright Data intentionally disregards Twitter's scraping ban because its executives have Twitter accounts bound by Twitter's Terms of Service.

In response to the lawsuit, Bright Data's CEO Or Lenchner said that Twitter's lawsuit "has no basis" and that Twitter is attempting to restrict access to public data.

Assessment

The April 2022 reaffirmation by the US Court of Appeals for the Ninth Circuit of its original decision, that web scraping of publicly-accessible data does not violate the Computer Fraud and Abuse Act (CFAA), is applicable to the Twitter vs Bright Data case.

However, Twitter is attempting to make its case based specifically on alleged abuse of Twitter's Terms of Service. Twitter claims that Bright Data violates the terms that prohibit web scraping and accessing non-public areas of Twitter's platform.

It remains to be seen whether X's case, based on alleged violation of the company's Terms of Service, will have merit in the courts.

Furthermore, it is not apparent what exactly Twitter means by "non-public areas" of Twitter's platform, if Bright Data is supposedly scraping publicly-available data.

On the other hand, Bright Data continues to make its case that public data should be broadly available to everyone. This mirrors the stance the company has taken in its web scraping case with Meta.

Conclusion from Legal Cases

Based on the outcomes of the noted legal cases, web scraping is problematic only when you violate the terms of use or policies of the source website.

Otherwise, web scraping of publicly-accessible data is legal and does not violate the Computer Fraud and Abuse Act (CFAA).

Our Services & How We Can Help You

Partnering with the right web scraping service can help you avoid legal issues that may arise from web scraping.

By working with a reputable web scraping service to extract data on your behalf, you can ensure that you perform your web scraping responsibly and in compliance with applicable laws and regulations.

Our web scraping service can enable you to quickly and efficiently extract the data you need from websites, while ensuring strict adherence to the terms of use and policies of source websites.

Our service scrapes, cleans and customizes the web data you need, saving you both time and energy.

Several leading organizations are using our cloud-based, AI-powered, industrial-grade web scraping service to get the data they need from websites.

Here are the three simple steps to start using our web scraping service today:

  1. Share your requirements with us.
  2. We extract your data quickly.
  3. We deliver your data in a timely and user-friendly format.

Conclusion

Web scraping is legal as long as you extract data that is publicly-available and you do not violate the terms of service or policies of the source websites.

Based on the cases above, web scraping of publicly-accessible data does not violate the Computer Fraud and Abuse Act (CFAA).

It is important that you respect the terms of use and policies of websites, as well as all applicable laws and regulations, when performing web scraping.

Now that you know web scraping is legal, you’re probably wondering how to get started.

We offer a fully-managed web scraping service so all you have to do is sit back and let us do the heavy lifting to extract the web data you need to grow your business.

Get started today to see what we can do for you!

Legal Disclaimer

Although WSaaS provides analysis of web scraping cases based on our expertise and experience, this blog post is strictly for informational purposes only.

This blog post does not constitute legal advice.

The laws regarding web scraping, and the interpretations of those laws, are complex, nuanced and continue to evolve.

We strongly recommend consulting with professional legal counsel for specific guidance tailored to your situation, prior to collecting or using web data.
