Cloudflare Accuses Perplexity Of Stealth Data Scraping

Cloudflare and Perplexity recently found themselves at odds after the former accused Perplexity of stealth data scraping. Cloudflare says it observed Perplexity bots crawling websites despite explicit no-crawl requests. Perplexity, however, denies the claims.

Cloudflare Accuses Perplexity Of Stealth Data Scraping

In a recent post, Cloudflare claimed to have observed Perplexity aggressively scraping data from websites in a stealthy manner. By stealth, Cloudflare means that Perplexity crawled and scraped sites even when they explicitly disallowed such crawls.

Specifically, Cloudflare became suspicious when several customers complained that Perplexity crawlers were accessing their websites even though they had been disallowed. Cloudflare then tested this behavior by creating websites on dummy domains and querying Perplexity about them. Despite implementing all measures to block Perplexity crawlers on those sites, Perplexity’s answers to queries about the sites suggested the blocks had not worked.

“We conducted an experiment by querying Perplexity AI with questions about these domains, and discovered Perplexity was still providing detailed information regarding the exact content hosted on each of these restricted domains. This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their crawlers.”

When website owners want to block crawlers (such as Perplexity’s, in this case), they add rules to that effect in their robots.txt files. However, Cloudflare’s experiment indicated that Perplexity’s crawlers bypass both robots.txt rules and crawler allowlists.
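As a rough illustration of how such rules work, the sketch below uses Python’s standard urllib.robotparser module to check whether a crawler may fetch a page. The robots.txt content, the example.com URL, and the is_allowed helper are hypothetical and written for this example; PerplexityBot is assumed here to be the name of Perplexity’s declared indexing crawler, alongside the Perplexity-User agent mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block two Perplexity user agents,
# allow everyone else. The agent names are assumptions, not a verified list.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as an iterable of lines, so no network call is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/page"  # placeholder URL
    for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(agent, page))
```

A well-behaved crawler runs a check like this before every request and skips the page when the answer is no. Nothing in robots.txt enforces that, though: the file is a request that the crawler itself chooses to honor.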

Perplexity’s website does clarify that one of its crawlers, Perplexity-User, may ignore robots.txt rules when acting on a user’s behalf. Since this crawler supports user actions within Perplexity, it only accesses a website following a user request and isn’t used for web crawling or data scraping. But Cloudflare found the service doing more than what’s stated.

Cloudflare also observed Perplexity using undeclared crawlers, which presented themselves as a generic browser mimicking Google Chrome on macOS, to access content after detecting a block.
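To see why a generic browser identity matters, consider a minimal sketch of a block rule that relies only on the User-Agent header. This is not Cloudflare’s detection method. The token list, the function name, and both sample header strings are assumptions made for illustration.

```python
# Hypothetical list of declared crawler tokens a site owner might block.
DECLARED_CRAWLER_TOKENS = ("PerplexityBot", "Perplexity-User")

# Illustrative User-Agent string of the kind a desktop Chrome browser on macOS sends.
GENERIC_CHROME_MAC = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def matches_declared_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a declared crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in DECLARED_CRAWLER_TOKENS)


if __name__ == "__main__":
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"  # made-up example header
    print(matches_declared_crawler(declared))            # the declared bot is caught
    print(matches_declared_crawler(GENERIC_CHROME_MAC))  # the browser-like header is not
```

A request that carries the browser-like header passes a rule of this kind exactly as an ordinary visitor would. Blocking it has to rely on signals beyond the header, which this sketch does not attempt to model.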

For comparison, Cloudflare also tested OpenAI’s ChatGPT and found that it complied with best practices for bot operators. Even its ChatGPT-User crawler stops when it encounters a disallow directive.

Perplexity Denies Cloudflare’s Statements

Following this disclosure from Cloudflare, Perplexity denied the claims. In a statement to TechCrunch, Perplexity spokesperson Jesse Dwyer called Cloudflare’s blog post a “sales pitch” (since Cloudflare has announced that it is strengthening its WAF rules to block Perplexity crawlers on websites that disallow them). Dwyer also denied any link to the bot mentioned in Cloudflare’s post.

While it remains unclear whether Perplexity engages in stealth data scraping, even its regular data scraping is unpopular with website owners. Recently, the Japanese newspaper Yomiuri Shimbun filed a lawsuit against Perplexity AI in the Tokyo District Court, accusing it of “free-riding” on its content and of copyright infringement. The newspaper seeks $14.7 million in damages, citing Perplexity’s use of 120,000 articles between June 2023 and July 2025. A decision is still pending, and it might set a precedent for how AI services may use information available online.

Let us know your thoughts in the comments.
