Reddit sues Perplexity and others over alleged unlicensed scraping of Reddit content

Reddit has filed a lawsuit against Perplexity, SerpApi, Oxylabs and AWMProxy, alleging the companies scraped Reddit content from Google search results without a license and used or resold that material. Reddit is seeking monetary damages and a permanent injunction to stop the sale or reuse of previously scraped data.

Key allegations and evidence

  • End‑run around licensing: Since 2023, Reddit has charged for data/API access and signed paid deals with firms like Google and OpenAI. The suit claims defendants avoided those fees by scraping search result pages for Reddit posts.
  • Test post sting: Reddit says it published a “test post” that was only crawlable by Google. Within hours, content from that post appeared in Perplexity’s answers, which Reddit argues indicates scraping of Google’s indexed results.
  • Robots and policies: The complaint references alleged disregard for robots.txt directives and rapid ingestion of scraped material.
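For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well-behaved crawler might check a site's directives before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, the `ExampleBot` user agent and the example.com URL are all hypothetical illustrations, not Reddit's actual file or any defendant's code. The policy shown admits one named crawler and excludes all others, similar in spirit to the "only crawlable by Google" setup described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not Reddit's actual file):
# one named crawler is admitted, every other user agent is excluded.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/test/comments/abc123/"
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, "allowed" if is_allowed(agent, url) else "disallowed")
```

Note that robots.txt is a voluntary convention: it states a site's wishes but does not technically block a crawler, and it says nothing about copies of a page that appear in a search engine's results.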

Perplexity’s response

Perplexity stated it had not yet received the lawsuit but would “fight vigorously for users’ rights to freely and fairly access public knowledge,” calling its approach principled and responsible. (All claims in Reddit’s filing remain allegations until adjudicated.)

Why this matters

  • Defines the lines on data use: A ruling could clarify whether scraping search results for user‑generated content (UGC) without a direct license exposes companies to liability.
  • Platform monetization: Reddit has tightened access to bots/crawlers (and even limited Wayback Machine access in 2025) as it pursues paid data licensing and builds its own AI features.
  • AI training stakes: The case arrives amid broader disputes over how AI companies source training data and whether robots.txt or new standards like Really Simple Licensing (RSL) should govern usage.

What to watch next

  • Defendants’ responses and any court decisions on an injunction.
  • Discovery around scraping methods, robots.txt compliance and whether/where the data was used (e.g., in training or answer engines).
  • Potential ripple effects on data deals between platforms, publishers and AI developers.

Sources and further reading: Engadget summary · Initial reporting · Reddit

Discussion: Is scraping search results for community posts fair use—or should AI firms always license platform data before ingesting it?
