Reddit sues Perplexity and three data‑scraping firms over alleged unlicensed use of Reddit content
Reddit has filed suit against Perplexity, SerpApi, Oxylabs and AWMProxy, alleging the companies scraped Reddit content from Google search results without a license and used or resold that data. The complaint seeks financial damages and a permanent injunction to stop the sale or reuse of previously scraped Reddit material.
Key allegations and evidence
- End‑run around licensing: Reddit has charged for data/API access since 2023 and has since signed paid licensing deals with Google and OpenAI. The suit claims the defendants avoided those fees by scraping Google search result pages for Reddit posts.
- Test‑post sting: Reddit says it created a hidden “test post” that was only crawlable by Google. Within hours, content from that post appeared in Perplexity’s answers, which Reddit argues indicates scraping of Google’s indexed results.
- Robots and policies: The filing references alleged disregard for robots.txt directives and rapid ingestion of scraped material.
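For readers unfamiliar with robots.txt, the following is a minimal sketch of how a well‑behaved crawler might check a site's directives before fetching a page. The rules, bot names and URL are hypothetical and invented for illustration. They are not Reddit's actual robots.txt, and nothing here describes what any defendant did or did not do. The sketch uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for illustration only
# (NOT Reddit's actual file): one named crawler is allowed everywhere,
# every other crawler is disallowed everywhere.
EXAMPLE_RULES = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, rules: str = EXAMPLE_RULES) -> bool:
    """Return True if `rules` permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines,
    # so no network request is made here.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/test/post"  # placeholder URL
    print(is_allowed("ExampleSearchBot", url))
    print(is_allowed("OtherBot", url))
```

Compliance with robots.txt is voluntary on the crawler's side: the file states the site's wishes, but it does not technically block a request. That is part of why disputes like this one end up turning on licensing terms and law rather than on the file itself.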
Perplexity’s response
Perplexity said it hadn’t yet received the lawsuit but would “fight vigorously for users’ rights to freely and fairly access public knowledge,” calling its approach principled and responsible. (All claims in Reddit’s filing remain allegations until adjudicated.)
Why this matters
- Lines on data use: A ruling could clarify whether scraping search results for user‑generated content (UGC) without a direct license exposes companies to liability.
- Platform monetization: Reddit has tightened bot/crawler access (and even limited Wayback Machine access in 2025) as it pursues paid data licensing and builds its own AI features.
- AI training stakes: The case lands amid broader disputes over how AI companies source training data, and whether robots.txt or new standards like Really Simple Licensing (RSL) should govern usage.
What to watch next
- Defendants’ responses and any court decision on an injunction.
- Discovery around scraping methods, robots.txt compliance and whether/where the data was used (e.g., in model training or answer engines).
- Potential ripple effects on future data deals between platforms, publishers and AI developers.
References: Engadget summary · Reddit
Discussion: Is scraping search results for community posts fair use—or should AI firms always license platform data before ingesting it?