Reddit has initiated legal proceedings against AI company Perplexity in New York federal court, citing unauthorised scraping and use of user-generated content from its platform for AI model training.

The lawsuit also names Oxylabs, AWMProxy, and SerpApi, alleging that these companies aided Perplexity’s data collection by concealing their identities and using techniques designed to imitate typical user behaviour.

Access deeper industry intelligence

Experience unmatched clarity with a single platform that combines unique data, AI, and human expertise.

Find out more

Perplexity, which has built an AI-driven search service, has rejected the allegations.

In a statement posted on Reddit, the AI company asserted it does not train models on the social media platform’s content but instead provides summaries and citations of public discussions. The company added that “it is ‘impossible’ to sign a licence agreement” for this reason.

The statement further read: “A year ago, after explaining this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data. Bowing to strong arm tactics just isn’t how we do business.”

Perplexity characterised the lawsuit as “a show of force in Reddit’s training data negotiations with Google and OpenAI.”

GlobalData Strategic Intelligence

US Tariffs are shifting - will you react or anticipate?

Don’t let policy changes catch you off guard. Stay proactive with real-time data and expert analysis.

By GlobalData

SerpApi stated to CNBC that it “strongly disagrees” with Reddit’s claims and will defend itself in court. CNBC did not receive responses from Oxylabs or AWMProxy.

Reddit’s legal action is part of a wider industry conflict regarding the use of publicly available content in training large language models.

The company previously filed a similar suit against Anthropic in June 2025. According to the complaint, Perplexity increased its referencing of Reddit content by 40 times following receipt of a cease-and-desist letter from the latter.

Reddit claims posts from its platform are a frequent source for AI-generated responses on Perplexity’s service.

With more than 100,000 communities, Reddit is a major source of publicly available user conversations.

Researchers have previously noted that Reddit’s volume and moderation provide a valuable training dataset for generating more conversational AI outputs.

The social media company has pursued data licencing strategies with enterprises such as OpenAI and Alphabet, restricting AI-related access to its data to those who have paid for specific agreements.

Earlier in 2025, Reddit COO Jen Wong told Adweek that AI licencing deals with Google and OpenAI account for close to 10% of the firm’s revenue.

In a statement provided to CNBC, Reddit chief legal officer Ben Lee said: “AI companies are locked in an arms race for quality human content,” describing the process as fuelling an “industrial-scale ‘data laundering’ economy.”