Creators are fighting back against AI giants
Recent events highlight the ongoing clash between AI companies and content creators over training data.
A few months back, Google warned OpenAI not to use YouTube data to train its models.
However, reports from The New York Times suggest OpenAI, Meta, and Google may not have followed these rules.
AI companies often say their models are trained on “publicly available data,” but the phrase is rarely defined and says little about where that data actually comes from.
OpenAI’s former CTO, Mira Murati, avoided giving details about the data used to train models like Sora, saying only that it was “publicly available or licensed” — a response many found vague and concerning.
Ed Newton-Rex, a former Stability AI team lead, left the company over its stance that training on copyrighted material is “fair use.”
He believes creators lose out when AI models trained on their work produce competing content.
Publishers like The New York Times now ban AI companies from using their content, but enforcing these rules is difficult without clear laws.
Here’s what you should know:
- “Publicly available” doesn’t mean consent: AI companies often use the term, but it doesn’t mean creators agreed to their work being used.
- Companies like Meta, Google, and OpenAI have been accused of using copyrighted material without proper licences.
- Creators are suing AI companies, but current laws don’t fully address these issues yet.
- With usable online data running short, AI companies are rushing to find enough material to train their models.
Some are taking risky shortcuts:
- Meta’s past actions: Court records show that in 2016, Meta intercepted data from platforms like Snapchat, YouTube, and Amazon, gaining access to sensitive information such as usernames and passwords.
- Ignoring copyright risks: Meta, OpenAI, and Google have reportedly used copyrighted material without clearance to save time and stay competitive.
Many creators are turning to lawsuits to protect their work, but success has been limited so far.
For instance, a federal judge recently dismissed most copyright claims in a case brought by authors like Ta-Nehisi Coates and Sarah Silverman.
Until stronger regulations are introduced, both creators and AI companies will face legal and ethical challenges.
The next few years are likely to bring key rulings and laws that shape how data is collected and how creators can protect their content.
This feels like the Wild West of data collection.