Copyright battles hit Meta’s AI efforts hard

Meta is facing claims in a US court that it used pirated books to train its AI models.

Authors, including Ta-Nehisi Coates and Sarah Silverman, allege that Mark Zuckerberg approved the use of the LibGen dataset—a well-known archive of pirated books—despite warnings from Meta’s AI team.

Internal Meta messages included in the filing, described as the smoking gun in the case, show that the AI team raised concerns about using LibGen.

They warned it could harm the company’s standing with regulators, saying, “Media coverage suggesting we have used a dataset we know to be pirated… may undermine our negotiating position.”

However, the filing claims the dataset was still approved after being reviewed by Zuckerberg.

In brief:

Meta’s AI team flagged risks of using LibGen, but the dataset was allegedly approved for training.
The claimants argue their books were used without permission to train Llama, Meta’s large language model.
The case adds to ongoing disputes about using copyrighted content to develop AI tools.

The Wild West of AI training

The lawsuit, filed in 2023, accuses Meta of training Llama, its large language model, on copyrighted books without consent.

LibGen, also called Library Genesis, is a “shadow library” that offers millions of books and articles.

In 2024, a New York court ordered its operators to pay $30 million (£24 million) in damages for copyright infringement.

The issue of using copyrighted material in AI training has become a growing legal and ethical concern.

Creators and publishers argue that such practices harm their livelihoods and violate intellectual property rights.

Zuck really said, 'Let’s just torrent it.'

Mindstream