
Meta Defends Using Pirated Books for AI, Saying They Have No 'Economic Value'

Meta has chosen a novel defense in its copyright case, saying there’s nothing wrong with using pirated books, since “none of [the authors’] works has economic value.”

Meta is facing a lawsuit for how it trained its AI models, with plaintiffs accusing the company of pirating copyrighted books to use as training material. To make matters worse for the company, internal emails confirm that Meta torrented more than 80 TB of copyrighted books.

“However it is done, torrenting pirated works is flagrantly illegal,” the plaintiffs wrote in their complaint. “And the magnitude of Meta’s unlawful torrenting scheme is astonishing: just last spring, Meta torrented at least 81.7 terabytes of data across multiple shadow libraries through the site Anna’s Archive, including at least 35.7 terabytes of data from Z-Library and LibGen. Pritt Decl., Ex. H. Meta also previously torrented 80.6 terabytes of data from LibGen (Sci-Mag).”

Pirated Books Have No ‘Economic Value’

According to Vanity Fair, Meta’s defense is to claim the pirated books have no real value. While the company “has invested hundreds of millions of dollars in LLM development,” it sees no reason to pay the authors since “for there to be a market, there must be something of value to exchange, but none of Plaintiffs’ works has economic value, individually, as training data.”

The defense is a shocking admission, one that is unlikely to help the company either in court or in the court of public opinion.

Why It Matters

AI firms have repeatedly earned the ire of authors, artists, content creators, and the industry at large by hoovering up vast quantities of data for use in training AI models. To make matters worse, many companies have been accused of ignoring websites’ robots.txt files, continuing to hammer websites with requests and scraping their data.

Thomson Reuters recently won a copyright case against Ross Intelligence after the latter used the former’s copyrighted material for AI training. OpenAI, Anthropic, and others are also locked in copyright cases with various organizations, including The New York Times.

The stakes couldn’t possibly be higher for the AI industry, with OpenAI proposing recommendations to the Trump administration that include a much looser interpretation of current copyright law, even raising the possibility of losing the AI race to China as motivation.

A copyright strategy that promotes the freedom to learn: America’s robust, balanced intellectual property system has long been key to our global leadership on innovation. We propose a copyright strategy that would extend the system’s role into the Intelligence Age by protecting the rights and interests of content creators while also protecting America’s AI leadership and national security. The federal government can both secure Americans’ freedom to learn from AI, and avoid forfeiting our AI lead to the PRC by preserving American AI models’ ability to learn from copyrighted material.

What sets Meta’s case apart from some of the others is the extent to which the company blatantly torrented the books in question, with some executives even questioning the ethics and legality of doing so.

Melanie Kambadur stated on a message chain, “I don’t think we should use pirated material. I really need to draw a line there.” The four messages that follow are redacted.

Joelle Pineau responds to Eleonora Presani’s statement that “using pirated material should be beyond our ethical threshold.” Ms. Pineau then asks, “You think it’s problematic to use even for this phase?” followed by a redacted sentence. Presani then says “SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they’re infringing it.”

This document appears to be notes from a January 2023 meeting that Mark Zuckerberg attended. It is heavily redacted, including a large section titled “Legal Escalations.” Immediately after that section the document states “[Zuckerberg] wants to move this stuff forward,” and “we need to find a way to unblock all this.”

Nikolay Bashlykov suggested that Meta conceal its downloading of LibGen data using a VPN (“Can we load libgen data using Meta IP ranges? Or should we use some vpn?”). All three bullet points that follow are redacted.

In an internal message, Nikolay Bashlykov expresses concern about using Meta IP addresses “to load through torrents pirate content,” and says, “torrenting from a corporate laptop doesn’t feel right :).” A response from David Esiobu is redacted.

Conclusion

Meta’s admission in court is merely the latest evidence of what many have long claimed, namely that many AI firms simply don’t care about the ethical and legal questions involved in hoovering up data for AI models.

Many experts even believe companies are intentionally pushing forward, counting on the courts to be too slow to reach a consensus on the issue before AI becomes so critical and entrenched in everyday life that it becomes impractical to rein in or stop AI firms.

Given the stakes involved, and Meta’s unusual defense, the case could well define the AI industry, either normalizing what has been occurring or forcing companies to finally pay for the content they use.
