Meta may have torrented over 80 terabytes of pirated books to train its AI models
![Meta AI](https://betanews.com/wp-content/uploads/2024/09/Meta-AI-640x384.jpg)
Just how AI models should be trained has been a subject of debate for some time now, with much of the focus on whether publicly posted social media content is ripe for the picking or not. Now a new lawsuit suggests that Meta has been using pirated ebooks as a data source.
Emails that are serving as evidence in a copyright case against Meta appear to show that the Facebook owner has torrented scores of terabytes of data from a number of online resources. Among the places mentioned in newly released unredacted emails are Anna’s Archive, Z-Library and LibGen.
The sheer scale of the alleged piracy is astonishing -- tens of millions of books may have been involved in the training of Meta’s AI models. The company has already admitted to torrenting some content, but further details have become clear following the release of unredacted internal emails.
In a court filing shared by Ars Technica, the plaintiffs say:
> Meta produced over 2,000 documents, including some of the most damning evidence to date: admissions from employees that LibGen is a pirated database and instructions to use it anyway, and admissions that Meta torrented tens of millions of pirated works despite knowing it was illegal.
Meta is further accused of attempting to cover its tracks, and emails show concern being voiced about “using Meta IP addresses ‘to load through torrents pirate content’”. Despite Meta research engineer Nikolay Bashlykov warning that “using torrents would entail ‘seeding’ the files -- i.e., sharing the content outside, this could be legally not OK”, Meta seemingly continued to torrent and seed content while concealing the fact.
Evidence from as recently as April 2024 indicates that the activity was still going on, despite concerns having been raised on numerous occasions.
As Ars Technica points out, there is now the question of how much Mark Zuckerberg knew about what was happening. With the training of AI models now hugely significant, the outcome of this case could set several precedents.
Image credit: Cttpnetwork / Dreamstime.com