Meta may have torrented over 80 terabytes of pirated books to train its AI models


How AI models should be trained has been debated for some time, with much of the focus on whether publicly posted social media content is fair game. Now a new lawsuit suggests that Meta has been using pirated ebooks as a data source.
Emails serving as evidence in a copyright case against Meta appear to show that the Facebook owner torrented scores of terabytes of data from several online sources. Among the sources named in the newly unredacted emails are Anna’s Archive, Z-Library and LibGen.
Cloudflare introduces AI Audit to help websites manage AI access and content usage


Cloudflare has introduced AI Audit, a new set of tools aimed at helping websites manage how artificial intelligence (AI) models access and use their content. AI Audit allows content creators to see how their content is being used by AI models and take steps to control access. Additionally, Cloudflare is working on a pricing feature that will enable creators to set a price for AI companies using their content for model training and retrieval augmented generation (RAG).
Many website owners may not realize that AI bots frequently scan their content, usually without the creator’s knowledge and without compensation. AI Audit is designed to give control back to content owners, allowing them to block AI bots, view analytics on how their content is used, and negotiate agreements with AI companies for its use.