The greatest theft in history? -- How big tech is benefiting from our data and why we should care [Q&A]


If you've written a book or composed some music or engaged in any other form of creative endeavour, the chances are that AI will have used it as part of its learning process.
This appropriation of copyright material is controversial since the original creator doesn't benefit.
We spoke to Jamie Dobson, founder of Container Solutions and author of The Cloud Native Attitude, about what he's calling 'the greatest theft in history'.
BN: How are tech companies using data to train AI?
JD: Most of the AI systems in the news or flying through your social media feeds are built around artificial neural networks (ANNs). Before an ANN is useful, it is trained on data, such as text or images. That training process leaves the 'neurons' in the network 'weighted' and, together, those weights process incoming data much as your brain processes incoming data from the world, helping you to spot obstacles as you cross the street.
Once the ANN is trained, just as you were trained on pictures of cats by your mum or dad, the network is ready and the data is no longer needed. The evidence, that's to say the data, can be thrown away, and it's almost impossible to work out from the ANN which data it was trained on. Just as it is impossible for me, 48 years after it happened, to tell you which pictures of cats I was trained on as a child.
That's how companies use data to train ANNs. If that data is bought, is proprietary or is used with permission, fine. If it is stolen, then that is, of course, the opposite of fine.
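The train-then-discard process described here can be sketched in miniature. The toy 'network' below is a single neuron, and the training data is a made-up set of labelled points standing in for books or images (everything in this sketch is illustrative, not drawn from the interview). Once training finishes, the data is deleted: only the learned weights remain, and nothing in them identifies the individual examples they came from.

```python
import random

random.seed(0)

# Made-up training data: points in the unit square, labelled 1.0 when
# x + y > 1.0. These stand in for the books, images or music a real
# ANN ingests. We keep a gap around the boundary so training converges.
data = []
while len(data) < 200:
    x, y = random.random(), random.random()
    if abs(x + y - 1.0) < 0.2:
        continue
    data.append(((x, y), 1.0 if x + y > 1.0 else 0.0))

# A single 'neuron': two weights and a bias, trained with the classic
# perceptron update rule.
w1, w2, b = 0.0, 0.0, 0.0
lr = 0.1
for _ in range(50):
    for (x, y), label in data:
        pred = 1.0 if w1 * x + w2 * y + b > 0 else 0.0
        err = label - pred
        w1 += lr * err * x
        w2 += lr * err * y
        b += lr * err

# Once trained, the data is no longer needed: throw away the 'evidence'.
# Only the weights (w1, w2, b) survive, and you cannot read the original
# examples back out of them.
del data

def predict(x, y):
    """Classify a new point using only the learned weights."""
    return 1.0 if w1 * x + w2 * y + b > 0 else 0.0
```

After `del data`, the network still classifies new points correctly, which is JD's point: the trained model works, but the material it learned from has vanished like falsework from a finished arch.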
BN: Why is this such a problem?
JD: Theft is a crime. That's why it's a problem. When it comes to training ANNs, however, it's a crime that is hard to detect. Take a look at the arches under a train line or station. Arches are built over a temporary wooden structure called falsework. When the arch is finished, the falsework vanishes. It is lost to history just as footprints on the beach are lost to the tides. The training data for ANNs is falsework. The ANN is the arch.
If those creating the falsework (the books, the movies, the poems, the artwork that ANNs are trained on) are not paid at all, production will cease. ANNs will stagnate and we will all suffer. And even if those creatives are paid, but not paid fairly, the gains of their work will continue to be appropriated by massive companies which have all the engineers and all the models. Does anybody think that such companies, without any government oversight, should have such power, or that this will end well for the public?
BN: What do you think of the UK government's plans to allow tech companies to legally use copyrighted content for AI training?
JD: It's flawed. It makes it too hard for creatives to opt out. The government is weighing up the impact AI might have on society and considering sacrificing fairness on the altar of progress. There is precedent for this. During the First and Second World Wars, patents were breached and new technologies were created through patent theft. It made sense: given the potential destruction of the world, and of the way of life of countries like the UK and the USA, who would care about a few patents? After the war ended, the patent courts sorted it all out.
Does this moment in history warrant such a great theft? No. It can be done differently. Creatives can get paid, and we need to pay them so they keep creating. Governments can create the legal frameworks. And more than anything, they must make sure the gains from all this data, and from the ANNs it is used to train, benefit us all and not just a handful of American tech giants.
BN: Why should Joe Average care about what's happening?
JD: Because Joe Average has children who are creative, because Joe Average pays for the education of creatives through his taxes, and because he pays for the public infrastructure that underpins the internet. We should no more turn our data over to foreign companies than we should give them our oil or our top scientists. People are not stupid, and soon the loss of such assets will be felt in their cost of living.
BN: Clearly AI is here to stay, so what can we do to protect individual and corporate data rights?
JD: The solutions to this challenge need to be as innovative as the technology causing it. Here are several approaches we should consider:
- Data Rights and Compensation: Establish a framework where data creators receive compensation for their contributions to AI training. This could work much as it does for musicians, who receive royalties when their songs are played.
- Algorithmic Transparency: Require AI companies to maintain and disclose training data sources, making it possible for creators to track and verify the use of their work.
- Public AI Infrastructure: Develop public alternatives to private AI models, ensuring that the benefits of this technology aren't concentrated in corporate hands.
- Progressive AI Taxation: Implement a scaled taxation system for AI companies based on their data usage and market impact, funding public services and potentially a Universal Basic Income.
- Digital Commons Framework: Create a new category of digital rights that balances innovation with fair compensation, perhaps through a system of micropayments or credit attribution.
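The 'Algorithmic Transparency' idea could, at its very simplest, take the form of a disclosed manifest of content hashes for training sources. The sketch below is a hypothetical illustration only (the function names and fields are invented, and real provenance systems would be far more involved): a creator who holds an exact copy of a work can check whether its hash appears in a model's published manifest.

```python
import hashlib

def record_source(manifest, name, content, licence):
    """Add one training source to a (hypothetical) provenance manifest,
    keyed by the SHA-256 hash of its content."""
    digest = hashlib.sha256(content).hexdigest()
    manifest[digest] = {"name": name, "licence": licence}
    return digest

def creator_can_verify(manifest, content):
    """A creator with an exact copy of their work can check whether it
    appears among the disclosed training sources. Note the limitation:
    hashing only matches byte-identical copies, not edited versions."""
    return hashlib.sha256(content).hexdigest() in manifest

# Illustrative use: an AI company publishes its manifest...
manifest = {}
record_source(manifest, "my-novel.txt", b"Chapter 1 ...", "unlicensed")

# ...and an author checks whether their book was used.
used = creator_can_verify(manifest, b"Chapter 1 ...")
```

Even a scheme this crude would shift the burden of proof: instead of the training data vanishing like falsework, there would be a public record for creators, courts and regulators to inspect.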
Image credit: Fernando Gregory/Dreamstime.com