Is synthetic data the solution to data privacy challenges?


Synthetic data is artificially generated information rather than data recorded from real-world events. It is created by computer programs and AI tools using a range of techniques, with generative adversarial networks and diffusion models among the most popular and effective today. Synthetic data can take many forms, but images and text are currently the most feasible.
If you follow AI and ML developments, you have probably heard the term already: “sanitized” synthetic data is a recent hype in the AI training field that, some believe, might solve the pressing data privacy and ownership challenges posed by real data. However, it all sounds like sunshine and rainbows only until you stop and consider that the AI algorithms used to generate synthetic data still need to be trained on real data, which is the very obstacle they are supposed to remove.