Computer vision teams struggle with training data, putting projects at risk
The field of computer vision, which looks at how computers can understand digital images and videos, is a relatively new one, but like any branch of AI it relies on data to train systems effectively.
Synthetic data specialist Datagen has released a new report looking at training data in computer vision projects, and it finds that data has become a significant stumbling block.
Particular issues include wasted time and/or resources caused by a need to retrain the system often (52 percent), poor annotation resulting in quality issues (48 percent), poor data coverage of the intended application’s domain (47 percent) and lack of sufficient data (44 percent).
These problems can in turn harm the progress of projects: 99 percent of respondents report having experienced project cancellations, 80 percent have experienced project delays lasting at least three months, and 33 percent have experienced delays lasting seven months or more.
The report also shows a growing appetite for synthetic data: 96 percent of computer vision teams say they already use synthetic data in the training and testing of their models.
"Synthetic data is the future of data. This is the new way to control and consume the data our AI systems need," says Ofir Chakon, founder and CEO of Datagen. "As simulation gets better over time, with all its benefits, it will take over the place of labor-intensive manual data collection that is no longer scalable at the speed the world is evolving."
Advantages of using synthetic training data include reduced time-to-production (cited by 40 percent), elimination of privacy concerns (46 percent), reduced bias (46 percent), fewer annotation and labeling errors (53 percent) and improvements in predictive modeling (56 percent).
Full results of the study, commissioned by Datagen and conducted by Wakefield Research, are available on the Datagen site.
Image credit: SergeyNivens/depositphotos.com