Facebook parent company Meta released a set of papers last fall describing how it erased big chunks of photographs or videos, then used artificial intelligence to fill in the gaps.
This required the AI models to have some type of common-sense understanding of how the world works, Meta’s chief AI scientist, Yann LeCun, told IEEE Spectrum late last month.
“If it can predict what’s going to happen in a video, it has to understand that the world is three-dimensional, that some objects are inanimate and don’t move by themselves, that other objects are animate,” he said in the interview.
The same AI system can also be used to fill in gaps in audio files, the researcher said.
Another Meta researcher, Christoph Feichtenhofer, told IEEE Spectrum that the technology can reduce the computational cost of video by 95 percent.
The technology is an example of what is known as unsupervised learning, specifically the variant often called self-supervised learning. After trying to fill in the gaps in the blacked-out photograph, the AI compares its guess with the original image and automatically adjusts the model to be more effective next time.
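The loop of erasing, guessing, and checking against the original can be illustrated with a toy sketch. This is not Meta's model: the "image" is just a short list of numbers forming a straight-line ramp, the "model" is two weights applied to the neighbors of the erased value, and the names `make_signal` and `train` are hypothetical ones chosen for this example. The point is that the original data supplies the correct answer, so no human labeling is needed.

```python
import random


def make_signal(rng, length=8):
    """Toy stand-in for an image: a straight-line ramp with random offset and slope."""
    offset = rng.uniform(-1.0, 1.0)
    slope = rng.uniform(-1.0, 1.0)
    return [offset + slope * i / length for i in range(length)]


def train(steps=5000, lr=0.05, seed=0):
    """Learn to fill in one erased value from its two neighbors."""
    rng = random.Random(seed)
    w_left, w_right = 0.0, 0.0  # the model starts out knowing nothing
    losses = []
    for _ in range(steps):
        signal = make_signal(rng)
        hidden = rng.randrange(1, len(signal) - 1)  # erase one interior value
        left, right = signal[hidden - 1], signal[hidden + 1]

        guess = w_left * left + w_right * right  # fill in the gap
        error = guess - signal[hidden]  # check the guess against the original

        # Adjust the model: gradient step on the squared error 0.5 * error**2.
        w_left -= lr * error * left
        w_right -= lr * error * right
        losses.append(error * error)
    return w_left, w_right, losses


if __name__ == "__main__":
    w_left, w_right, losses = train()
    print("average early error:", sum(losses[:100]) / 100)
    print("average late error: ", sum(losses[-100:]) / 100)
    print("learned weights:", w_left, w_right)
```

Run as a script, the reconstruction error should shrink over the course of training as the two weights settle on values that sum to roughly 1, which amounts to the model discovering for itself that a missing point on a ramp lies between its neighbors.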
By comparison, supervised learning requires humans in the loop to check the accuracy of the system or to carefully label training data.
On the far left in the image above, we have the photograph with parts erased. On the far right, there’s the original, complete photograph. The picture in the center is the AI’s best guess of what the full photograph would look like.
Meta is currently working on building a virtual reality metaverse, where an understanding of how the world works could be helpful with automatically generating a virtual environment.