Google unveils a text-to-video generator based on Imagen

Google researchers have announced Imagen Video, an artificial intelligence system that generates video at a resolution of 1280 × 768 pixels and 24 frames per second from text prompts.

According to Google's researchers, Imagen Video takes a text description and creates a 16-frame video at a resolution of 24 × 48 pixels and 3 FPS. The system then upscales the clip and “predicts” additional frames.

As a result, the algorithm generates a 128-frame animation at a resolution of 1280 × 768 pixels and 24 FPS.
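The figures quoted above imply that the upscaling stages add frames and raise the frame rate by the same factor, so the clip length stays the same. A minimal Python sketch of that arithmetic follows. It uses only the numbers given in this article, and the function and variable names are illustrative assumptions, not part of Imagen Video.

```python
from fractions import Fraction


def clip_duration(frames: int, fps: int) -> Fraction:
    """Return the clip length in seconds as an exact fraction."""
    return Fraction(frames, fps)


# Figures quoted in the article (the names are hypothetical).
BASE_FRAMES, BASE_FPS = 16, 3       # initial low-resolution clip
FINAL_FRAMES, FINAL_FPS = 128, 24   # final upscaled clip

base_duration = clip_duration(BASE_FRAMES, BASE_FPS)
final_duration = clip_duration(FINAL_FRAMES, FINAL_FPS)

# How many times more frames, and how many times higher the frame rate.
frame_factor = FINAL_FRAMES // BASE_FRAMES
fps_factor = FINAL_FPS // BASE_FPS

print(f"Base clip:  {float(base_duration):.2f} s")
print(f"Final clip: {float(final_duration):.2f} s")
print(f"Frames x{frame_factor}, frame rate x{fps_factor}")
```

Both stages come out to 16/3 of a second, roughly 5.3 seconds: the frame count and the frame rate each grow eightfold, so the final video is smoother and sharper rather than longer.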

First stage of Imagen Video generation. Source: Google.

Intermediate stage of Imagen Video generation. Source: Google.

Finished video generated by Imagen Video. Source: Google.

To train Imagen Video, the developers used 14 million video-text pairs and 60 million image-text pairs, as well as the publicly available LAION-400M dataset, which helped the model pick up a range of aesthetic styles.

Video generated by Imagen Video. Source: Google.

During testing, the researchers found that the algorithm can create “watercolor” videos or imitate the style of Van Gogh. According to them, Imagen Video demonstrated an understanding of depth and three-dimensionality, which allowed it to generate videos that look as if they were shot from a drone.

Video generated by Imagen Video. Source: Google.

The system can also render text correctly.

“Unlike Stable Diffusion and DALL-E 2, which struggle to turn a prompt like ‘Diffusion logo’ into readable words, Imagen Video renders it without problems,” the project says.

According to Matthew Guzdial, an AI researcher at the University of Alberta, the text-to-video problem has not yet been solved.

“We are unlikely to reach something like DALL-E 2 or Midjourney in quality [of video generation] anytime soon,” he said.

To reduce flicker and distortion in the videos, the Imagen Video team plans to join forces with the developers of Phenaki, another Google generator that turns long, detailed prompts into two-minute low-quality videos.

Google also notes that the training data contained problematic content, which is why Imagen Video sometimes creates clips depicting violence or sexually explicit material. For this reason, the company does not plan to release the model or its source code until the problem is addressed.

As a reminder, in September an enthusiast developed Stable Diffusion Video, a text-to-animation generator.

In June, Chinese researchers developed CogVideo, a transformer with 9 billion parameters that converts text into animation.

