A 2-minute video generated from text by Phenaki

Multimodal research has seen a surge of hype over the past few days, driven by the release of several text-to-video models, even though text-to-image models are still drawing attention and are far from perfect. Diffusion models in general, such as GLIDE, and Stable Diffusion in particular, are currently the state-of-the-art models for text-to-image tasks. However, several recent papers tackle the task of generating videos from text prompts or still images. Here are some of these models released…

I'm a Ph.D. student interested in artificial intelligence, machine learning, and intelligence in its abstract form.