Generative Video

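Hung-yi Lee, Introduction to Generative AI 2024

Hung-yi Lee, Introduction to Generative AI 2024, Lecture 17: Generative AI for Images (Part 1): How AI Generates Images and Videos (Principles Possibly Behind Sora)

Hung-yi Lee, Introduction to Generative AI 2024, Lecture 18: Generative AI for Images (Part 2): A Quick Guide to Classic Image Generation Methods (VAE, Flow, Diffusion, GAN) and Interacting with Generated Videos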


Open-VCLIP

Paper: Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

Paper: Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

Code: https://github.com/wengzejia1/Open-VCLIP/


Text-to-Video

Awesome Video Diffusion Models


AnimateDiff

Paper: AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Code: https://github.com/guoyww/AnimateDiff


Stable Video Diffusion

Paper: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Code: https://github.com/nateraw/stable-diffusion-videos


Animate Anyone

Paper: Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation


StyleCrafter

Paper: StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

Code: https://github.com/GongyeLiu/StyleCrafter


Sora

Paper: Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models


Outfit Anyone

Paper: OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Code: https://github.com/HumanAIGC/OutfitAnyone


LTX-Video

Paper: LTX-Video: Realtime Video Latent Diffusion


Open-Sora

Paper: Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

Code: https://github.com/hpcaitech/Open-Sora


Step-Video-TI2V

Paper: Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

Code: https://github.com/stepfun-ai/Step-Video-TI2V


WAN 2.2

Paper: Wan: Open and Advanced Large-Scale Video Generative Models

Code: https://github.com/Wan-Video/Wan2.2

ComfyUI + WAN2.2

Multi-Talk

Paper: Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Code: https://github.com/MeiGen-AI/MultiTalk


InfiniteTalk

Paper: InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing

Code: https://github.com/MeiGen-AI/InfiniteTalk


Wan-S2V

HuggingFace: Wan-AI/Wan2.2-S2V-14B

Paper: Wan-S2V: Audio-Driven Cinematic Video Generation


UniVerse-1

Paper: UniVerse-1: Unified Audio-Video Generation via Stitching of Experts

Code: https://github.com/Dorniwang/UniVerse-1-code/


Wan-Animate

Paper: Wan-Animate: Unified Character Animation and Replacement with Holistic Replication


Veo 3

Paper: Video models are zero-shot learners and reasoners


FlashVSR

Paper: FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

Code: https://github.com/OpenImagingLab/FlashVSR


TiDAR

Paper: TiDAR: Think in Diffusion, Talk in Autoregression


DiffusionVL

Paper: DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models


SRENDER

Paper: Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering


LTX-2

Paper: LTX-2: Efficient Joint Audio-Visual Foundation Model


daVinci-MagiHuman

Paper: Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Code: https://github.com/GAIR-NLP/daVinci-MagiHuman


Runway

GWM-1

Gen-4.5


Kling

Paper: Kling-MotionControl Technical Report


Seedance 2.0

Paper: Seedance 2.0: Advancing Video Generation for World Complexity


LTX-2.3

Blog: Lightricks releases LTX 2.3, an open-source video generation model that can produce 4K 50FPS AI video with synchronized audio locally


JoyAI-Echo

Code: https://github.com/jd-opensource/JoyAI-Echo

ComfyUI_JoyAI_Echo needs 48GB VRAM!


World Models

Phantom

Paper: Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics


SANA-WM

Paper: SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

Code: https://github.com/NVlabs/Sana



This site was last updated June 19, 2026.