Store
Wan2GP - AMD
[AMD ONLY] Super Optimized Gradio UI for AI video creation for GPU poor machines (6GB+ VRAM). Supports Wan 2.1/2.2, Qwen, Hunyuan Video, LTX Video and Flux. (On Windows supported by 7900(XT), 7800(XT), 7600(XT), Phoenix, 9070(XT) and Strix Halo)
Moondream3 Gradio UI
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.
MagicQuill
An intelligent, interactive Image Editing System. Easily erase and add objects on a user-friendly interface.
Audio Flamingo 3
NVIDIA's Audio Flamingo 3 - Large Audio-Language Model for speech, sound, and music understanding with Gradio web interface
MFLUX-WEBUI
[MAC ONLY] A powerful and user-friendly web interface for FLUX, powered by MLX and Gradio via MFLUX
GLM-TTS
🎙️ Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning. High-quality text-to-speech synthesis supporting zero-shot voice cloning and streaming inference with natural emotional expression.
Forge Neo
[NVIDIA ONLY] Stable Diffusion WebUI Forge supporting Flux, Qwen, wan, nunchaku and more in a lightweight WebUI. https://github.com/Haoming02/sd-webui-forge-classic/tree/neo
Ultimate-TTS-Studio-SUP3R-Edition
Kokoro, KittenTTS, Higgs audio, Chatterbox/Multi, Fish-Speech, F5 & index-tts & indextts2, VoxCPM and VibeVoice in one app
Chattered
All in one Gradio interface for chatterbox. Voice cloning from uploaded audio samples, automatic text processing for long content and real-time speech generation with configurable parameters. (Minimum Requirements 4GB VRAM / Recommended Requirements 8GB VRAM)
Dia
Dia is a 1.6B parameter text to speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc. https://github.com/nari-labs/dia
zonos
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
bolt.diy
Prompt, run, edit, and deploy full-stack web apps. https://github.com/stackblitz-labs/bolt.diy
echomimic2
[NVIDIA ONLY] Make virtual avatars talk whatever you want with an image and an audio clip https://github.com/antgroup/echomimic_v2
DiffRhythm
Generate songs with AI (up to 4 min 45 sec). Both with lyrics or instrumental https://github.com/ASLP-lab/DiffRhythm
