Explore tags
MMAudioFeatured
Generate synchronized audio from video and/or text inputs https://github.com/hkchengrex/MMAudio
Build your own voice for StyleTTS2
bolt.diyFeatured
Prompt, run, edit, and deploy full-stack web apps. https://github.com/stackblitz-labs/bolt.diy
Open WebUIFeatured
User-friendly WebUI for LLMs, supported LLM runners include Ollama and OpenAI-compatible APIs https://github.com/open-webui/open-webui
Check-ins10 check-ins
Platforms
YuEFeatured
[NVIDIA ONLY] YuEGP--A Web UI for YuE, an Open Full-song Generation Foundation Model (10G VRAM required), via https://github.com/deepbeepmeep/YuEGP
browser-useFeatured
Run AI Agent in your browser. https://github.com/browser-use/web-ui
zonosFeatured
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
macOS-useFeatured
[Mac Only] We make AI agents that control Mac apps: https://github.com/browser-use/macOS-use
MatAnyoneFeatured
MatAnyone AI is a tool for editing videos by separating objects from their backgrounds. It is an AI to remove the background from videos effectively. Stable Video Matting with Consistent Memory Propagation: https://github.com/pq-yang/MatAnyone.git
DiffRhythmFeatured
Generate songs with AI (up to 4 min 45 sec). Both with lyrics or instrumental https://github.com/ASLP-lab/DiffRhythm
cubeFeatured
Roblox Foundation Model for 3D Intelligence --- Cross Platform (Mac, Windows, Linux): Requires 16GB+ VRAM PC or 18GB+ Memory Macs https://github.com/Roblox/cube
HunyuanVideoFeatured
[NVIDIA ONLY] Super Optimized Gradio UI for Hunyuan Video Generator that works on GPU poor machines. Generate up to 10~14 sec videos https://github.com/deepbeepmeep/HunyuanVideoGP
unoFeatured
[NVIDIA ONLY] Generate an image from multiple images https://github.com/bytedance/UNO
DiaFeatured
Dia is a 1.6B parameter text to speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc. https://github.com/nari-labs/dia
FramePackFeatured
[NVIDIA ONLY] Generate Video Progressively. FramePack is a next-frame (next-frame-section) prediction neural network structure that generates videos progressively. https://github.com/lllyasviel/FramePack
Industry leading face manipulation platform
Check-ins66 check-ins
Platforms
GPUNVIDIAAMDApple
Kokoro, KittenTTS, Higgs audio, Chatterbox/Multi, Fish-Speech, F5 & index-tts & indextts2, VoxCPM and VibeVoice in one app
Check-ins19 check-ins
Platforms
GPUNVIDIAAMDApple
One-click 3D Gaussian Splatting generation from a single image.
Check-ins4 check-ins
Orpheus TTS is an open-source text-to-speech system built on the Llama-3b backbone. Orpheus demonstrates the emergent capabilities of using LLMs for speech synthesis https://github.com/canopyai/Orpheus-TTS
Qwen3-TTSFeatured
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team
Check-ins16 check-ins
Platforms
GPUNVIDIAAMDApple