Store
Moondream3 Gradio UI
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.
chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice-cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM).
AudioGradio
One-click installer for the AudioCraft MusicGen and AudioGen Gradio UI (requires at least Pinokio v0.0.56)
IndexTTS-2
Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech application

ComfyUI Image to 3D
ComfyUI with TRELLIS2, GeometryPack, and UniRig custom nodes for image-to-3D generation
Chatterbox-TTS-Server
Self-host the Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (including OpenAI-compatible ones), predefined voices, voice cloning, and audiobook-scale text processing. Runs on NVIDIA (CUDA) and AMD (ROCm) GPUs, or on CPU.
PhotoMaker2
Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2
FramePack
[NVIDIA ONLY] FramePack is a next-frame (next-frame-section) prediction neural network structure that generates videos progressively. https://github.com/lllyasviel/FramePack
manga-image-translator
Translate the text in manga and other images with one click. https://cotrans.touhou.ai/ (no longer working)
Audio Flamingo 3
NVIDIA's Audio Flamingo 3, a large audio-language model for speech, sound, and music understanding, with a Gradio web interface
Umo
Multi-Identity Consistency for Image Customization via Matching Reward https://github.com/bytedance/UMO