hallo (Featured)
[NVIDIA Only] Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation https://github.com/fudan-generative-vision/hallo
LFM2-Audio-1.5B is Liquid AI's first end-to-end audio foundation model, designed with low latency and real-time conversation in mind.
Pinokio script for https://huggingface.co/Ole1/Joy_Caption_Batch-GUI
Gradio-based web interface for the LuxTTS voice cloning and text-to-speech model, enabling users to generate customized speech from text using uploaded or recorded audio references with adjustable parameters like speed, guidance scale, and inference steps.
A tool that takes a text document containing a book or a novel, ingests it with an LLM to produce an annotated script, and then uses a TTS API to generate the voice lines, finally stitching them together into an audiobook in MP3 format.
OneTrainer for Pinokio, vato loco
Imposing Consistent Light - Control lighting of images
Fast AI video generation for the GPU poor (Wan2.1, Hunyuan, LTV). Gradio UI at http://127.0.0.1:7860
Check-ins: 42
GPU: NVIDIA, AMD, Apple
RVC (Featured)
1 Click Installer for Retrieval-based-Voice-Conversion-WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
Check-ins: 10
Realtime streaming TTS demo using microsoft/VibeVoice-Realtime-0.5B
Check-ins: 3
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion WebUI (based on Gradio) to make development easier, optimize resource management, and speed up inference. https://github.com/Panchovix/stable-diffusion-webui-reForge
An open-source, modern-design ChatGPT/LLMs UI/Framework. Supports speech-synthesis, multi-modal, and extensible (function call) plugin system. https://github.com/lobehub/lobe-chat
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
OpenAudio (Featured)
Multilingual Text-to-Speech with Voice Cloning (Supports: English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish) https://github.com/fishaudio/fish-speech
One-click installer for Microsoft TRELLIS.2: High-quality 3D asset generation from images with PBR textures.
Google's official AI agent for your terminal. Access Gemini 2.5 Pro with a 1M-token context window directly from the command line.
One-click installer for Microsoft TRELLIS.2: High-quality 3D asset generation from images with PBR textures.
Owner: @death
Check-ins: 4
[NVIDIA ONLY] Super-optimized Gradio UI for Wan2.1 video generation on GPU-poor machines (5GB+ VRAM). Generate videos up to 12 seconds long. https://github.com/deepbeepmeep/Wan2GP
Check-ins: 3