Explore tags
Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech application
ComfyUI with TRELLIS2, GeometryPack, and UniRig custom nodes for image-to-3D generation
Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2
Multi-Identity Consistency for Image Customization via Matching Reward https://github.com/bytedance/UMO
SillyTavern Character Generator: a Pinokio script for https://github.com/Tremontaine/character-card-generator. When used with KoboldCPP, use http://localhost:5001/v1, where 5001 is the port KoboldCPP reports on startup. The Text API Key field must be filled in with any value; leaving it empty causes an error.
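The KoboldCPP settings described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `koboldcpp_settings`, the dictionary keys, and the `"placeholder"` fallback string are hypothetical and are not part of the generator or of KoboldCPP.

```python
def koboldcpp_settings(port: int = 5001, api_key: str = "") -> dict:
    """Build the URL and key values described in the entry above.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return {
        # Use the port KoboldCPP reports on startup (5001 in the entry above).
        "base_url": f"http://localhost:{port}/v1",
        # The key must not be empty, so fall back to an arbitrary value.
        "api_key": api_key.strip() or "placeholder",
    }


if __name__ == "__main__":
    print(koboldcpp_settings())
```

If KoboldCPP reports a different port, pass it as the first argument, for example `koboldcpp_settings(5002)`.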
AI-powered speech denoising + enhancement (Gradio web demo + CLI).
Voice Synthesis Platform with Smart Chunking, Batch Processing, and Voice Cloning capabilities.
Dia (Featured)
Dia is a 1.6B parameter text to speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc. https://github.com/nari-labs/dia
[v0.5.1] FramePack Video App offering multiple generation types: Original, F1, video extension, end frame. Features include: LoRA support, job queueing, advanced timestamped prompts, offline mode, a post-processing suite including upscaling, interpolation, filters and more!
A web interface for managing and interacting with Ollama models
zonos (Featured)
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
Automatically create music videos. Synchronize the cuts to the music's beat.
A Step Towards Music Generation Foundation Model
echomimic2 (Featured)
[NVIDIA ONLY] Make virtual avatars talk whatever you want with an image and an audio clip https://github.com/antgroup/echomimic_v2
pyramidflow (Featured)
Pyramid Flow Video Generation AI (text-to-video & image-to-video) https://github.com/jy0205/Pyramid-Flow