Store
parler-tts
A lightweight text-to-speech (TTS) model that generates high-quality speech whose features (e.g. gender, background noise, speaking rate, pitch, and reverberation) can be controlled with a simple text prompt. https://huggingface.co/spaces/parler-tts/parler_tts_mini
Openvoice2
A local web UI for Openvoice2, a multilingual voice-cloning TTS model. https://x.com/myshell_ai/status/1783161876052066793
ZeST
ZeST: Zero-Shot Material Transfer from a Single Image. Local port of https://huggingface.co/spaces/fffiloni/ZeST (Project: https://ttchengab.github.io/zest/)
StableAudio
An Open Source Model for Audio Samples and Sound Design https://github.com/Stability-AI/stable-audio-tools
flashdiffusion
Accelerating any conditional diffusion model for few steps image generation https://gojasper.github.io/flash-diffusion-project/
RC Stable Audio Tools
Advanced Gradio UI for Stable Audio https://github.com/RoyalCities/RC-stable-audio-tools
audiocraft_plus
AudioCraft Plus is an all-in-one WebUI for the original AudioCraft, adding many quality features on top https://github.com/GrandaddyShmax/audiocraft_plus
moshi
[MAC ONLY] A speech-text foundation model for real-time dialogue https://github.com/kyutai-labs/moshi
MagicAnimate
[NVIDIA ONLY] Temporally Consistent Human Image Animation using Diffusion Model https://showlab.github.io/magicanimate/
UVR5-WebUI
The best vocal remover application on the internet, and it's totally free and open source!
Deepseek-ai-Janus
Janus Pro 7B is a powerful multimodal AI model designed for advanced image understanding and text-to-image generation.
RVC-realtime
[WINDOWS/LINUX ONLY] Easily train a good voice conversion (VC) model with 10 minutes or less of voice data. https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Applio
A simple, high-quality voice conversion tool focused on ease of use and performance. https://github.com/IAHispano/Applio