Pinokio

Tag: #tts
TTS-Story
https://github.com/Xerophayze/TTS-Story | v2.0 | updated 1/28/2026, 2:51:35 PM | indexed 1/28/2026, 5:00:23 PM
Multi-Voice Text-to-Speech for Stories and Audiobooks. Supports Kokoro and Chatterbox TTS engines with GPU acceleration.
Wan2GP
https://github.com/pinokiofactory/wan | v3.7 | updated 1/28/2026, 9:41:31 AM | indexed 1/28/2026, 4:16:21 PM
Super-optimized Gradio UI for AI video creation on GPU-poor machines (6GB+ VRAM). Supports Wan 2.1/2.2, Qwen, Hunyuan Video, LTX Video and Flux. https://github.com/deepbeepmeep/Wan2GP
Qwen3-TTS
https://github.com/SUP3RMASS1VE/Qwen3-TTS-Pinokio | v5.0 | updated 1/27/2026, 5:41:21 PM | indexed 1/27/2026, 6:02:41 PM
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team.
LuxTTS Studio
https://github.com/TheAwaken1/LuxTTS-Studio | v2.0 | updated 1/26/2026, 7:53:06 PM | indexed 1/26/2026, 11:39:48 PM
Gradio-based web interface for the LuxTTS voice cloning and text-to-speech model, enabling users to generate customized speech from text using uploaded or recorded audio references with adjustable parameters like speed, guidance scale, and inference steps.
Qwen3-TTS
https://github.com/Xeronal81/Qwen3-TTS-Pinokio | v5.0 | updated 1/26/2026, 1:59:41 PM | indexed 1/26/2026, 2:00:12 PM
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team.
Orpheus-TTS-FastAPI
https://github.com/pinokiofactory/Orpheus-TTS-FastAPI | v3.7 | updated 1/24/2026, 11:02:12 PM | indexed 1/24/2026, 11:41:38 PM
Orpheus TTS is an open-source text-to-speech system built on the Llama-3b backbone. Orpheus demonstrates the emergent capabilities of using LLMs for speech synthesis. https://github.com/canopyai/Orpheus-TTS
LiquidAI-LFM2.5 Playground
https://github.com/TheAwaken1/LiquidAI-LFM2.5-Playground | v2.0 | updated 1/24/2026, 3:15:55 PM | indexed 1/24/2026, 3:16:01 PM
Local multimodal app powered by Liquid AI LFM2.5-Audio-1.5B and LFM2.5-VL-1.6B models, delivering real-time voice chat, text-to-speech synthesis, long-form audio transcription, and multi-image vision reasoning.
e2-f5-tts
https://github.com/pinokiofactory/e2-f5-tts | v3.7 | updated 1/23/2026, 9:14:27 PM | indexed 1/27/2026, 1:36:05 PM
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Chattered
https://github.com/6Morpheus6/Chattered | v3.7 | updated 1/22/2026, 2:40:08 AM | indexed 1/23/2026, 7:44:38 PM
All-in-one Gradio interface for Chatterbox. Voice cloning from uploaded audio samples, automatic text processing for long content, and real-time speech generation with configurable parameters. (Minimum requirement: 4GB VRAM / Recommended: 8GB VRAM)
Ultimate-TTS-Studio
https://github.com/pinokiofactory/Ultimate-TTS-Studio | v3.7 | updated 1/21/2026, 4:57:23 PM | indexed 1/25/2026, 9:28:47 PM
Kokoro, KittenTTS, Higgs audio, Chatterbox/Multi, Fish-Speech, F5 & index-tts & indextts2, VoxCPM and VibeVoice in one app
Whisper-WebUI
https://github.com/pinokiofactory/whisper-webui | v3.7 | updated 1/20/2026, 11:36:49 PM | indexed 1/23/2026, 7:45:51 PM
A web UI for easy subtitle generation using the Whisper model.
Comfyui
https://github.com/pinokiofactory/comfy | v3.7 | updated 1/14/2026, 11:37:40 AM | indexed 1/24/2026, 11:52:22 PM
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI
AllTalk-TTS v2
https://github.com/6Morpheus6/alltalk-tts | v3.3 | updated 1/5/2026, 6:52:06 PM | indexed 1/23/2026, 7:45:17 PM
[NVIDIA ONLY] AllTalk-TTS is a unified UI for F5-TTS, XTTS, VITS, Piper TTS, Parler TTS and RVC, based on CoquiTTS, including a finetune mode.
StyleTTS2 Studio
https://github.com/pinokiofactory/StyleTTS2_Studio | v3.7 | updated 1/4/2026, 5:07:11 AM | indexed 1/23/2026, 7:46:14 PM
Build your own voice for StyleTTS2
OpenAudio
https://github.com/pinokiofactory/openaudio | v3.7 | updated 1/3/2026, 1:47:18 PM | indexed 1/27/2026, 9:08:41 AM
Multilingual Text-to-Speech with Voice Cloning (Supports: English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish) https://github.com/fishaudio/fish-speech
VibeVoice Realtime
https://github.com/pinokiofactory/vibevoice-realtime | v5.0 | updated 12/22/2025, 10:00:08 PM | indexed 1/20/2026, 9:13:58 AM
Realtime streaming TTS demo using microsoft/VibeVoice-Realtime-0.5B
Dia
https://github.com/pinokiofactory/dia | v3.7 | updated 12/7/2025, 7:54:59 PM | indexed 1/20/2026, 9:12:46 AM
Dia is a 1.6B parameter text to speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc. https://github.com/nari-labs/dia
zonos
https://github.com/pinokiofactory/zonos | v3.7 | updated 12/6/2025, 10:44:22 PM | indexed 1/20/2026, 9:11:12 AM
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
Bark Voice Cloning
https://github.com/6Morpheus6/bark | v3.7 | updated 11/30/2025, 4:33:53 AM | indexed 1/22/2026, 2:20:35 AM
Upload a clean 20-second WAV file of the vocal persona you want to mimic, type your text-to-speech prompt, and hit submit. A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning
XTTS
https://github.com/cocktailpeanut/xtts.pinokio | v3.0 | updated 11/10/2025, 4:28:58 AM | indexed 1/23/2026, 7:47:04 PM
Clone voices into different languages using just a quick 3-second audio clip. (A local version of https://huggingface.co/spaces/coqui/xtts)