zonos
https://github.com/pinokiofactory/zonos | v3.7 | updated 12/6/2025, 10:44:22 PM | indexed 1/6/2026, 6:14:46 AM
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with, or even surpassing, top TTS providers. https://github.com/Zyphra/Zonos
Dia
https://github.com/pinokiofactory/dia | v3.7 | updated 12/7/2025, 7:54:59 PM | indexed 1/6/2026, 6:16:57 AM
Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. Dia generates highly realistic dialogue directly from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing. https://github.com/nari-labs/dia
e2-f5-tts
https://github.com/pinokiofactory/e2-f5-tts | v3.7 | updated 12/20/2025, 8:47:31 PM | indexed 1/6/2026, 6:19:09 AM
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS
VibeVoice Realtime
https://github.com/pinokiofactory/vibevoice-realtime | v5.0 | updated 12/22/2025, 10:00:08 PM | indexed 1/6/2026, 6:18:30 AM
Realtime streaming TTS demo using microsoft/VibeVoice-Realtime-0.5B
OpenAudio
https://github.com/pinokiofactory/openaudio | v3.7 | updated 1/3/2026, 1:47:14 PM | indexed 1/6/2026, 6:16:52 AM
Multilingual text-to-speech with voice cloning (supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish). https://github.com/fishaudio/fish-speech