Pinokio

Store

Related tags
A multi-voice AI audiobook generator built on Qwen3-TTS — annotate scripts with an LLM, assign unique voices to each character, per-line style instructions for delivery, clone voices from reference audio, design new voices from text descriptions, train custom voices with LoRA fine-tuning, and export to MP3 or Audacity multi-track projects
Check-ins4 check-ins
Platforms
GPUNVIDIAAMDApple
[NVIDIA ONLY] AllTalk-TTS is a unified UI for E5-TTS, XTTS, Vite TTS, Piper TTS, Parler TTS and RVC, based on CoquiTTS, including a finetune mode.
Local-first AI audiobook production with voice cloning and chapter repair tools. This is the easiest way to install locally, including an optional demo voice library so you can start exploring right away. Live demo: senigami.github.io/audiobook-studio
Check-ins6 check-ins
Upload a clean 20 seconds WAV file of the vocal persona you want to mimic, type your text-to-speech prompt and hit submit! A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning
All in one Gradio interface for chatterbox. Voice cloning from uploaded audio samples, automatic text processing for long content and real-time speech generation with configurable parameters. (Minimum Requirements 4GB VRAM / Recommended Requirements 8GB VRAM)
ComfyuiFeatured
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI
Check-ins41 check-ins
Platforms
GPUNVIDIAAMDApple
DiaFeatured
Dia is a 1.6B parameter text to speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc. https://github.com/nari-labs/dia
e2-f5-ttsFeatured
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Local multimodal app powered by Liquid AI LFM2.5-Audio-1.5B and LFM2.5-VL-1.6B models, delivering real-time voice chat, text-to-speech synthesis, long-form audio transcription, and multi-image vision reasoning.
Gradio-based web interface for the LuxTTS voice cloning and text-to-speech model, enabling users to generate customized speech from text using uploaded or recorded audio references with adjustable parameters like speed, guidance scale, and inference steps.
MeloTTSFeatured
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean https://github.com/myshell-ai/MeloTTS
OpenAudioFeatured
Multilingual Text-to-Speech with Voice Cloning (Supports: English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish) https://github.com/fishaudio/fish-speech
OpenVoiceFeatured
Instantly clone any voice from any text to any speech, in any language https://huggingface.co/spaces/myshell-ai/OpenVoice
Openvoice2Featured
Openvoice 2 Web UI - A local web UI for Openvoice2, a multilingual voice cloning TTS https://x.com/myshell_ai/status/1783161876052066793
Orpheus TTS is an open-source text-to-speech system built on the Llama-3b backbone. Orpheus demonstrates the emergent capabilities of using LLMs for speech synthesis https://github.com/canopyai/Orpheus-TTS
parler-ttsFeatured
a lightweight text-to-speech (TTS) model that can generate high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation). https://huggingface.co/spaces/parler-tts/parler_tts_mini
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team
Check-ins12 check-ins
GPUNVIDIAAMDApple
Qwen3-TTSFeatured
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team
Check-ins17 check-ins
Platforms
GPUNVIDIAAMDApple
High-quality text-to-speech with Beautiful Web UI & API, optimized for Apple Silicon using MLX. Features include Custom Voice (preset speakers), Voice Design (natural language), and Voice Cloning. With enhanced features for saving custom voices and long-form / endless TTS streaming.
Check-ins55 check-ins
Platforms
GPUNVIDIAAMDApple
Build your own voice for StyleTTS2