Project updates
Latest Master Branch
Fork of https://github.com/facefusion/facefusion-pinokio which always installs / updates to the latest Mast...

X-Voice The universal translator
X-Voice is a voice clone app that lets you clone voices in any language. The Zero-Shot Voice Cloning tab work...

Help from devs to improve FluxRT
I’ve completed a full installer for FluxRT and have already integrated several new modes into the UI, includi...

Fooocus2026 — Pinokio launcher available now
A one-click way to install my fork of Fooocus 2.5.5, with the quality-of-life additions I kept missing in ups...
Store
Pinokio WebUI for danielgatis' RemBG. RemBG is a tool for removing image backgrounds.
Microsoft's 7B-parameter computer-use agent with a Gradio interface
SoTA open-source TTS
Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech application
ComfyUI with TRELLIS2, GeometryPack, and UniRig custom nodes for image-to-3D generation
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS
Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2
Multi-Identity Consistency for Image Customization via Matching Reward https://github.com/bytedance/UMO
AI-powered speech denoising + enhancement (Gradio web demo + CLI).
Fast, high-quality image generation using ComfyUI via a Gradio UI
Voice Synthesis Platform with Smart Chunking, Batch Processing, and Voice Cloning capabilities.
Dia
Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing. https://github.com/nari-labs/dia
A web interface for managing and interacting with Ollama models
zonos
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
Automatically create music videos. Synchronize the cuts to the music's beat.
A Step Towards Music Generation Foundation Model
