Project updates
Latest Master Branch
Fork of https://github.com/facefusion/facefusion-pinokio which always installs / updates to the latest Mast...

X-Voice The universal translator
X-Voice is a voice clone app that lets you clone voices in any language. The Zero-Shot Voice Cloning tab work...

Help from devs to improve FluxRT
I’ve completed a full installer for FluxRT and have already integrated several new modes into the UI, includi...

Fooocus2026 — Pinokio launcher available now
A one-click way to install my fork of Fooocus 2.5.5, with the quality-of-life additions I kept missing in ups...
Store
XTTS (Featured)
Clone voices into different languages using just a quick 3-second audio clip (a local version of https://huggingface.co/spaces/coqui/xtts).
Run the Open-Hivemind multi-agent orchestrator locally with Pinokio.
YouTube to MP3, Cohere transcription, TranslateGemma translation, OmniVoice TTS. https://github.com/PierrunoYT/VidLingo-Pinokio
Local-first AI audiobook production with voice cloning and chapter repair tools. This is the easiest way to install locally, including an optional demo voice library so you can start exploring right away. Live demo: senigami.github.io/audiobook-studio
YouTube to MP3, Cohere transcription, TranslateGemma translation.
[NVIDIA ONLY] The most efficient way to run FLUX (optimized to run even on low-memory machines, with as little as 3GB VRAM at 512x512 resolution) https://github.com/lllyasviel/stable-diffusion-webui-forge
🎨 FLUX.2 [klein] - Fast text-to-image generation with Black Forest Labs' FLUX.2 models. 6 variants available: 4B/9B (full precision) plus NVFP4/FP8 quantized versions. Runs on consumer GPUs (~13GB) through high-end GPUs (~29GB) for sub-second image generation with outstanding quality.
Liquid Audio - LFM2.5-Audio-1.5B: speech-to-speech, ASR, and TTS powered by Liquid AI.
Ultra-lightweight text-to-speech (15M-80M params) — CPU optimized, 8 voices, ONNX-powered
Instant, Ultra-Realistic Text-to-Speech
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.
High-quality Text-to-Speech powered by VyvoTTS LFM2 model with easy-to-use web interface
Gradio web interface for Photoroom's PRX-1024-t2i-beta text-to-image model
🎵 YouTube to MP3 downloader with a simple Gradio UI. Paste a YouTube link to download MP3. Requires ffmpeg installed on your system.
🌍 TranslateGemma - Google's open-source multilingual translation AI. Translate text across 55+ languages and extract/translate text from images. Powered by Gemma 3 architecture.
Advanced text-to-speech with voice cloning, multi-speaker support, and background music generation using Higgs Audio V2
Advanced 3B parameter language model with Gradio web interface, GPU acceleration, and complete privacy
State-of-the-art open-source speech recognition model supporting 14 languages. 2B parameter ASR model from Cohere Labs.
Paste long text, clean it into readable sections, summarize each section, and ask questions in-browser with WebGPU.
🎙️ Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning. High-quality text-to-speech synthesis supporting zero-shot voice cloning and streaming inference with natural emotional expression.
