Pinokio

Pierre Bruno

@pierrunoyt
3 posts · 21 checkpoints · Joined 1/27/2026, 9:46:34 AM
Creations by @pierrunoyt
50 total
Transcribr · Updated 7 hours ago
https://github.com/PierrunoYT/Transcribr
Bulk transcribe many YouTube videos, whole playlists, or your own uploaded audio/video files at once with faster-whisper. Outputs txt, srt, vtt, or json.
ScribeTube · Updated 8 hours ago
https://github.com/PierrunoYT/ScribeTube
Download and transcribe many YouTube videos or whole playlists at once with faster-whisper. Outputs txt, srt, vtt, or json.
MOSS-TTS · Updated 18 hours ago
https://github.com/PierrunoYT/MossTTS-Pinokio
All-in-one Gradio UI for the MOSS-TTS Family: voice cloning, dialogue generation, voice design from text, and sound effects.
Ideogram 4 Studio · Updated 2 days ago
https://github.com/PierrunoYT/Ideogram-4-Pinokio
Ideogram 4 (nf4) open-weights text-to-image model (9.3B params, Qwen3-VL-8B text encoder, structured JSON prompting, native 2k resolution)
PRX Pixel · Updated 3 days ago
https://github.com/PierrunoYT/PRX-Pixel-Pinokio
Pixel-space PRX text-to-image pipeline (~7B params, Qwen3-VL text encoder, no VAE)
OmniVoice Studio · Updated 6 days ago
https://github.com/PierrunoYT/OmniVoice-Studio-Pinokio
The open-source ElevenLabs alternative. Local voice cloning, video dubbing, and real-time dictation — 646 languages, no API keys.
Higgs Audio v3 TTS · Updated last week
https://github.com/PierrunoYT/HiggsAudioV3-Pinokio
Pinokio launcher for Higgs Audio v3 TTS with Gradio UI, SGLang-Omni backend, and automatic model download.
dots.tts-base · Updated last week
https://github.com/PierrunoYT/dots.tts-Pinokio
2B-parameter fully continuous, end-to-end autoregressive text-to-speech with zero-shot voice cloning. https://huggingface.co/rednote-hilab/dots.tts-base
VidLingo · Updated 2 weeks ago
https://github.com/PierrunoYT/VidLingo-Pinokio
YouTube to MP3, Cohere transcription, TranslateGemma translation, OmniVoice TTS.
DramaBox · Updated 3 weeks ago
https://github.com/PierrunoYT/DramaBox-TTS-Pinokio
Expressive TTS with voice cloning, prompt-driven speech synthesis built on LTX-2.3 by Resemble AI
RealRestorer · Updated 3 weeks ago
https://github.com/PierrunoYT/RealRestorer-Pinokio
Generalizable real-world image restoration (diffusers + Gradio). CUDA recommended; first run downloads HF weights.
Voxtral UI · Updated 3 weeks ago
https://github.com/PierrunoYT/Voxtral-UI-Pinokio
Run Mistral AI's Voxtral locally with a Gradio web interface (Transformers backend, no vLLM required).
SmolLM3-3B Chatbot · Updated 3 weeks ago
https://github.com/PierrunoYT/SmolLM3-3B-Pinokio
Advanced 3B-parameter language model with a Gradio web interface, GPU acceleration, and complete privacy.
LFM2.5-350M Reader + Q&A · Updated 3 weeks ago
https://github.com/PierrunoYT/LFM2.5-350M-Pinokio
Paste long text, clean it into readable sections, summarize each section, and ask questions in-browser with WebGPU.
Higgs Audio V2 Enhanced · Updated 3 weeks ago
https://github.com/PierrunoYT/Higgs-Audio-V2-Pinokio
Advanced text-to-speech with voice cloning, multi-speaker support, and background music generation using Higgs Audio V2
TranslateGemma · Updated 3 weeks ago
https://github.com/PierrunoYT/TranslateGemma-Pinokio
🌍 TranslateGemma - Google's open-source multilingual translation AI. Translate text across 55+ languages and extract/translate text from images. Powered by Gemma 3 architecture.
Youtube2MP3 · Updated 3 weeks ago
https://github.com/PierrunoYT/Youtube2MP3-Pinokio
🎵 YouTube to MP3 downloader with a simple Gradio UI. Paste a YouTube link to download the audio as an MP3. Requires ffmpeg installed on your system.
PRX-1024 Text-to-Image · Updated 3 weeks ago
https://github.com/PierrunoYT/Photoroom-PRX-Pinokio
Gradio web interface for Photoroom's PRX-1024-t2i-beta text-to-image model
VyvoTTS LFM2 · Updated 3 weeks ago
https://github.com/PierrunoYT/VyvoTTS-LFM2-Pinokio
High-quality text-to-speech powered by the VyvoTTS LFM2 model, with an easy-to-use web interface.
Moondream3 Gradio UI · Updated 3 weeks ago
https://github.com/PierrunoYT/moondream-3-pinokio
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.