Pinokio launcher wrapper for Claw Code with non-interactive setup and model/provider selection.
Pre-mastering & audio enhancement for AI-generated music. 12-stage processing chain with platform presets (Suno, Udio), before/after spectrogram, and broadcast-ready LUFS normalization.
🎵 YouTube to MP3 downloader with a simple Gradio UI. Paste a YouTube link to download the audio as an MP3. Requires ffmpeg to be installed on your system.
High-quality, rapid TTS voice cloning model (150x+ realtime) with 48kHz speech output.
State-of-the-art open-source speech recognition model supporting 14 languages. 2B parameter ASR model from Cohere Labs.
🎨 FLUX.2 [klein] - Fast text-to-image generation with Black Forest Labs' FLUX.2 models. 6 variants available: 4B/9B (full precision) plus NVFP4/FP8 quantized versions. Runs on GPUs from consumer (~13GB) to high-end (~29GB), with sub-second image generation and outstanding quality.
Fast Image Generation with Sana Diffusion Model
⚡️ Efficient 6B parameter image generation model with sub-second inference. Generate high-quality, photorealistic images with only 8 inference steps. Features bilingual text rendering (Chinese & English) and Single-Stream Diffusion Transformer architecture.
Kimodo: Kinematic Motion Diffusion Model. Generates high-quality 3D human and robot motions.
Instant, Ultra-Realistic Text-to-Speech
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.
Open-source social media scheduling tool with AI. Schedule posts to X, LinkedIn, Reddit, Discord, Threads, TikTok, YouTube, Pinterest, Dribbble, Slack, Mastodon, Facebook, GitHub, and more.
flux-webui (Featured)
Minimal Flux Web UI powered by Gradio & Diffusers (Flux Schnell + Flux Merged)
4 check-ins
NVIDIA's Audio Flamingo 3 - Large Audio-Language Model for speech, sound, and music understanding with Gradio web interface
Gradio web interface for Photoroom's PRX-1024-t2i-beta text-to-image model
High-quality Text-to-Speech powered by VyvoTTS LFM2 model with easy-to-use web interface
Hermes Mod (Featured)
A full Hermes skin manager for browsing, editing, saving, and activating CLI skins directly from Pinokio.
5 check-ins
Next generation face swapper and enhancer
18 check-ins
Real-time face swap and one-click video deepfake with only a single image.
Local-first audiobook production with cloned voices, chapter-based editing, segment regeneration, narrator + character casting, and final export in one web app. XTTS runs fully local by default. Voxtral is available as an optional cloud voice path with your own Mistral API key if you want another engine for specific voices. Voice profiles can use different engines in the same project, so you can mix narrator and character workflows without leaving Audiobook Studio. Built for iterative production: preview voices, regenerate only changed sections, queue chapter or segment work, and assemble the finished audiobook when everything is ready. Note: enabling Voxtral sends synthesis text and selected reference audio to Mistral. Learn more: https://senigami.github.io/audiobook-studio/
3 check-ins