Pinokio

Tag: #aix
OpenVoice
https://github.com/cocktailpeanutlabs/openvoicev1 (updated 1/11/2026, 10:29:55 AM; indexed 1/29/2026, 1:29:07 AM)
Instantly clone any voice from any text to any speech, in any language https://huggingface.co/spaces/myshell-ai/OpenVoice
PhotoMaker2
https://github.com/pinokiofactory/photomaker2 (v3.7; updated 1/10/2026, 9:15:05 PM; indexed 1/20/2026, 9:13:57 AM)
Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2
ComfyUI
https://github.com/cocktailpeanut/comfyui.pinokio (updated 1/7/2026, 3:19:06 AM; indexed 1/23/2026, 7:46:52 PM)
Stable Diffusion & Stable Video Diffusion GUI
Forge
https://github.com/pinokiofactory/stable-diffusion-webui-forge (v2.0; updated 1/7/2026, 1:28:44 AM; indexed 1/20/2026, 9:14:53 AM)
[NVIDIA ONLY] The most efficient way to run FLUX (Optimized to run even on low memory machines, as low as 3GB VRAM with 512x512 resolution) https://github.com/lllyasviel/stable-diffusion-webui-forge
flux-webui
https://github.com/pinokiofactory/flux-webui (v3.7; updated 1/6/2026, 2:59:02 PM; indexed 1/20/2026, 9:14:55 AM)
Minimal Flux Web UI powered by Gradio & Diffusers (Flux Schnell + Flux Merged)
LivePortrait
https://github.com/pinokiofactory/liveportrait (v3.7; updated 1/5/2026, 2:02:09 AM; indexed 1/20/2026, 9:10:52 AM)
Bring portraits to life! https://github.com/KwaiVGI/LivePortrait
StyleTTS2 Studio
https://github.com/pinokiofactory/StyleTTS2_Studio (v3.7; updated 1/4/2026, 5:07:11 AM; indexed 1/23/2026, 7:46:14 PM)
Build your own voice for StyleTTS2
cogvideo
https://github.com/pinokiofactory/cogvideo (v3.7; updated 1/4/2026, 1:51:01 AM; indexed 1/20/2026, 9:10:43 AM)
[NVIDIA ONLY] Generate videos with less than 10GB VRAM https://github.com/THUDM/CogVideo
CogStudio
https://github.com/pinokiofactory/cogstudio (v3.7; updated 1/4/2026, 12:47:22 AM; indexed 1/20/2026, 9:12:50 AM)
[NVIDIA ONLY] Advanced Web UI for CogVideo (text to video, image to video, video to video, extend video, etc.). Generate videos with less than 10GB VRAM
OpenAudio
https://github.com/pinokiofactory/openaudio (v3.7; updated 1/3/2026, 1:47:18 PM; indexed 1/27/2026, 9:08:41 AM)
Multilingual Text-to-Speech with Voice Cloning (Supports: English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish) https://github.com/fishaudio/fish-speech
Clarity Refiners UI
https://github.com/pinokiofactory/clarity-refiners-ui (v3.7; updated 1/2/2026, 8:07:27 PM; indexed 1/23/2026, 7:46:41 PM)
An enhanced local port of finegrain-image-enhancer powered by Refiners (https://huggingface.co/spaces/finegrain/finegrain-image-enhancer), which was adapted from philz1337x's Clarity Upscaler (https://github.com/philz1337x/clarity-upscaler)
Hunyuan3D-2-LowVRAM
https://github.com/pinokiofactory/Hunyuan3d-2-lowvram (v3.7; updated 12/27/2025, 8:44:51 PM; indexed 1/20/2026, 9:14:34 AM)
Text/Image to 3D (Cross Platform: Mac + Windows + Linux): High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models. https://github.com/deepbeepmeep/Hunyuan3D-2GP
VibeVoice Realtime
https://github.com/pinokiofactory/vibevoice-realtime (v5.0; updated 12/22/2025, 10:00:08 PM; indexed 1/20/2026, 9:13:58 AM)
Realtime streaming TTS demo using microsoft/VibeVoice-Realtime-0.5B
Applio
https://github.com/pinokiofactory/applio (v3.7; updated 12/19/2025, 4:34:35 AM; indexed 1/23/2026, 7:46:17 PM)
A simple, high-quality voice conversion tool focused on ease of use and performance.
FramePack
https://github.com/pinokiofactory/Frame-Pack (v3.7; updated 12/18/2025, 10:04:25 AM; indexed 1/23/2026, 7:46:30 PM)
[NVIDIA ONLY] Generate Video Progressively. FramePack is a next-frame (next-frame-section) prediction neural network structure that generates videos progressively. https://github.com/lllyasviel/FramePack
RVC
https://github.com/cocktailpeanut/rvc.pinokio (v3.7; updated 12/11/2025, 2:33:17 PM; indexed 1/23/2026, 7:44:52 PM)
1 Click Installer for Retrieval-based-Voice-Conversion-WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
Dia
https://github.com/pinokiofactory/dia (v3.7; updated 12/7/2025, 7:54:59 PM; indexed 1/20/2026, 9:12:46 AM)
Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal sounds such as laughter, coughing, and throat clearing. https://github.com/nari-labs/dia
zonos
https://github.com/pinokiofactory/zonos (v3.7; updated 12/6/2025, 10:44:22 PM; indexed 1/20/2026, 9:11:12 AM)
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
bolt.diy
https://github.com/pinokiofactory/bolt (v3.4.0; updated 12/6/2025, 9:59:32 PM; indexed 1/20/2026, 9:13:03 AM)
Prompt, run, edit, and deploy full-stack web apps. https://github.com/stackblitz-labs/bolt.diy
echomimic2
https://github.com/pinokiofactory/echomimic2 (v3.7; updated 12/6/2025, 5:47:56 AM; indexed 1/20/2026, 9:14:28 AM)
[NVIDIA ONLY] Make virtual avatars say whatever you want using an image and an audio clip https://github.com/antgroup/echomimic_v2