OmniSVG
[NVIDIA ONLY] End-to-end multimodal SVG generator capable of producing complex, detailed SVGs, from simple icons to intricate anime characters. (Minimum Requirements 12GB VRAM / 32GB RAM, Recommended Requirements 24GB VRAM / 24GB RAM)
Open WebUI
User-friendly WebUI for LLMs, supported LLM runners include Ollama and OpenAI-compatible APIs https://github.com/open-webui/open-webui
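To illustrate what "OpenAI-compatible API" means for a local runner such as Ollama, here is a minimal Python sketch using the official openai client pointed at a local endpoint; the base URL, API key placeholder, and model name are assumptions to adjust for your own setup.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint (assumed default port)
    api_key="not-needed-locally",          # local runners typically ignore the key
)

response = client.chat.completions.create(
    model="llama3",  # any model pulled locally (assumption)
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

The same client code works unchanged against OpenAI's hosted API or any other compatible server; only base_url and model change.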
Orpheus-TTS-FastAPI
Orpheus TTS is an open-source text-to-speech system built on the Llama-3b backbone. Orpheus demonstrates the emergent capabilities of using LLMs for speech synthesis https://github.com/canopyai/Orpheus-TTS
Z-Image-Turbo
⚡️ Efficient 6B parameter image generation model with sub-second inference. Generate high-quality, photorealistic images with only 8 inference steps. Features bilingual text rendering (Chinese & English) and Single-Stream Diffusion Transformer architecture.
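A hedged sketch of what 8-step generation looks like with the Hugging Face diffusers library; the model id and automatic pipeline resolution are assumptions, so check the model card for the exact loading code.

```python
import torch
from diffusers import DiffusionPipeline

# Assumed model id; the actual Hugging Face repo name may differ.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# The entry above advertises high-quality results in only 8 inference steps.
image = pipe(
    prompt="a photorealistic street scene at dusk, neon signs in Chinese and English",
    num_inference_steps=8,
).images[0]
image.save("z_image_turbo.png")
```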
RVC
1 Click Installer for Retrieval-based-Voice-Conversion-WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
kohya_ss
1 Click Installer for kohya_ss, a Stable Diffusion LoRA & DreamBooth training WebUI (https://github.com/bmaltais/kohya_ss)
MagicAnimate
[NVIDIA ONLY] Temporally Consistent Human Image Animation using Diffusion Model https://showlab.github.io/magicanimate/
Realtime BakLLaVA
llama.cpp with the BakLLaVA model describes what it sees in real time (https://github.com/Fuzzy-Search/realtime-bakllava)
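The linked repo wires llama.cpp to a webcam feed; as a rough illustration of the same idea, here is a sketch using the llama-cpp-python bindings with a LLaVA-1.5-style chat handler to describe a single captured frame. The GGUF filenames and the frame path are assumptions.

```python
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

def to_data_uri(path: str) -> str:
    # Encode a local image so it can be passed as an image_url message part.
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

# Filenames are assumptions: a BakLLaVA GGUF plus its CLIP projector file.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-bakllava-f16.gguf")
llm = Llama(model_path="bakllava-1.Q4_K_M.gguf", chat_handler=chat_handler, n_ctx=2048)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": to_data_uri("frame.jpg")}},
            {"type": "text", "text": "Describe what you see in this frame."},
        ],
    }]
)
print(response["choices"][0]["message"]["content"])
```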
MLX-Video-Transcription
[Mac Only] Super Fast MLX Powered Video Transcription https://github.com/RayFernando1337/MLX-Auto-Subtitled-Video-Generator/ by https://x.com/RayFernando1337
gligen
An intuitive GUI for GLIGEN that uses ComfyUI as its backend https://github.com/mut-ex/gligen-gui
Re-Forge
Stable Diffusion WebUI reForge is a platform built on top of Stable Diffusion WebUI (based on Gradio) that makes development easier, optimizes resource management, and speeds up inference. https://github.com/Panchovix/stable-diffusion-webui-reForge
UVR5-WebUI
Ultimate Vocal Remover, a free and open-source application for removing vocals and separating audio stems.
omnigen
A unified image generation model that can perform a variety of tasks, including text-to-image generation, subject-driven generation, identity-preserving generation, and image-conditioned generation. https://huggingface.co/spaces/Shitao/OmniGen
ModelScope Video2Video (Nvidia GPU only)
Enhances the resolution and spatiotemporal continuity of videos generated from text or images
Moondream1
moondream1 is a tiny (1.6B parameter) vision language model trained by @vikhyatk that performs on par with models twice its size. It is trained on the LLaVa training dataset, and initialized with SigLIP as the vision tower and Phi-1.5 as the text encoder. https://huggingface.co/spaces/vikhyatk/moondream1
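A short usage sketch following the moondream1 model card's remote-code interface (encode_image / answer_question); treat the exact method names and the tokenizer class as assumptions and defer to the model card.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, CodeGenTokenizerFast

model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = CodeGenTokenizerFast.from_pretrained(model_id)

image = Image.open("example.jpg")
enc_image = model.encode_image(image)   # SigLIP vision tower encodes the image
answer = model.answer_question(enc_image, "What is in this picture?", tokenizer)
print(answer)
```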
HY-MT1.5
🌐 Hunyuan Translation Model Version 1.5 - Supporting mutual translation across 33 languages. Features HY-MT1.5-1.8B (fast) and HY-MT1.5-7B (accurate) models with terminology intervention, contextual translation, and formatted translation support.
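A hedged transformers sketch of prompting the smaller translation checkpoint; the Hugging Face model id and the plain instruction-style prompt are assumptions, so follow the official model card for the exact format (including how terminology intervention and contextual translation are specified).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/HY-MT1.5-1.8B"  # assumed repo id for the fast 1.8B variant
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

messages = [{"role": "user", "content": "Translate the following Chinese text into English:\n\n你好，世界！"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```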
