One-click launcher for Stable Diffusion web UI Forge (lllyasviel/stable-diffusion-webui-forge)
Stable Diffusion web UI UX: https://github.com/anapnoe/stable-diffusion-webui-ux
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation: https://github.com/Zejun-Yang/AniPortrait
Langflow is a dynamic graph where each node is an executable unit. Its modular and interactive design fosters rapid experimentation and prototyping, pushing hard on the limits of creativity: https://github.com/langflow-ai/langflow
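The idea of a dynamic graph whose nodes are executable units can be sketched generically. This is a minimal illustration of the concept, not Langflow's actual API: the `run_graph` helper, node names, and wiring scheme are all hypothetical, assumed here only to show how callable nodes can be composed along edges.

```python
# Hypothetical sketch (NOT Langflow's real API): a graph where each
# node is a callable and edges feed upstream outputs into downstream
# parameters.

def run_graph(nodes, edges, inputs):
    """Execute every node, resolving dependencies recursively.

    nodes:  {name: callable taking keyword arguments}
    edges:  {name: {param_name: upstream_node_name}}
    inputs: {name: {param_name: literal_value}}
    """
    results = {}

    def evaluate(name):
        if name not in results:
            kwargs = dict(inputs.get(name, {}))
            # Pull each wired parameter from its upstream node's output.
            for param, upstream in edges.get(name, {}).items():
                kwargs[param] = evaluate(upstream)
            results[name] = nodes[name](**kwargs)
        return results[name]

    for name in nodes:
        evaluate(name)
    return results

# Example: a two-node "prompt -> uppercase" pipeline.
nodes = {
    "prompt": lambda text: f"Tell me about {text}",
    "shout": lambda message: message.upper(),
}
edges = {"shout": {"message": "prompt"}}
inputs = {"prompt": {"text": "graphs"}}

print(run_graph(nodes, edges, inputs)["shout"])  # TELL ME ABOUT GRAPHS
```

Because each node is just a callable, swapping one unit for another (say, a different prompt template) means replacing a single dictionary entry, which is the kind of rapid experimentation the graph design enables.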
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding: https://github.com/Tencent/HunyuanDiT
Your image is almost there! https://github.com/lllyasviel/Omost
Drag & drop UI to build your customized LLM flow: https://github.com/FlowiseAI/Flowise
[Need 24GB VRAM] Cambrian-1 is a family of multimodal LLMs with a vision-centric design: https://github.com/cambrian-mllm/cambrian
Dough is an open-source tool for steering AI animations with precision
moondream1 is a tiny (1.6B-parameter) vision language model trained by @vikhyatk that performs on par with models twice its size. It is trained on the LLaVA training dataset and initialized with SigLIP as the vision tower and Phi-1.5 as the text encoder: https://huggingface.co/spaces/vikhyatk/moondream1
[NVIDIA ONLY] Stable Video Diffusion Streamlit App. Runs on Nvidia GPU machines only.
Declaratively define and modify agents and multi-agent workflows through a point-and-click, drag-and-drop interface (e.g., you can select the parameters of two agents that will communicate to solve your task).
Turn any image into a video! (Web UI created by fffiloni: https://huggingface.co/spaces/fffiloni/MS-Image2Video)
Upload a clean 20-second WAV file of the vocal persona you want to mimic, type your text-to-speech prompt, and hit submit! A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning
[Nvidia GPU only] One click installer for AudioLDM 2 Gradio UI
Enhance the resolution and spatiotemporal continuity of text-generated and image-generated videos
An open vision-language model by Google. PaliGemma is designed as a versatile model for transfer to a wide range of vision-language tasks such as image and short video captioning, visual question answering, text reading, object detection, and object segmentation: https://huggingface.co/spaces/google/paligemma