An open source implementation of Microsoft's VALL-E X zero-shot TTS model
Demo showcasing a ~real-time Latent Consistency Model pipeline with Diffusers and an MJPEG stream server (https://github.com/radames/Real-Time-Latent-Consistency-Model)
A Realtime Creation Engine
Convert your videos to DensePose and use them with MagicAnimate: https://github.com/Flode-Labs/vid2densepose
Estimating the Focal Length of a Monocular Image
Platforms: GPU (NVIDIA, AMD, Apple)
Integrates Florence2 and SAM2 models for detailed image captioning and object detection. Florence2 generates detailed captions that are then used to perform phrase grounding. The Segment Anything Model 2 (SAM2) converts these phrase-grounded boxes into masks. https://huggingface.co/spaces/SkalskiP/florence-sam
[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝
Platforms: GPU (NVIDIA, AMD, Apple)
This application generates speech from text, with a choice of several speakers. It uses an advanced text-to-speech (TTS) model with text-to-phoneme conversion. The models were trained specifically for Indonesian, Javanese, and Sundanese.
Platforms: GPU (NVIDIA, AMD, Apple)
Bring portraits to life!
Platforms: GPU (NVIDIA, AMD, Apple)
A simple FastAPI server for running XTTSv2
Platforms: GPU (NVIDIA, AMD, Apple)
Stable Diffusion web UI UX: https://github.com/anapnoe/stable-diffusion-webui-ux
Check-ins: 4
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation: https://github.com/Zejun-Yang/AniPortrait
Langflow is a dynamic graph where each node is an executable unit. Its modular and interactive design fosters rapid experimentation and prototyping, pushing hard on the limits of creativity: https://github.com/langflow-ai/langflow
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding: https://github.com/Tencent/HunyuanDiT
Omost: Your image is almost there! https://github.com/lllyasviel/Omost
Drag & drop UI to build your customized LLM flow: https://github.com/FlowiseAI/Flowise
[Need 24GB VRAM] Cambrian-1 is a family of multimodal LLMs with a vision-centric design: https://github.com/cambrian-mllm/cambrian