Moondream3 Gradio UI
A web interface for the Moondream3 vision-language model featuring image captioning, visual question answering, object detection, and object pointing.
StyleAligned
Style Aligned Image Generation via Shared Attention https://style-aligned-gen.github.io/
LocalAIVtuber
A tool for hosting AI vtubers that runs fully locally and offline: https://github.com/0Xiaohei0/LocalAIVtuber
florence-sam
Integrates Florence2 and SAM2 models for detailed image captioning and object detection. Florence2 generates detailed captions that are then used to perform phrase grounding. The Segment Anything Model 2 (SAM2) converts these phrase-grounded boxes into masks. https://huggingface.co/spaces/SkalskiP/florence-sam
XTTS
Clone voices into different languages using just a quick 3-second audio clip. A local version of https://huggingface.co/spaces/coqui/xtts
Bark Voice Cloning
Upload a clean 20-second WAV file of the vocal persona you want to mimic, type your text-to-speech prompt, and hit submit! A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning
