Store
Omnigen 2
Unified image understanding and generation: text-to-image generation, in-context generation, instruction-guided image editing, and visual understanding. (Minimum requirements: 12GB VRAM / 48GB RAM. Recommended requirements: 24GB VRAM / 32GB RAM.)
Ovis2-8B
A script for interacting with the Ovis2-8B model. It lets users load the model, process image and video inputs, and generate text-based responses through a conversational chatbot.
Nemoml
[NVIDIA ONLY] A minimal Gradio interface for automatic speech recognition. Transcribes audio in the Malayalam language.
Direct3D-S2
[NVIDIA ONLY] Direct3D-S2 is a scalable 3D shape generation framework leveraging sparse volumetric representations for high-resolution outputs. It features Spatial Sparse Attention (SSA), a novel mechanism that accelerates Diffusion Transformer computations on sparse data, achieving up to 9.6× speedup in training. The unified Sparse VAE architecture maintains a consistent sparse volumetric format across input, latent, and output stages, significantly improving efficiency and stability.
🎬 AutoGif
Transform YouTube videos into stunning animated GIFs with perfectly-timed, stylized subtitles and eye-catching effects.
AIraoke
Transform lyric transcriptions into karaoke-style MP4 videos. Built on Python-Lyric-Transcriber, this Gradio UI uses Whisper for transcription, an LLM for lyric edits, and Demucs for vocal separation. A fun tool for karaoke fans, though outputs may vary.
