
VibeVoice speeds up over long content

@theran_mage posted 6/1/2026, 12:09:48 AM · 0 replies

What happened?
When I use VibeVoice for long-form text-to-speech content, the speech seems to speed up over time. The start has an even pace, but by the end it sounds like someone rushing to finish a speech so they can get to the bathroom. I'd like to see if anyone has ideas on how to prevent this.

I've tried installing VibeVoice-Realtime in Pinokio, and that version doesn't seem to suffer from the same pace increase as this model.

Any guidance would be appreciated.

Steps to reproduce

  1. Download either the 1.5B or 7B VibeVoice model and load it.
  2. Insert text of 4,000 characters or more. The effect is more noticeable at 8,000+.
  3. Generate the podcast.
  4. Compare the speech speed at the start to the speed at the end, and note how much faster the end sounds.
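
To narrow down whether the speed-up depends on input length, one thing that could be tried is splitting the text into shorter pieces at sentence boundaries and generating each piece separately. Below is a minimal sketch of such a splitter. The function name `split_into_chunks` and the 3,500-character default are my own assumptions for illustration; neither is part of VibeVoice or Pinokio, and I have not confirmed that chunking actually avoids the problem.

```python
import re


def split_into_chunks(text: str, max_chars: int = 3500) -> list[str]:
    """Group sentences into chunks of roughly max_chars characters or fewer.

    Hypothetical helper, not part of VibeVoice. The 3,500 default is an
    assumed value chosen to stay under the ~4,000 characters where the
    speed-up starts to show.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole
            # rather than being cut mid-sentence.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be pasted in and generated on its own, and the resulting audio files compared for pace. If the short chunks all keep an even pace, that would suggest the drift is tied to the length of a single generation.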

Your system (OS / GPU / RAM / VRAM / etc.)
Windows 11, 32 GB RAM, NVIDIA 4070 Ti with 16 GB VRAM
