Alexandria — Turn Any Book Into a Multi-Voice Audiobook With AI (Local, Free, Open Source)

@finrandojin2/13/2026, 6:13:53 PMOwner

Alexandria Audiobook Generator

Alexandria is an open-source tool that turns any book or novel into a fully-voiced, multi-character audiobook using AI, all running locally on your machine. It works through a two-stage pipeline: first, an LLM (like Qwen, via LM Studio or Ollama) reads your text and automatically annotates it into a structured script, identifying characters, dialogue, narration, and even style directions (e.g., "fearful surprise" or "slow and threatening"). Then, Qwen3 TTS generates the actual speech, with different voices for each character.

##Sample: https://vocaroo.com/1cG82gVS61hn

How It Works

Upload a .txt or .md file of your book → the LLM breaks it into a JSON script with speaker labels, dialogue, and emotion/style cues → you assign voices to each character → the TTS engine renders the audio → you review and edit in the browser → you get a finished MP3 audiobook.

Major Features

LLM-powered script annotation automatically identifies speakers, splits narration from dialogue, and writes natural style directions for each line. Works with any OpenAI-compatible API (local or cloud).
Voice cloning clone any voice from just a 5–15 second audio sample, or choose from 9 built-in voices with full style/emotion control.
LoRA voice pipeline supports Qwen3 TTS LoRA fine-tuned voices for higher-quality, more consistent character voices beyond basic cloning.
Browser-based editor with per-line editing web UI lets you review and edit every single line's speaker, text, and style. You can selectively regenerate individual chunks one at a time without redoing the whole book, making it easy to fix a single mispronunciation or tweak delivery.
Natural non-verbal sounds the LLM generates real pronounceable vocalizations (gasps, laughter, sighs) with context-aware delivery directions, not robotic tags.
Flexible export including Audacity integration get a single combined MP3, individual voiceline files per line, or a one-click Audacity export that generates per-speaker WAV tracks, a .lof project file, and chunk labels. Just unzip, open in Audacity, and you get a full multi-track project with each character on their own track for fine-tuning timing, effects, and mixing.
Batch rendering includes an experimental fast mode (~5x speedup) for bulk audio generation.
REST API full programmatic access for scripting and automation.
REQUIREMENTS
- GPU: 8 GB VRAM minimum, 16 GB+ recommended. NVIDIA (CUDA 11.8+) or AMD (ROCm 6.0+).
- RAM: 16 GB recommended (8 GB minimum).
- Disk: ~20 GB minimum. Breakdown: ~8 GB venv/PyTorch, ~7 GB for 2 model variants (CustomVoice + Base), ~3-5 GB working space for generated audio.
- CPU: Not a bottleneck, any modern multi-core.
- Note: Only one model occupies VRAM at a time during generation. Batch throughput scales with free VRAM after model loading. Clone/LoRA voices need more VRAM per sequence than custom voices due to reference audio encoding.

Please provide feedback on what improvements or features you would like for your particular usecase.

For instructions and references visit the Wiki: https://github.com/Finrandojin/alexandria-audiobook/wiki

Discussion (3)

Up to 10 files, 25MB each. Images are optimized; GIFs -> MP4; videos 720p (max 120s).