[Research] ds4 Models and System Requirements

ds4.c is a narrow Metal inference engine for DeepSeek V4 Flash. It is not a general GGUF runner and only supports the GGUF files published for the antirez/deepseek-v4-gguf model repo.

Served Model IDs

Model ID Behavior
deepseek-v4-flash Primary model ID.
deepseek-chat Alias that disables thinking for direct answers.
deepseek-reasoner Alias that enables thinking.

Downloadable GGUF Files

Variant Purpose File Approx. size Intended machine
q2 Main model, lower memory DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf 86.7 GB 128 GB RAM Macs
q4 Main model, larger/higher quality DeepSeek-V4-Flash-Q4KExperts-F16HC-F16Compressor-F16Indexer-Q8Attn-Q8Shared-Q8Out-chat-v2.gguf 165 GB 256 GB+ RAM Macs
mtp Optional speculative decoding support DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf 3.81 GB Optional with either q2 or q4

System Requirements and Caveats

Area Requirement or caveat
Production backend Metal-only. The server is Metal-only.
Target hardware High-end Macs / Mac Studios with large unified memory.
Minimum practical memory 128 GB for the q2 model.
q4 memory At least 256 GB RAM.
Maximum context Model supports a 1M-token context window.
Full-context memory overhead Around 26 GB extra memory for 1M context, with the compressed indexer around 22 GB.
Practical 128 GB context Around 100k to 300k tokens is wiser than full 1M context.
CPU path Reference/debug only, not production. README warns current macOS CPU execution can crash the kernel.
Build make
Download commands ./download_model.sh q2, ./download_model.sh q4, or ./download_model.sh mtp

Server API

Endpoint Purpose
GET /v1/models List available served models.
GET /v1/models/deepseek-v4-flash Fetch metadata for the primary model.
POST /v1/chat/completions OpenAI-compatible chat completions.
POST /v1/completions OpenAI-compatible completions.
POST /v1/messages Anthropic-compatible messages endpoint.
Discussion (0)
Up to 10 files, 25MB each. Images are optimized; GIFs -> MP4; videos 720p (max 120s).
[Research] ds4 Models and System Requirements · Pinokio