ds4.c is a narrow Metal inference engine for DeepSeek V4 Flash. It is not a general GGUF runner: it supports only the GGUF files published in the antirez/deepseek-v4-gguf model repo.
Served Model IDs
| Model ID | Behavior |
|---|---|
| deepseek-v4-flash | Primary model ID. |
| deepseek-chat | Alias that disables thinking for direct answers. |
| deepseek-reasoner | Alias that enables thinking. |
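As a sketch of how the aliases above are selected in practice, the snippet below builds an OpenAI-style chat-completions request body for one of the served IDs. The `chat_request` helper and the prompt are hypothetical; only the three model IDs come from the table.

```python
import json

# The three model IDs served by ds4.c (from the table above). Everything
# else in this sketch is a standard chat/completions payload shape.
SERVED_IDS = {"deepseek-v4-flash", "deepseek-chat", "deepseek-reasoner"}

def chat_request(model_id: str, prompt: str) -> str:
    """Build a JSON body for POST /v1/chat/completions (hypothetical helper)."""
    if model_id not in SERVED_IDS:
        raise ValueError(f"unknown model id: {model_id}")
    return json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    })

# Selecting "deepseek-reasoner" picks the thinking-enabled alias of the
# same underlying weights; "deepseek-chat" would disable thinking.
body = chat_request("deepseek-reasoner", "Why is the sky blue?")
```

Since the aliases map to one set of weights, switching behavior is just a matter of changing the `model` string in the request.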
Downloadable GGUF Files
| Variant | Purpose | File | Approx. size | Intended machine |
|---|---|---|---|---|
| q2 | Main model, lower memory | DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf | 86.7 GB | 128 GB RAM Macs |
| q4 | Main model, larger/higher quality | DeepSeek-V4-Flash-Q4KExperts-F16HC-F16Compressor-F16Indexer-Q8Attn-Q8Shared-Q8Out-chat-v2.gguf | 165 GB | 256 GB+ RAM Macs |
| mtp | Optional speculative decoding support | DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf | 3.81 GB | Optional with either q2 or q4 |
System Requirements and Caveats
| Area | Requirement or caveat |
|---|---|
| Production backend | Metal-only. |
| Target hardware | High-end Macs / Mac Studios with large unified memory. |
| Minimum practical memory | 128 GB for the q2 model. |
| q4 memory | At least 256 GB RAM. |
| Maximum context | The model supports a 1M-token context window. |
| Full-context memory overhead | Around 26 GB of extra memory for the full 1M context; around 22 GB with the compressed indexer. |
| Practical 128 GB context | Around 100k to 300k tokens is wiser than the full 1M context. |
| CPU path | Reference/debug only, not production. The README warns that current macOS CPU execution can crash the kernel. |
| Build | `make` |
| Download commands | `./download_model.sh q2`, `./download_model.sh q4`, or `./download_model.sh mtp` |
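To see why 100k-300k tokens is the practical ceiling on a 128 GB machine, here is a back-of-the-envelope sketch using the figures above. It assumes context overhead scales roughly linearly with token count, which is an approximation, not something the README states.

```python
# Figures quoted in the tables above (all in GB).
Q2_MODEL_GB = 86.7            # q2 GGUF file size
FULL_CONTEXT_GB = 26.0        # extra memory at the full 1M-token context
FULL_CONTEXT_TOKENS = 1_000_000

def context_overhead_gb(tokens: int) -> float:
    # Assumption: linear scaling of the 1M-context figure. The real
    # per-token cost may not be exactly linear.
    return FULL_CONTEXT_GB * tokens / FULL_CONTEXT_TOKENS

# q2 plus the full 1M context would use roughly 112.7 GB, leaving very
# little headroom for the OS and the rest of the runtime on a 128 GB Mac:
total_full_gb = Q2_MODEL_GB + context_overhead_gb(FULL_CONTEXT_TOKENS)

# A 300k-token context keeps the total near 94.5 GB, which is why the
# README recommends staying in the 100k-300k range:
total_300k_gb = Q2_MODEL_GB + context_overhead_gb(300_000)
```

The same arithmetic explains the q4 recommendation: a 165 GB model plus any meaningful context simply does not fit below 256 GB of unified memory.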
Server API
| Endpoint | Purpose |
|---|---|
| GET /v1/models | List available served models. |
| GET /v1/models/deepseek-v4-flash | Fetch metadata for the primary model. |
| POST /v1/chat/completions | OpenAI-compatible chat completions. |
| POST /v1/completions | OpenAI-compatible completions. |
| POST /v1/messages | Anthropic-compatible messages endpoint. |
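Since the server exposes both OpenAI- and Anthropic-compatible endpoints, the same prompt is wrapped differently depending on which one you call. The sketch below contrasts the two request bodies; the prompt text and the `max_tokens` value are arbitrary examples, and the shapes follow the public OpenAI chat/completions and Anthropic Messages formats rather than anything ds4.c-specific.

```python
import json

PROMPT = "Summarize the ds4.c README in one sentence."

# Body for POST /v1/chat/completions (OpenAI-style).
openai_body = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": PROMPT}],
}

# Body for POST /v1/messages (Anthropic-style). The Anthropic Messages
# API requires an explicit max_tokens field; 256 here is just an example.
anthropic_body = {
    "model": "deepseek-v4-flash",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": PROMPT}],
}

print(json.dumps(openai_body))
print(json.dumps(anthropic_body))
```

Both formats share the `model` and `messages` fields, so pointing an existing OpenAI or Anthropic client at the corresponding endpoint should mostly be a matter of changing the base URL.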