0
[Research] ds4 Models and System Requirements
ds4.c is a narrow Metal inference engine for DeepSeek V4 Flash. It is not a general GGUF runner and only supports the GGUF files published for the antirez/deepseek-v4-gguf model repo.
Served Model IDs
| Model ID | Behavior |
|---|---|
deepseek-v4-flash |
Primary model ID. |
deepseek-chat |
Alias that disables thinking for direct answers. |
deepseek-reasoner |
Alias that enables thinking. |
Downloadable GGUF Files
| Variant | Purpose | File | Approx. size | Intended machine |
|---|---|---|---|---|
q2 |
Main model, lower memory | DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf |
86.7 GB | 128 GB RAM Macs |
q4 |
Main model, larger/higher quality | DeepSeek-V4-Flash-Q4KExperts-F16HC-F16Compressor-F16Indexer-Q8Attn-Q8Shared-Q8Out-chat-v2.gguf |
165 GB | 256 GB+ RAM Macs |
mtp |
Optional speculative decoding support | DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf |
3.81 GB | Optional with either q2 or q4 |
System Requirements and Caveats
| Area | Requirement or caveat |
|---|---|
| Production backend | Metal-only. The server is Metal-only. |
| Target hardware | High-end Macs / Mac Studios with large unified memory. |
| Minimum practical memory | 128 GB for the q2 model. |
| q4 memory | At least 256 GB RAM. |
| Maximum context | Model supports a 1M-token context window. |
| Full-context memory overhead | Around 26 GB extra memory for 1M context, with the compressed indexer around 22 GB. |
| Practical 128 GB context | Around 100k to 300k tokens is wiser than full 1M context. |
| CPU path | Reference/debug only, not production. README warns current macOS CPU execution can crash the kernel. |
| Build | make |
| Download commands | ./download_model.sh q2, ./download_model.sh q4, or ./download_model.sh mtp |
Server API
| Endpoint | Purpose |
|---|---|
GET /v1/models |
List available served models. |
GET /v1/models/deepseek-v4-flash |
Fetch metadata for the primary model. |
POST /v1/chat/completions |
OpenAI-compatible chat completions. |
POST /v1/completions |
OpenAI-compatible completions. |
POST /v1/messages |
Anthropic-compatible messages endpoint. |
Replies (0)
Up to 10 files, 25MB each. Images are optimized; GIFs -> MP4; videos 720p (max 120s).
