Notes on trying ds4 (DwarfStar 4) on the ASUS Ascent GX10 (DGX Spark)
ds4 (DwarfStar 4) now supports DGX Spark, so I'll try it out.
Installing ds4
Put the source code and related files under /opt/ds4.
sudo mkdir -p /opt/ds4
sudo chown -R $USER:$USER /opt/ds4
Clone the repository. The trailing . clones into the current directory, so run this from inside /opt/ds4.
git clone https://github.com/antirez/ds4.git .
I tested with the following revision.
$ git rev-parse --short HEAD
0cba357
Download the model.
./download_model.sh q2-imatrix
Compile for DGX Spark.
make cuda-spark
Sanity check
First, try a one-shot prompt.
./ds4 -p "Explain Spring Boot in one paragraph."
The output looks like this.

Starting the OpenAI/Anthropic-compatible server
Next, start it as a server. The server implements the OpenAI and Anthropic APIs, so coding agents can access it as well.
./ds4-server --ctx 200000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192 --host 0.0.0.0
First, access /v1/models.
$ curl localhost:8000/v1/models -s | jq .
{
"object": "list",
"data": [
{
"id": "deepseek-v4-flash",
"object": "model",
"created": 1767225600,
"owned_by": "ds4.c",
"name": "DeepSeek V4 Flash",
"context_length": 200000,
"top_provider": {
"context_length": 200000,
"max_completion_tokens": 200000,
"is_moderated": false
},
"supported_parameters": [
"tools",
"tool_choice",
"max_tokens",
"temperature",
"top_p",
"top_k",
"min_p",
"stop",
"seed",
"stream",
"reasoning_effort"
]
}
]
}
Next, try /v1/chat/completions.
curl http://localhost:8000/v1/chat/completions --json '{
"messages": [
{"role": "user", "content": "Who are you?"}
]
}' -s | jq .
The following response is returned.
{
"id": "chatcmpl-1",
"object": "chat.completion",
"created": 1778721844,
"model": "deepseek-v4-flash",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm DeepSeek, an AI assistant created by the company DeepSeek (深度求索). I'm here to help you with a wide range of tasks, from answering questions and solving problems to generating creative content and analyzing information.\n\nKey features about me:\n- 🆓 **Free to use** – no subscription required\n- 📁 **File upload support** – can read text from PDFs, Word docs, Excel, PowerPoint, and more\n- 🌐 **Web search** (when manually enabled)\n- 🎯 **Text + image input** (though I can only read text from images)\n- 🤔 **Deep thinking** – I try to reason step by step before answering\n- 📝 **Long context** – up to 1M tokens, so I can process massive amounts of text\n\nI'm designed to be helpful, detailed, and engaging. So, how can I assist you today?",
"reasoning_content": "We need to answer the question \"Who are you?\" as the assistant. The assistant should respond with its identity and capabilities. The user likely expects a standard introduction. I'll provide a concise response introducing myself as DeepSeek AI assistant, highlighting key features like free, file upload, text/image input, deep thinking, and long context."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 259,
"total_tokens": 267
}
}
On the server side, the following log is printed.
0514 10:23:45 ds4-server: chat ctx=0..8:8 prompt start
0514 10:23:46 ds4-server: chat ctx=0..8:8 prompt done 0.831s
0514 10:23:50 ds4-server: chat ctx=0..8:8 gen=50 THINKING decoding chunk=15.40 t/s avg=15.40 t/s 3.246s
0514 10:23:53 ds4-server: chat ctx=0..8:8 gen=100 decoding chunk=15.35 t/s avg=15.38 t/s 6.503s
0514 10:23:56 ds4-server: chat ctx=0..8:8 gen=150 decoding chunk=15.21 t/s avg=15.32 t/s 9.791s
0514 10:23:59 ds4-server: chat ctx=0..8:8 gen=200 decoding chunk=15.16 t/s avg=15.28 t/s 13.089s
0514 10:24:03 ds4-server: chat ctx=0..8:8 gen=250 decoding chunk=15.09 t/s avg=15.24 t/s 16.402s
0514 10:24:03 ds4-server: chat ctx=0..8:8 gen=259 decoding chunk=15.16 t/s avg=15.24 t/s 16.995s
0514 10:24:03 ds4-server: thinking checkpoint canonicalization needs rebuild ctx=0..8:8 common=7 live=267 canonical=199 reason="rewrite needs rebuild: common=7 live=267 canonical=199"
0514 10:24:04 ds4-server: thinking checkpoint canonicalized ctx=0..8:8 common=7 live=267 canonical=199 via=rebuild
0514 10:24:04 ds4-server: chat ctx=0..8:8 gen=259 finish=stop 18.520s
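The decoding lines above carry a running average in the avg= field. As a rough sketch, the throughput could be pulled out of such a log with awk. The function name decode_speed is my own, and the field layout is assumed to match the lines shown in this memo.

```shell
# Hypothetical helper: read ds4-server log lines on stdin and print
# "<generated tokens> <average t/s>" for every decoding progress line.
# Assumes the "gen=N ... decoding ... avg=X t/s" layout shown above.
decode_speed() {
  awk '/ decoding / {
    gen = ""; avg = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^gen=/) gen = substr($i, 5)   # strip the "gen=" prefix
      if ($i ~ /^avg=/) avg = substr($i, 5)   # strip the "avg=" prefix
    }
    if (gen != "" && avg != "") print gen, avg
  }'
}
```

Piping the server's log output through decode_speed would then print one line per progress report, for example 50 15.40 for the first decoding line above.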
Resource usage while it runs looks like this.

Setting model to deepseek-chat turns thinking off, so the response no longer includes reasoning_content.
curl http://localhost:8000/v1/chat/completions --json '{
"messages": [
{"role": "user", "content": "Who are you?"}
],
"model": "deepseek-chat"
}' -s | jq .
The following response is returned.
{
"id": "chatcmpl-2",
"object": "chat.completion",
"created": 1778722026,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm DeepSeek, an AI assistant created by the company DeepSeek (深度求索). I'm here to help you with a wide range of tasks—answering questions, writing, brainstorming ideas, analyzing data, coding, and much more.\n\nI'm a pure text-based model that can handle large amounts of text (up to 1M tokens in context), and I support reading from links and uploading files (images, PDFs, Word, Excel, etc.) to extract and process text content. I'm fully free to use, with no subscription needed.\n\nIs there anything I can help you with today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 8,
"completion_tokens": 128,
"total_tokens": 136
}
}
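The two requests above differ only in the model field, so a small helper makes it easy to switch between them. This is a minimal sketch: chat_payload and ask are hypothetical names of my own. The escaping covers only backslashes and double quotes, so a prompt with newlines or control characters would need a real JSON encoder such as jq.

```shell
# Hypothetical helper: build a chat-completions request body.
# $1 = model name, $2 = user prompt (single line).
chat_payload() {
  local prompt
  # Escape backslashes first, then double quotes, for use inside a JSON string.
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' "$1" "$prompt"
}

# Hypothetical wrapper: send one prompt to the server started earlier.
# Assumes the server is listening on localhost:8000 as in this memo.
ask() {
  curl -s http://localhost:8000/v1/chat/completions --json "$(chat_payload "$1" "$2")"
}

# Usage (requires the running server):
#   ask deepseek-chat "Who are you?" | jq -r '.choices[0].message.content'
```

Passing deepseek-v4-flash as the first argument should reproduce the thinking case, assuming the server accepts that id in the model field the same way it reports it in /v1/models.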
Registering as a systemd service
Keep the server running permanently under systemd.
cat <<EOF | sudo tee /etc/systemd/system/ds4-server.service > /dev/null
[Unit]
Description=DwarfStar 4 Server
Documentation=https://github.com/antirez/ds4
After=network-online.target
Wants=network-online.target
[Service]
User=$USER
Group=$USER
Type=simple
ExecStart=/opt/ds4/ds4-server --model /opt/ds4/ds4flash.gguf --ctx 200000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192 --host 0.0.0.0
Restart=on-failure
RestartSec=30s
StartLimitInterval=300
StartLimitBurst=5
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable ds4-server
sudo systemctl start ds4-server
sudo systemctl status ds4-server
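The service may not answer HTTP requests immediately after systemctl start, since the model has to be loaded first (an assumption on my part; I did not time it). A polling loop like the following sketch waits until /v1/models responds. wait_for_server is a hypothetical helper name, and the default URL and timeout are just the values that fit this memo.

```shell
# Hypothetical helper: poll an endpoint until it answers or a timeout expires.
# $1 = URL to poll (default: the /v1/models endpoint used in this memo)
# $2 = timeout in seconds (default: 300)
wait_for_server() {
  local url="${1:-http://localhost:8000/v1/models}"
  local timeout="${2:-300}"
  local deadline=$(( $(date +%s) + timeout ))
  # -f makes curl return non-zero on HTTP errors; --max-time bounds each attempt.
  until curl -sf -o /dev/null --max-time 5 "$url"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage:
#   wait_for_server && curl -s localhost:8000/v1/models | jq .
```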
Accessing from Claude Code
Install Claude Code in a clean environment.
curl -fsSL https://claude.ai/install.sh | bash
export PATH="$HOME/.local/bin:$PATH"
I tested with the following version.
$ claude --version
2.1.141 (Claude Code)
Prepare the following script, which wraps the claude command.
cat <<'EOF' > ~/cc-ds4.sh
#!/bin/bash
unset ANTHROPIC_API_KEY
export ANTHROPIC_BASE_URL="${DS4_ANTHROPIC_BASE_URL:-http://<DGX Spark IP address>:8000}"
export ANTHROPIC_AUTH_TOKEN="${DS4_API_KEY:-dsv4-local}"
#export ANTHROPIC_MODEL="deepseek-v4-flash"
export ANTHROPIC_MODEL="deepseek-chat"
export ANTHROPIC_CUSTOM_MODEL_OPTION=$ANTHROPIC_MODEL
export ANTHROPIC_DEFAULT_SONNET_MODEL=$ANTHROPIC_MODEL
export ANTHROPIC_DEFAULT_HAIKU_MODEL=$ANTHROPIC_MODEL
export ANTHROPIC_DEFAULT_OPUS_MODEL=$ANTHROPIC_MODEL
export CLAUDE_CODE_SUBAGENT_MODEL=$ANTHROPIC_MODEL
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
export CLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK=1
export CLAUDE_STREAM_IDLE_TIMEOUT_MS=600000
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000
export CLAUDE_CODE_ATTRIBUTION_HEADER=0
export CLAUDE_CODE_ENABLE_TELEMETRY=0
set -ex
claude --dangerously-skip-permissions "$@"
EOF
chmod +x ~/cc-ds4.sh
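The wrapper reads its endpoint through the ${VAR:-default} expansion, so it can be redirected without editing the file. The sketch below isolates that pattern. resolve_base_url is a hypothetical function, and 192.0.2.10 is a documentation-only placeholder address, not a real host.

```shell
# Same ${VAR:-default} pattern as in cc-ds4.sh.
# 192.0.2.10 is a placeholder documentation address (hypothetical).
resolve_base_url() {
  echo "${DS4_ANTHROPIC_BASE_URL:-http://192.0.2.10:8000}"
}

# With the variable unset or empty, the default after ":-" is used.
# With the variable set, its value wins:
#   DS4_ANTHROPIC_BASE_URL=http://localhost:8000 resolve_base_url
```

In the same way, running DS4_ANTHROPIC_BASE_URL=http://localhost:8000 ~/cc-ds4.sh would point the wrapper at a server on the local machine for that one invocation.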
Create a working directory,
mkdir -p hello
cd hello
and run Claude Code.
~/cc-ds4.sh
It works without any trouble.

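Following the log with journalctl -u ds4-server -f shows that the first prefill, which loads a prompt containing Claude Code's system prompt (22,896 tokens here), takes about one minute at roughly 380 tokens/s. Tool calls also work without problems, and the log prints the tool names (Bash in this run), which makes it easy to follow.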
May 14 11:24:32 gx10-c90e ds4-server[3362707]: 0514 11:24:32 ds4-server: chat ctx=0..192:192 prompt start
May 14 11:24:33 gx10-c90e ds4-server[3362707]: 0514 11:24:33 ds4-server: chat ctx=0..192:192 prompt done 0.997s
May 14 11:24:35 gx10-c90e ds4-server[3362707]: 0514 11:24:35 ds4-server: chat ctx=0..192:192 gen=35 decoding chunk=15.20 t/s avg=15.20 t/s 2.303s
May 14 11:24:35 gx10-c90e ds4-server[3362707]: 0514 11:24:35 ds4-server: chat ctx=0..192:192 gen=35 finish=stop 3.300s
May 14 11:24:35 gx10-c90e ds4-server[3362707]: 0514 11:24:35 ds4-server: chat ctx=0..22896:22896 TOOLS prompt start
May 14 11:24:35 gx10-c90e ds4-server[3362707]: 0514 11:24:35 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 0/22896 (0.0%) chunk=0.00 t/s avg=0.00 t/s 0.000s
May 14 11:24:40 gx10-c90e ds4-server[3362707]: 0514 11:24:40 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 2048/22896 (8.9%) chunk=425.39 t/s avg=425.39 t/s 4.814s
May 14 11:24:45 gx10-c90e ds4-server[3362707]: 0514 11:24:45 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 4096/22896 (17.9%) chunk=402.69 t/s avg=413.72 t/s 9.900s
May 14 11:24:51 gx10-c90e ds4-server[3362707]: 0514 11:24:51 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 6144/22896 (26.8%) chunk=396.82 t/s avg=407.93 t/s 15.061s
May 14 11:24:56 gx10-c90e ds4-server[3362707]: 0514 11:24:56 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 8192/22896 (35.8%) chunk=392.12 t/s avg=403.86 t/s 20.284s
May 14 11:25:01 gx10-c90e ds4-server[3362707]: 0514 11:25:01 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 10240/22896 (44.7%) chunk=386.23 t/s avg=400.21 t/s 25.587s
May 14 11:25:01 gx10-c90e ds4-server[3362707]: 0514 11:25:01 ds4-server: kv cache stored tokens=10240 trimmed=0 reason=continued size=157.34 MiB save=151.1 ms
May 14 11:25:01 gx10-c90e ds4-server[3362707]: 0514 11:25:01 ds4-server: kv cache evicted tokens=10240 hits=0 size=157.34 MiB
May 14 11:25:07 gx10-c90e ds4-server[3362707]: 0514 11:25:07 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 12288/22896 (53.7%) chunk=370.53 t/s avg=394.93 t/s 31.114s
May 14 11:25:12 gx10-c90e ds4-server[3362707]: 0514 11:25:12 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 14336/22896 (62.6%) chunk=378.24 t/s avg=392.46 t/s 36.529s
May 14 11:25:17 gx10-c90e ds4-server[3362707]: 0514 11:25:17 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 16384/22896 (71.6%) chunk=376.42 t/s avg=390.38 t/s 41.969s
May 14 11:25:23 gx10-c90e ds4-server[3362707]: 0514 11:25:23 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 18432/22896 (80.5%) chunk=372.32 t/s avg=388.29 t/s 47.470s
May 14 11:25:28 gx10-c90e ds4-server[3362707]: 0514 11:25:28 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 20480/22896 (89.4%) chunk=369.35 t/s avg=386.31 t/s 53.015s
May 14 11:25:29 gx10-c90e ds4-server[3362707]: 0514 11:25:29 ds4-server: kv cache stored tokens=20480 trimmed=0 reason=continued size=291.80 MiB save=211.5 ms
May 14 11:25:29 gx10-c90e ds4-server[3362707]: 0514 11:25:29 ds4-server: kv cache evicted tokens=20480 hits=0 size=291.80 MiB
May 14 11:25:34 gx10-c90e ds4-server[3362707]: 0514 11:25:34 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 22528/22896 (98.4%) chunk=351.17 t/s avg=382.82 t/s 58.847s
May 14 11:25:35 gx10-c90e ds4-server[3362707]: 0514 11:25:35 ds4-server: kv cache stored tokens=22528 trimmed=368 reason=cold size=318.69 MiB save=196.3 ms
May 14 11:25:35 gx10-c90e ds4-server[3362707]: 0514 11:25:35 ds4-server: kv cache evicted tokens=22528 hits=0 size=318.69 MiB
May 14 11:25:36 gx10-c90e ds4-server[3362707]: 0514 11:25:36 ds4-server: chat ctx=0..22896:22896 TOOLS prefill chunk 22896/22896 (100.0%) chunk=252.03 t/s avg=379.66 t/s 60.307s
May 14 11:25:36 gx10-c90e ds4-server[3362707]: 0514 11:25:36 ds4-server: chat ctx=0..22896:22896 TOOLS prompt done 60.310s
May 14 11:25:40 gx10-c90e ds4-server[3362707]: 0514 11:25:40 ds4-server: chat ctx=0..22896:22896 gen=50 TOOLS decoding chunk=13.29 t/s avg=13.29 t/s 3.761s
May 14 11:25:42 gx10-c90e ds4-server[3362707]: 0514 11:25:42 ds4-server: chat ctx=0..22896:22896 gen=85 TOOLS decoding chunk=13.32 t/s avg=13.31 t/s 6.388s
May 14 11:25:42 gx10-c90e ds4-server[3362707]: 0514 11:25:42 ds4-server: chat ctx=0..22896:22896 gen=85 TOOLS finish=stop 66.698s
May 14 11:27:58 gx10-c90e ds4-server[3362707]: 0514 11:27:58 ds4-server: chat ctx=22981..22990:9 TOOLS prompt start
May 14 11:27:59 gx10-c90e ds4-server[3362707]: 0514 11:27:59 ds4-server: chat ctx=22981..22990:9 TOOLS prompt done 0.679s
May 14 11:28:03 gx10-c90e ds4-server[3362707]: 0514 11:28:03 ds4-server: chat ctx=22981..22990:9 gen=50 TOOLS decoding chunk=13.33 t/s avg=13.33 t/s 3.752s
May 14 11:28:06 gx10-c90e ds4-server[3362707]: 0514 11:28:06 ds4-server: chat ctx=22981..22990:9 gen=100 TOOLS DSML_START decoding chunk=13.36 t/s avg=13.34 t/s 7.495s
May 14 11:28:09 gx10-c90e ds4-server[3362707]: 0514 11:28:09 ds4-server: chat ctx=22981..22990:9 gen=127 TOOLS DSML_START DSML_END decoding chunk=13.36 t/s avg=13.35 t/s 9.515s
May 14 11:28:09 gx10-c90e ds4-server[3362707]: 0514 11:28:09 ds4-server: tool calls ctx=22981..22990:9 n=1 names=[Bash]
May 14 11:28:09 gx10-c90e ds4-server[3362707]: 0514 11:28:09 ds4-server: tool checkpoint canonicalized ctx=22981..22990:9 common=23117 live=23117 canonical=23118
May 14 11:28:09 gx10-c90e ds4-server[3362707]: 0514 11:28:09 ds4-server: chat ctx=22981..22990:9 gen=127 TOOLS DSML_START DSML_END finish=tool_calls 10.285s
May 14 11:28:09 gx10-c90e ds4-server[3362707]: 0514 11:28:09 ds4-server: chat ctx=23118..23143:25 TOOLS prompt start
May 14 11:28:11 gx10-c90e ds4-server[3362707]: 0514 11:28:11 ds4-server: chat ctx=23118..23143:25 TOOLS prompt done 1.876s
May 14 11:28:14 gx10-c90e ds4-server[3362707]: 0514 11:28:14 ds4-server: chat ctx=23118..23143:25 gen=38 TOOLS decoding chunk=13.33 t/s avg=13.33 t/s 2.851s
May 14 11:28:14 gx10-c90e ds4-server[3362707]: 0514 11:28:14 ds4-server: chat ctx=23118..23143:25 gen=38 TOOLS finish=stop 4.727s
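As a quick cross-check of the numbers in this log, the average prefill rate and the stored KV cache size per token can be recomputed with awk. This is only a sketch: rate and kib_per_token are hypothetical helper names, and I am assuming that size= on the "kv cache stored" lines is the size of the snapshot holding tokens= tokens.

```shell
# Average rate: tokens divided by elapsed seconds, two decimals.
rate() { awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'; }

# KiB of stored KV cache per token, from a size in MiB and a token count.
kib_per_token() { awk -v mib="$1" -v n="$2" 'BEGIN { printf "%.2f\n", mib * 1024 / n }'; }

rate 22896 60.307            # last prefill chunk line      -> 379.66
kib_per_token 157.34 10240   # first "kv cache stored" line -> 15.73
kib_per_token 318.69 22528   # last "kv cache stored" line  -> 14.49
```

The first result matches the avg=379.66 t/s reported by the server, and the other two suggest roughly 15 KiB of stored cache per token in this run.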