QMD Installing GGUF Models in CPU-Only Environment
This guide applies when QMD (including OpenClaw's QMD memory backend) runs in CPU mode or a network-restricted environment and you need to install local GGUF models manually.
- Date: 2026-03-05
1. Model overview
QMD loads three local GGUF models via node-llama-cpp for embedding, reranking, and query expansion. On first use it downloads them from HuggingFace; if auto-download fails (e.g. ETIMEDOUT, ENETUNREACH) or you deploy offline, follow this guide to download and place them in the cache directory.
1.1 Models and usage
| Model | Purpose | Approx. size | HuggingFace repo |
|---|---|---|---|
| embeddinggemma-300M-Q8_0 | Embedding | ~300MB | ggml-org/embeddinggemma-300m-qat-q8_0-GGUF |
| qwen3-reranker-0.6b-q8_0 | Reranking | ~640MB | ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion | ~1.1GB | tobil/qmd-query-expansion-1.7B |
1.2 Cache directory
- If XDG_CACHE_HOME is set (e.g. when aligned with OpenClaw): $XDG_CACHE_HOME/qmd/models/
- Otherwise: ~/.cache/qmd/models/
When installing manually, use the same cache directory that QMD actually uses (as shown by qmd status). If QMD is started by OpenClaw Gateway, it is usually: ~/.openclaw/agents/main/qmd/xdg-cache/qmd/models/ (or your agentId path).
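The XDG-first lookup order above can be expressed as a small helper. This is a sketch of the rule described in this section, not QMD's actual code; the function name resolve_qmd_models_dir is hypothetical:

```shell
# Sketch of the cache-dir lookup order described above (not QMD's actual code):
# prefer $XDG_CACHE_HOME, fall back to ~/.cache.
resolve_qmd_models_dir() {
  if [ -n "${XDG_CACHE_HOME:-}" ]; then
    printf '%s/qmd/models\n' "$XDG_CACHE_HOME"
  else
    printf '%s/.cache/qmd/models\n' "$HOME"
  fi
}

# Example: with XDG_CACHE_HOME set, the XDG path wins.
XDG_CACHE_HOME=/tmp/xdg resolve_qmd_models_dir   # → /tmp/xdg/qmd/models
```

If the path this prints differs from what qmd status reports, trust qmd status.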
2. Option A: huggingface-cli (recommended)
# Install HuggingFace CLI if needed
pip install -U "huggingface_hub[cli]"
# Create cache directory
CACHE="$HOME/.cache/qmd/models"
# If using OpenClaw's XDG layout, use:
# STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
# CACHE="$STATE_DIR/agents/main/qmd/xdg-cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
# Download the three models (HuggingFace access required)
huggingface-cli download ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --local-dir . --local-dir-use-symlinks False
huggingface-cli download ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF --local-dir . --local-dir-use-symlinks False
huggingface-cli download tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
After downloading, check that the .gguf filenames in the directory match the repo filenames. If the CLI created subdirectories, move the .gguf files to the $CACHE root, or keep the structure if QMD requires it (see official docs).
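Flattening nested downloads into the cache root can be sketched with find and mv. The demo below runs on a throwaway temp directory with a placeholder file; point CACHE at your real cache directory instead:

```shell
# Demo on a temp dir; replace CACHE with your real cache directory.
CACHE="$(mktemp -d)"
mkdir -p "$CACHE/some-repo-subdir"
touch "$CACHE/some-repo-subdir/example-model.gguf"   # simulate a nested download

# Move every .gguf found below the root up into the root (-n: never overwrite).
find "$CACHE" -mindepth 2 -name '*.gguf' -exec mv -n {} "$CACHE/" \;

ls "$CACHE"/*.gguf
```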
3. Option B: Browser or wget/curl direct links
Download the files below into the same $CACHE directory, saving them under the filenames shown in the "Save as" column (rename the file if the downloaded name differs). The links point to the official HuggingFace hosts:
| Save as | Direct link (HuggingFace) |
|---|---|
| embeddinggemma-300m-qat-q8_0.gguf | https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf |
| qwen3-reranker-0.6b-q8_0.gguf | https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf |
| qmd-query-expansion-1.7B-Q4_K_M.gguf | https://huggingface.co/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf |
Commands (browser or wget/curl):
CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
wget -c -O "embeddinggemma-300m-qat-q8_0.gguf" "https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf"
wget -c "https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf"
wget -c "https://huggingface.co/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf"
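A quick size sanity check can catch truncated downloads (an HTML error page saved as a .gguf is only a few KB). The check_min_size helper below is a sketch, with rough lower bounds taken from the size column in section 1.1; the demo runs on a deliberately tiny dummy file:

```shell
# Report whether a file meets a rough minimum size in MB (bounds from section 1.1).
check_min_size() {  # usage: check_min_size FILE MIN_MB
  sz_mb=$(( $(wc -c < "$1") / 1024 / 1024 ))
  if [ "$sz_mb" -ge "$2" ]; then
    echo "OK: $1 (${sz_mb}MB)"
  else
    echo "TOO SMALL: $1 (${sz_mb}MB, expected >= $2MB)"
  fi
}

# Real usage (after downloading into $CACHE):
# check_min_size "$CACHE/embeddinggemma-300m-qat-q8_0.gguf" 250
# check_min_size "$CACHE/qwen3-reranker-0.6b-q8_0.gguf" 500
# check_min_size "$CACHE/qmd-query-expansion-1.7B-Q4_K_M.gguf" 900

# Demo with a deliberately tiny file:
demo="$(mktemp)"; echo "not a model" > "$demo"
check_min_size "$demo" 250
```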
Mirror (e.g. in China): if huggingface.co is slow or blocked, download from a mirror host instead. Note that HF_ENDPOINT only redirects huggingface-cli (huggingface_hub); wget and curl need the mirror host written into the URL itself:
export HF_ENDPOINT=https://hf-mirror.com   # affects huggingface-cli only
CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
wget -c -O "embeddinggemma-300m-qat-q8_0.gguf" "https://hf-mirror.com/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf"
wget -c "https://hf-mirror.com/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf"
wget -c "https://hf-mirror.com/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf"
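Rather than hardcoding the mirror host in each URL, the download links can be derived from HF_ENDPOINT, so switching back to huggingface.co means changing one variable. A minimal sketch; hf_url is a hypothetical helper, not part of any CLI:

```shell
# hf_url (hypothetical helper): build a resolve-URL on top of HF_ENDPOINT.
HF_ENDPOINT="${HF_ENDPOINT:-https://hf-mirror.com}"
hf_url() {  # usage: hf_url REPO FILENAME
  printf '%s/%s/resolve/main/%s\n' "$HF_ENDPOINT" "$1" "$2"
}

hf_url tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q4_K_M.gguf
# e.g. wget -c "$(hf_url tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q4_K_M.gguf)"
```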
4. Verify
# If using OpenClaw's XDG layout
export XDG_CACHE_HOME="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/agents/main/qmd/xdg-cache"
ls -la "$XDG_CACHE_HOME/qmd/models/"
# Or default cache
ls -la "$HOME/.cache/qmd/models/"
# QMD status or a query should no longer trigger model download
qmd status
If you still see "model not found" errors or another auto-download attempt, the cache path QMD is using does not match where you placed the files. Confirm the actual path from qmd status (or the official docs) and move the files there.
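To confirm all three files are in place under the expected names, a loop like the following helps. The filenames follow the "Save as" column in section 3; the demo runs on a temp directory seeded with placeholder files, so point MODELS_DIR at your real models directory in practice:

```shell
# Demo on a temp dir; point MODELS_DIR at your real qmd/models directory.
MODELS_DIR="$(mktemp -d)"
touch "$MODELS_DIR/embeddinggemma-300m-qat-q8_0.gguf" \
      "$MODELS_DIR/qwen3-reranker-0.6b-q8_0.gguf" \
      "$MODELS_DIR/qmd-query-expansion-1.7B-Q4_K_M.gguf"

# Check each expected filename and report anything missing.
missing=0
for f in embeddinggemma-300m-qat-q8_0.gguf \
         qwen3-reranker-0.6b-q8_0.gguf \
         qmd-query-expansion-1.7B-Q4_K_M.gguf; do
  if [ ! -f "$MODELS_DIR/$f" ]; then
    echo "missing: $f"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all models present"
```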
5. After CPU install
After installing QMD with NODE_LLAMA_CPP_CUDA=false and manually installing the three GGUF models, no extra system configuration is required for CPU mode (no additional environment variables or services). Just ensure:
| Item | Notes |
|---|---|
| OpenClaw config | If QMD memory backend is not set, configure memory.backend: "qmd", qmd.paths, qmd.limits in openclaw.json and restart the gateway. See OpenClaw configuration. |
| Cache path | If QMD is started by OpenClaw Gateway, place models in ~/.openclaw/agents/main/qmd/xdg-cache/qmd/models/ (or your agentId path). If you used ~/.cache/qmd/models/ before, copy the same .gguf files to the xdg-cache path. |
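Copying previously downloaded models from the default cache into the OpenClaw xdg-cache path can be sketched as below. The demo uses temp directories standing in for the real paths (substitute ~/.cache/qmd/models and your agent's xdg-cache path):

```shell
# Temp dirs stand in for ~/.cache/qmd/models and the OpenClaw xdg-cache path.
SRC="$(mktemp -d)"                                       # e.g. $HOME/.cache/qmd/models
DST="$(mktemp -d)/agents/main/qmd/xdg-cache/qmd/models"  # e.g. under $HOME/.openclaw
touch "$SRC/example-model.gguf"                          # simulate a downloaded model

mkdir -p "$DST"
cp -n "$SRC"/*.gguf "$DST"/   # -n: don't overwrite files already in place

ls "$DST"
```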
References
| Description | Link |
|---|---|
| QMD repo | https://github.com/tobi/qmd |
| node-llama-cpp CUDA | https://node-llama-cpp.withcat.ai/guide/CUDA |