Installing GGUF Models for QMD in a CPU-Only Environment


This guide applies when QMD (including OpenClaw's QMD memory backend) runs in CPU mode or in a network-restricted environment, and you need to install the local GGUF models manually.

  • Date: 2026-03-05

1. Model overview

QMD loads three local GGUF models via node-llama-cpp for embedding, reranking, and query expansion. On first use it downloads them from HuggingFace; if auto-download fails (e.g. ETIMEDOUT, ENETUNREACH) or you deploy offline, follow this guide to download and place them in the cache directory.

1.1 Models and usage

| Model | Purpose | Approx. size | HuggingFace repo |
| --- | --- | --- | --- |
| embeddinggemma-300M-Q8_0 | Embedding | ~300MB | ggml-org/embeddinggemma-300m-qat-q8_0-GGUF |
| qwen3-reranker-0.6b-q8_0 | Reranking | ~640MB | ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion | ~1.1GB | tobil/qmd-query-expansion-1.7B |

1.2 Cache directory

  • If XDG_CACHE_HOME is set (e.g. when aligned with OpenClaw): $XDG_CACHE_HOME/qmd/models/
  • Otherwise: ~/.cache/qmd/models/

When installing manually, use the same cache directory that QMD actually uses (as reported by qmd status). If QMD is started by the OpenClaw Gateway, this is usually ~/.openclaw/agents/main/qmd/xdg-cache/qmd/models/ (or the path for your agentId).

2. Download with huggingface-cli


# Install HuggingFace CLI if needed
pip install -U "huggingface_hub[cli]"

# Create cache directory
CACHE="$HOME/.cache/qmd/models"
# If using OpenClaw's XDG layout, use:
# STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
# CACHE="$STATE_DIR/agents/main/qmd/xdg-cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"

# Download the three models (HuggingFace access required)
huggingface-cli download ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --local-dir . --local-dir-use-symlinks False
huggingface-cli download ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF --local-dir . --local-dir-use-symlinks False
huggingface-cli download tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

After downloading, check that the .gguf filenames in the directory match the repo. If the CLI created subdirectories, move the .gguf files to the $CACHE root, or keep the structure QMD requires (see the official docs).
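If the CLI did leave subdirectories behind, a small sketch like the following flattens them into the cache root (it assumes the default ~/.cache path; adjust CACHE for the OpenClaw layout):

```shell
# Move any .gguf files that landed in subdirectories up to the
# cache root, then show what is present. CACHE here assumes the
# default path; change it if QMD uses the OpenClaw xdg-cache layout.
CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
# -mindepth 2 matches only files inside subdirectories;
# mv -n refuses to overwrite an existing file of the same name.
find "$CACHE" -mindepth 2 -name '*.gguf' -exec mv -n {} "$CACHE"/ \;
ls -l "$CACHE"/*.gguf 2>/dev/null || echo "no .gguf files in $CACHE yet"
```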


3. Direct download (wget/curl)

Download the files below into the same $CACHE directory, saving them under the filenames shown. The links are official HuggingFace URLs:

| Save as | Direct link (HuggingFace) |
| --- | --- |
| embeddinggemma-300m-qat-q8_0.gguf | https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf |
| qwen3-reranker-0.6b-q8_0.gguf | https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf |
| qmd-query-expansion-1.7B-Q4_K_M.gguf | https://huggingface.co/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf |

Alternatively, download in a browser using the links above, or from the command line:

CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
wget -c "https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf"
wget -c "https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf"
wget -c "https://huggingface.co/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf"
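After the wget runs, a quick sanity check (a sketch; filenames taken from the URLs above) confirms each file exists and is non-empty, since an interrupted wget -c leaves a partial file behind:

```shell
# Check that each expected model file exists and is non-empty.
# An empty or missing file means the download did not complete;
# re-run the corresponding wget -c to resume it.
CACHE="$HOME/.cache/qmd/models"
for f in embeddinggemma-300M-Q8_0.gguf \
         qwen3-reranker-0.6b-q8_0.gguf \
         qmd-query-expansion-1.7B-Q4_K_M.gguf; do
  if [ -s "$CACHE/$f" ]; then
    printf '%s: %s bytes\n' "$f" "$(wc -c < "$CACHE/$f")"
  else
    printf '%s: missing or empty\n' "$f"
  fi
done
```

Note that a non-empty file can still be truncated; comparing the byte count against the size shown on the HuggingFace file page is a stronger check.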

Mirror (e.g. China): If huggingface.co is slow or blocked, set HF_ENDPOINT and use the mirror:

export HF_ENDPOINT=https://hf-mirror.com

CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
wget -c "https://hf-mirror.com/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf"
wget -c "https://hf-mirror.com/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf"
wget -c "https://hf-mirror.com/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf"
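Before pulling ~2GB through the mirror, it can be worth a quick reachability probe (a sketch assuming curl is installed):

```shell
# Probe the mirror with a HEAD request before starting large downloads.
# -I sends HEAD, -f fails on HTTP errors, --max-time bounds the wait.
export HF_ENDPOINT=https://hf-mirror.com
if curl -sIf --max-time 10 "$HF_ENDPOINT" > /dev/null; then
  echo "mirror reachable: $HF_ENDPOINT"
else
  echo "mirror unreachable; try huggingface.co directly or a proxy"
fi
```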

4. Verify

# If using OpenClaw's XDG layout
export XDG_CACHE_HOME="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/agents/main/qmd/xdg-cache"
ls -la "$XDG_CACHE_HOME/qmd/models/"
# Or default cache
ls -la "$HOME/.cache/qmd/models/"

# QMD status or a query should no longer trigger model download
qmd status

If you still see "model not found" or an auto-download attempt, the cache path in use does not match; confirm the actual path via qmd status or the official docs and place the files there.
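To see which directory the XDG convention resolves to on your machine, a one-liner mirroring the rules in section 1.2 can help:

```shell
# Resolve the model cache directory the way section 1.2 describes:
# $XDG_CACHE_HOME/qmd/models when XDG_CACHE_HOME is set,
# otherwise ~/.cache/qmd/models.
MODELS_DIR="${XDG_CACHE_HOME:-$HOME/.cache}/qmd/models"
echo "expected QMD model cache: $MODELS_DIR"
ls "$MODELS_DIR" 2>/dev/null || echo "(directory does not exist yet)"
```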


5. After CPU install

Once QMD is installed with NODE_LLAMA_CPP_CUDA=false and the three GGUF models are in place, CPU mode needs no extra system configuration (no additional env vars or services). Just make sure:

| Item | Notes |
| --- | --- |
| OpenClaw config | If the QMD memory backend is not set, configure memory.backend: "qmd", qmd.paths, and qmd.limits in openclaw.json, then restart the gateway. See OpenClaw configuration. |
| Cache path | If QMD is started by the OpenClaw Gateway, place the models in ~/.openclaw/agents/main/qmd/xdg-cache/qmd/models/ (or your agentId path). If you previously used ~/.cache/qmd/models/, copy the same .gguf files to the xdg-cache path. |
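As a rough sketch only, the fragment below shows where those keys might sit in openclaw.json. Only memory.backend: "qmd" is stated above; the shapes of qmd.paths and qmd.limits are assumptions to verify against the OpenClaw configuration docs.

```shell
# Write a hypothetical openclaw.json fragment and check it parses.
# The key shapes under "qmd" are assumptions, not a documented schema.
cat > /tmp/openclaw-qmd-example.json <<'EOF'
{
  "memory": { "backend": "qmd" },
  "qmd": {
    "paths": {},
    "limits": {}
  }
}
EOF
python3 -m json.tool /tmp/openclaw-qmd-example.json > /dev/null \
  && echo "fragment is well-formed JSON"
```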

References

| Description | Link |
| --- | --- |
| QMD repo | https://github.com/tobi/qmd |
| node-llama-cpp CUDA | https://node-llama-cpp.withcat.ai/guide/CUDA |
