Installing GGUF Models for QMD in a CPU-Only Environment


This guide applies when QMD (including OpenClaw's QMD memory backend) runs in CPU mode or in a network-restricted environment, and you need to install the local GGUF models manually.

  • Date: 2026-03-05

1. Model overview

QMD loads three local GGUF models via node-llama-cpp for embedding, reranking, and query expansion. On first use it downloads them from HuggingFace; if auto-download fails (e.g. ETIMEDOUT, ENETUNREACH) or you deploy offline, follow this guide to download and place them in the cache directory.

1.1 Models and usage

| Model | Purpose | Approx. size | HuggingFace repo |
| --- | --- | --- | --- |
| embeddinggemma-300M-Q8_0 | Embedding | ~300MB | ggml-org/embeddinggemma-300m-qat-q8_0-GGUF |
| qwen3-reranker-0.6b-q8_0 | Reranking | ~640MB | ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF |
| qmd-query-expansion-1.7B-q4_k_m | Query expansion | ~1.1GB | tobil/qmd-query-expansion-1.7B |

1.2 Cache directory

  • If XDG_CACHE_HOME is set (e.g. when aligned with OpenClaw): $XDG_CACHE_HOME/qmd/models/
  • Otherwise: ~/.cache/qmd/models/

When installing manually, use the same cache directory that QMD actually uses (as reported by qmd status). If QMD is started by the OpenClaw Gateway, this is usually ~/.openclaw/agents/main/qmd/xdg-cache/qmd/models/ (or the path for your agentId).

2. Download with huggingface-cli


# Install HuggingFace CLI if needed
pip install -U "huggingface_hub[cli]"

# Create cache directory
CACHE="$HOME/.cache/qmd/models"
# If using OpenClaw's XDG layout, use:
# STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
# CACHE="$STATE_DIR/agents/main/qmd/xdg-cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"

# Download the three models (HuggingFace access required)
huggingface-cli download ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --local-dir . --local-dir-use-symlinks False
huggingface-cli download ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF --local-dir . --local-dir-use-symlinks False
huggingface-cli download tobil/qmd-query-expansion-1.7B qmd-query-expansion-1.7B-Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

After downloading, check that the .gguf filenames in the directory match the repo. If the CLI created subdirectories, move the .gguf files to the $CACHE root, or keep the structure QMD requires (see the official docs).
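If the CLI did leave subdirectories behind, a small sketch like the following flattens them into the cache root (it assumes the default ~/.cache path; adjust CACHE for the OpenClaw layout):

```shell
# Move any .gguf files that landed in subdirectories up to the
# cache root, then show what is present. CACHE here assumes the
# default path; change it if QMD uses the OpenClaw xdg-cache layout.
CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
# -mindepth 2 matches only files inside subdirectories;
# mv -n refuses to overwrite an existing file of the same name.
find "$CACHE" -mindepth 2 -name '*.gguf' -exec mv -n {} "$CACHE"/ \;
ls -l "$CACHE"/*.gguf 2>/dev/null || echo "no .gguf files in $CACHE yet"
```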


3. Direct download (wget/curl)

Download the files below into the same $CACHE directory, saving them under the filenames shown. The links are official HuggingFace URLs:

| Save as | Direct link (HuggingFace) |
| --- | --- |
| embeddinggemma-300m-qat-q8_0.gguf | https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf |
| qwen3-reranker-0.6b-q8_0.gguf | https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf |
| qmd-query-expansion-1.7B-Q4_K_M.gguf | https://huggingface.co/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf |

Alternatively, download in a browser using the links above, or from the command line:

CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
wget -c "https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf"
wget -c "https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf"
wget -c "https://huggingface.co/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf"
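After the wget runs, a quick sanity check (a sketch; filenames taken from the URLs above) confirms each file exists and is non-empty, since an interrupted wget -c leaves a partial file behind:

```shell
# Check that each expected model file exists and is non-empty.
# An empty or missing file means the download did not complete;
# re-run the corresponding wget -c to resume it.
CACHE="$HOME/.cache/qmd/models"
for f in embeddinggemma-300M-Q8_0.gguf \
         qwen3-reranker-0.6b-q8_0.gguf \
         qmd-query-expansion-1.7B-Q4_K_M.gguf; do
  if [ -s "$CACHE/$f" ]; then
    printf '%s: %s bytes\n' "$f" "$(wc -c < "$CACHE/$f")"
  else
    printf '%s: missing or empty\n' "$f"
  fi
done
```

Note that a non-empty file can still be truncated; comparing the byte count against the size shown on the HuggingFace file page is a stronger check.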

Mirror (e.g. China): If huggingface.co is slow or blocked, set HF_ENDPOINT and use the mirror:

export HF_ENDPOINT=https://hf-mirror.com

CACHE="$HOME/.cache/qmd/models"
mkdir -p "$CACHE"
cd "$CACHE"
wget -c "https://hf-mirror.com/ggml-org/embeddinggemma-300M-GGUF/resolve/main/embeddinggemma-300M-Q8_0.gguf"
wget -c "https://hf-mirror.com/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/resolve/main/qwen3-reranker-0.6b-q8_0.gguf"
wget -c "https://hf-mirror.com/tobil/qmd-query-expansion-1.7B/resolve/main/qmd-query-expansion-1.7B-Q4_K_M.gguf"
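Before pulling ~2GB through the mirror, it can be worth a quick reachability probe (a sketch assuming curl is installed):

```shell
# Probe the mirror with a HEAD request before starting large downloads.
# -I sends HEAD, -f fails on HTTP errors, --max-time bounds the wait.
export HF_ENDPOINT=https://hf-mirror.com
if curl -sIf --max-time 10 "$HF_ENDPOINT" > /dev/null; then
  echo "mirror reachable: $HF_ENDPOINT"
else
  echo "mirror unreachable; try huggingface.co directly or a proxy"
fi
```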

4. Verify

# If using OpenClaw's XDG layout
export XDG_CACHE_HOME="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}/agents/main/qmd/xdg-cache"
ls -la "$XDG_CACHE_HOME/qmd/models/"
# Or default cache
ls -la "$HOME/.cache/qmd/models/"

# QMD status or a query should no longer trigger model download
qmd status

If you still see "model not found" or an auto-download attempt, the cache path in use does not match; confirm the actual path via qmd status or the official docs and place the files there.
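To see which directory the XDG convention resolves to on your machine, a one-liner mirroring the rules in section 1.2 can help:

```shell
# Resolve the model cache directory the way section 1.2 describes:
# $XDG_CACHE_HOME/qmd/models when XDG_CACHE_HOME is set,
# otherwise ~/.cache/qmd/models.
MODELS_DIR="${XDG_CACHE_HOME:-$HOME/.cache}/qmd/models"
echo "expected QMD model cache: $MODELS_DIR"
ls "$MODELS_DIR" 2>/dev/null || echo "(directory does not exist yet)"
```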


5. After CPU install

Once QMD is installed with NODE_LLAMA_CPP_CUDA=false and the three GGUF models are in place, CPU mode needs no extra system configuration (no additional env vars or services). Just make sure:

| Item | Notes |
| --- | --- |
| OpenClaw config | If the QMD memory backend is not set, configure memory.backend: "qmd", qmd.paths, and qmd.limits in openclaw.json, then restart the gateway. See OpenClaw configuration. |
| Cache path | If QMD is started by the OpenClaw Gateway, place the models in ~/.openclaw/agents/main/qmd/xdg-cache/qmd/models/ (or your agentId path). If you previously used ~/.cache/qmd/models/, copy the same .gguf files to the xdg-cache path. |
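As a rough sketch only, the fragment below shows where those keys might sit in openclaw.json. Only memory.backend: "qmd" is stated above; the shapes of qmd.paths and qmd.limits are assumptions to verify against the OpenClaw configuration docs.

```shell
# Write a hypothetical openclaw.json fragment and check it parses.
# The key shapes under "qmd" are assumptions, not a documented schema.
cat > /tmp/openclaw-qmd-example.json <<'EOF'
{
  "memory": { "backend": "qmd" },
  "qmd": {
    "paths": {},
    "limits": {}
  }
}
EOF
python3 -m json.tool /tmp/openclaw-qmd-example.json > /dev/null \
  && echo "fragment is well-formed JSON"
```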

References

| Description | Link |
| --- | --- |
| QMD repo | https://github.com/tobi/qmd |
| node-llama-cpp CUDA | https://node-llama-cpp.withcat.ai/guide/CUDA |
