ports/misc/llama-cpp/pkg-message

[
{ type: install
  message: <<EOM
You installed LLaMA-cpp: Facebook's LLaMA model runner.

To try LLaMA-cpp, download an AI model in the GGUF format,
for example from huggingface.co, run the command below
with $MODEL set to the path of the downloaded file,
and open http://localhost:9011 in your browser to chat with the model.

$ llama-server -m $MODEL \
  --host 0.0.0.0 \
  --port 9011 \
  -ngl 15

Alternatively, add the following lines to /etc/rc.conf,
start the llama-server service,
and navigate to http://localhost:8080:
> llama_server_enable=YES
> llama_server_model=/path/to/models/llama-2-7b-chat.Q4_K_M.gguf
> llama_server_args="--device Vulkan0 -ngl 27"

To use the multi-model feature, do not set llama_server_model.
Instead, add the argument "--models-preset /path/to/models.ini"
and list the pre-downloaded models in models.ini, for example:
[Qwen3.5-35B-A3B-Uncensored]
model = /path/to/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf
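
As a rough sketch, the matching /etc/rc.conf entries might look like this,
assuming the rc script passes llama_server_args to llama-server unchanged
(the models.ini path is a placeholder):

```shell
# /etc/rc.conf fragment (sketch): multi-model setup.
# llama_server_model is deliberately left unset.
llama_server_enable=YES
# Assumption: the preset file lives at this placeholder path.
llama_server_args="--models-preset /path/to/models.ini"
```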

You can switch to CPU-only operation by setting the port option
VULKAN=OFF in misc/ggml (not in llama-cpp).

EOM
}
]