ports/misc/llama-cpp/pkg-message
Yuri Victorovich d31fe59500 misc/llama-cpp: Multiple changes
1. Allow for multiple models to be selected at the run-time
2. Remove the leftover VULKAN option. VULKAN is enabled in misc/ggml.

PR:		294795 (allow multiple models at run-time)
Requested by:	Ivan Rozhuk <rozhuk.im@gmail.com>
2026-04-28 12:14:22 -07:00


[
{ type: install
message: <<EOM
You installed LLaMA-cpp: Facebook's LLaMA model runner.
To try LLaMA-cpp, download an AI model in the GGUF format,
for example from huggingface.co, run the command below,
and open http://localhost:9011 in your browser
to chat with the model.
$ llama-server -m $MODEL \
--host 0.0.0.0 \
--port 9011 \
-ngl 15
Alternatively, add the following lines to /etc/rc.conf,
start the llama-server service,
and navigate to http://localhost:8080:
> llama_server_enable=YES
> llama_server_model=/path/to/models/llama-2-7b-chat.Q4_K_M.gguf
> llama_server_args="--device Vulkan0 -ngl 27"
To use the multi-model feature, do not set llama_server_model.
Instead, add the argument "--models-preset /path/to/models.ini"
to llama_server_args and list the pre-downloaded models in models.ini, for example:
[Qwen3.5-35B-A3B-Uncensored]
model = /path/to/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf
You can switch to CPU-only operation by setting the port option
VULKAN=OFF in misc/ggml (not in llama-cpp).
EOM
}
]
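
The models.ini layout described in the message (one section per model with a "model = ..." line) can be generated from a directory of downloaded GGUF files. The helper below is a minimal sketch, not part of the port: the function name gen_models_ini is hypothetical, and deriving each section name from the file name is an assumption rather than a requirement stated above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with misc/llama-cpp): print a models.ini
# preset with one section per *.gguf file found in the given directory.
# Assumption: the section name is the file name without the .gguf suffix.
gen_models_ini() {
    dir=$1
    for f in "$dir"/*.gguf; do
        # If the directory holds no GGUF files the glob stays unexpanded; skip it.
        [ -e "$f" ] || continue
        name=$(basename "$f" .gguf)
        # Emit "[name]" followed by "model = /full/path", then a blank separator line.
        printf '[%s]\nmodel = %s\n\n' "$name" "$f"
    done
}

# Example: gen_models_ini /path/to/models > /path/to/models.ini
```

The resulting file can then be passed to the server with the "--models-preset /path/to/models.ini" argument mentioned in the message.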