Model Configuration

Use a model profile when you need more detailed control over model behavior than auto-discovery provides.

This workflow lets you configure model-specific settings such as:

  • context_window
  • tokenizer
  • tool support
  • reasoning support
  • image support
  • sampling parameters

Use it when you want PrivateGPT to know the exact limits and capabilities of each model, or when you need to override what your provider exposes automatically.

This workflow is supported from the source-based "Local with uv" install:

  1. Generate settings-model.yaml from your running LLM server.
  2. Edit the generated profile.
  3. Start PrivateGPT with PGPT_PROFILES=model.

Generate a model profile

Generate a profile from the models exposed by your OpenAI-compatible server:

    OPENAI_API_BASE=http://localhost:11434/v1 \
      make auto-discover-models
    # or directly:
    OPENAI_API_BASE=http://localhost:11434/v1 \
      uv run python scripts/auto_discover_models.py --out settings-model.yaml

This creates settings-model.yaml with all discovered models as a starting point for detailed configuration.

Start from the "Local with uv" install first. Local tokenizer support requires private-gpt[tokenizer-local] or private-gpt[core].
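To preview which models the discovery step is likely to find, you can query the server's model list yourself. The sketch below assumes the server implements the OpenAI-style GET /models endpoint under the base URL. The helper names extract_model_ids and list_model_ids are hypothetical and not part of PrivateGPT, and the real discovery script may collect more than model IDs.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model IDs from an OpenAI-style /models response body."""
    # OpenAI-compatible servers answer with {"object": "list", "data": [{"id": ...}, ...]}.
    # Entries without an "id" key are skipped rather than raising.
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """Fetch <base_url>/models and return the IDs (hypothetical helper)."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

With a server running, list_model_ids("http://localhost:11434/v1") should return the same model names that appear under models in the generated settings-model.yaml, provided the script discovers models through this endpoint.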


Edit model settings

Open settings-model.yaml and adjust the fields you care about. This is where you explicitly define how PrivateGPT should treat each model. Example:

    llm:
      default_model: qwen3.5:35b

    embedding:
      default_model: mxbai-embed-large

    models:
      - name: qwen3.5:35b
        type: llm
        mode: openai
        context_window: 32768
        tokenizer: Qwen/Qwen3.5-35B-A3B
        support_tools: true
        support_reasoning: true
        support_image: 0
        sampling_params:
          temperature: 0.6
          top_p: 0.95
          top_k: 20
          min_p: 0.0

      - name: mxbai-embed-large
        type: embedding
        mode: openai
        context_window: 512

Field                        Description
context_window               Maximum tokens the model can process. Set explicitly to avoid overflow.
support_tools                Enable function and tool calling. Use the specific tool extra you need, or private-gpt[tools] as the bundle fallback. private-gpt[core] also includes that bundle.
tokenizer                    HuggingFace repo ID for exact token counting (for example Qwen/Qwen3.5-35B-A3B). Requires private-gpt[tokenizer-local] or private-gpt[core]. Falls back to a character-based estimate if omitted.
support_reasoning            Enable extended thinking or reasoning mode.
support_image                Number of images per request the model accepts (0 = disabled).
sampling_params.temperature  Randomness (0 = deterministic, 1 = more creative).
sampling_params.top_p        Nucleus sampling probability mass.
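After hand-editing the profile, a quick sanity check can catch mistakes before startup. The following is a minimal sketch, not PrivateGPT's own validation: it works on the profile as a plain dict (for example, as loaded with PyYAML's yaml.safe_load), and the rules it enforces are assumptions drawn from the field descriptions above. It treats a missing context_window as a problem and expects each default_model to match an entry under models. The function name check_profile is hypothetical.

```python
def check_profile(profile: dict) -> list[str]:
    """Return human-readable problems found in a model profile dict."""
    problems = []
    models = profile.get("models") or []
    names = {m.get("name") for m in models}

    # Assumption: each default_model should refer to an entry under `models`.
    for section in ("llm", "embedding"):
        default = (profile.get(section) or {}).get("default_model")
        if default is not None and default not in names:
            problems.append(f"{section}.default_model {default!r} is not listed under models")

    for m in models:
        name = m.get("name", "<unnamed>")
        window = m.get("context_window")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(window, int) or isinstance(window, bool) or window <= 0:
            problems.append(f"{name}: context_window should be a positive integer")
        images = m.get("support_image", 0)
        if not isinstance(images, int) or isinstance(images, bool) or images < 0:
            problems.append(f"{name}: support_image should be an image count (0 = disabled)")
    return problems
```

An empty list means the sketch found nothing to flag. It says nothing about whether the values match what the model or server actually supports.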

Run with the profile

Once settings-model.yaml exists, start PrivateGPT with PGPT_PROFILES=model.

    OPENAI_API_BASE=http://localhost:11434/v1 \
      PGPT_PROFILES=model \
      uv run python -m private_gpt

PGPT_PROFILES=model tells PrivateGPT to load settings-model.yaml on top of the base config. Profile files follow the naming convention settings-{name}.yaml.
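The naming convention and the "on top of the base config" behavior can be illustrated with a short sketch. Several details here are assumptions rather than documented behavior: that the base file is named settings.yaml, that multiple profiles may be given as a comma-separated list, and that the overlay is a recursive merge in which profile values replace base values. The helper names profile_files and overlay are hypothetical.

```python
from copy import deepcopy


def profile_files(profiles: str, base: str = "settings.yaml") -> list[str]:
    """Map a PGPT_PROFILES value to the settings files it implies."""
    # Assumption: the base config is `settings.yaml`, and several profiles
    # may be listed separated by commas.
    names = [p.strip() for p in profiles.split(",") if p.strip()]
    return [base] + [f"settings-{name}.yaml" for name in names]


def overlay(base: dict, profile: dict) -> dict:
    """Recursively merge `profile` on top of `base` (illustrative merge rule)."""
    merged = deepcopy(base)
    for key, value in profile.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            # Scalars and lists from the profile replace the base value.
            merged[key] = deepcopy(value)
    return merged
```

Under these assumptions, profile_files("model") yields settings.yaml followed by settings-model.yaml, and keys that the profile does not mention keep their base values.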