LM Studio

LM Studio is a desktop application for discovering, downloading, and running GGUF models locally. Its built-in local server exposes an OpenAI-compatible API with full tokenizer support.

Capabilities with PrivateGPT

Capability                        Status
Model discovery (/v1/models)      ✅
Tokenizer endpoint (/tokenize)    ✅
Embeddings                        ✅
Tool / function calling           ✅ model-dependent
Structured output                 ❌
Streaming                         ✅
Vision / image input              ✅ model-dependent

Setup

1. Install LM Studio

Download and install LM Studio from lmstudio.ai. It is available for macOS, Windows, and Linux.

2. Download models

  1. Open LM Studio.
  2. Go to the Discover tab (magnifying glass icon).
  3. Search for a model. Example:
    • LLM: search unsloth Qwen3.5-35B-A3B and pick a Q4 quantization (~18 GB)
    • Embeddings: search mxbai-embed-large
  4. Click the model and select a quantization (Q4_K_M is a good default).
  5. Click Download.

3. Start the local server

  1. Click the Developer tab (left sidebar, </> icon).
  2. Select your downloaded model from the dropdown.
  3. Click Start Server.

The default server address is http://localhost:1234.

To serve an embeddings model simultaneously, scroll down in the Developer panel and load a second model under “Embedding model”.
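
Before moving on, it can help to confirm that the server is reachable and that both models are loaded. The following is a minimal sketch, assuming the default address above and the standard OpenAI-style response shape for /v1/models (a JSON object with a `data` list whose entries carry an `id`). The helper names `extract_model_ids` and `list_model_ids` are illustrative and not part of LM Studio or PrivateGPT.

```python
import json
import urllib.request

# Default LM Studio server address from the step above.
BASE_URL = "http://localhost:1234/v1"


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model IDs from an OpenAI-style /v1/models response."""
    # Assumed shape: {"object": "list", "data": [{"id": "..."}, ...]}
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the running server and return the IDs of the models it reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        for model_id in list_model_ids():
            print(model_id)
    except OSError as exc:  # URLError and timeouts are subclasses of OSError
        print(f"Could not reach {BASE_URL}: {exc}")
```

If the embeddings model was loaded as described, its ID should appear in the printed list next to the LLM.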

4. Run PrivateGPT

Package install:

$ OPENAI_API_BASE=http://localhost:1234/v1 private-gpt serve

Advanced profile example

# settings-model.yaml
llm:
  default_model: qwen3-35b-a3b-q4_k_m

embedding:
  default_model: mxbai-embed-large-v1

models:
  - name: qwen3-35b-a3b-q4_k_m
    type: llm
    mode: openai
    context_window: 32768
    tokenizer: Qwen/Qwen3.5-35B-A3B
    support_tools: true
    support_reasoning: true
    sampling_params:
      temperature: 0.6
      top_p: 0.95
      top_k: 20
      min_p: 0.0

  - name: mxbai-embed-large-v1
    type: embedding
    mode: openai
    context_window: 512

To generate this file automatically, run the following while the LM Studio server is running:

$ OPENAI_API_BASE=http://localhost:1234/v1 \
    uv run python scripts/auto_discover_models.py --out settings-model.yaml
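
Whether the profile is written by hand or generated, each `default_model` value needs to match a `name` under `models`. Here is a minimal sketch of that check, assuming the profile has already been loaded into a Python dict (for example with a YAML parser). The rule that a default must point at an entry of the matching `type` is inferred from the example profile, not confirmed PrivateGPT behavior, and `check_profile` is a hypothetical helper name.

```python
def check_profile(profile: dict) -> list[str]:
    """Return a list of problems found in a settings-model profile dict."""
    problems = []
    # Map each declared model name to its type ("llm" or "embedding").
    declared = {m["name"]: m.get("type") for m in profile.get("models", [])}
    for section in ("llm", "embedding"):
        default = profile.get(section, {}).get("default_model")
        if default is None:
            continue  # no default set for this section
        if default not in declared:
            problems.append(
                f"{section}.default_model '{default}' is not listed under models"
            )
        elif declared[default] != section:
            # Assumption: the default must reference a model of the same type.
            problems.append(
                f"{section}.default_model '{default}' has type '{declared[default]}'"
            )
    return problems


# The example profile from this section, reduced to the fields being checked.
profile = {
    "llm": {"default_model": "qwen3-35b-a3b-q4_k_m"},
    "embedding": {"default_model": "mxbai-embed-large-v1"},
    "models": [
        {"name": "qwen3-35b-a3b-q4_k_m", "type": "llm", "mode": "openai"},
        {"name": "mxbai-embed-large-v1", "type": "embedding", "mode": "openai"},
    ],
}
print(check_profile(profile))  # prints []
```

An empty list means every default resolves to a declared model; any strings returned describe what to fix.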

Troubleshooting

CORS errors from the browser

Enable CORS in LM Studio: Developer → Server Settings → Enable CORS.

Model name doesn’t match

LM Studio uses the file name as the model ID. Check the exact name with:

$ curl http://localhost:1234/v1/models

Use the id field from the response as the model name in your profile.
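
When several models are configured, comparing names by eye is error-prone. The sketch below assumes you have already collected the `id` values from the response. It reports profile names the server does not list and suggests the nearest server ID using Python's standard `difflib` module. `suggest_model_ids` is an illustrative helper, and the IDs in the usage line are made-up examples.

```python
import difflib


def suggest_model_ids(
    profile_names: list[str], server_ids: list[str]
) -> dict[str, list[str]]:
    """Map each profile name missing from the server to similar server IDs."""
    # Compare case-insensitively, but report IDs exactly as the server spells them.
    lowered = {sid.lower(): sid for sid in server_ids}
    suggestions = {}
    for name in profile_names:
        if name in server_ids:
            continue  # exact match, nothing to fix
        close = difflib.get_close_matches(
            name.lower(), list(lowered), n=3, cutoff=0.5
        )
        # An empty list means no server ID is reasonably close.
        suggestions[name] = [lowered[c] for c in close]
    return suggestions


# Hypothetical values for illustration only.
print(suggest_model_ids(["qwen3-35b-a3b-q4_k_m"], ["qwen3-35b-a3b"]))
# prints {'qwen3-35b-a3b-q4_k_m': ['qwen3-35b-a3b']}
```

The suggestion is only a hint. Copy the exact `id` from the server response into the profile rather than relying on the fuzzy match.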