LM Studio
LM Studio is a desktop application for discovering, downloading, and running GGUF models locally. Its built-in local server exposes an OpenAI-compatible API with full tokenizer support.
Capabilities with PrivateGPT
Setup
Download models
- Open LM Studio.
- Go to the Discover tab (magnifying glass icon).
- Search for a model. Example:
- LLM: search
unsloth Qwen3.5-35B-A3Band pick a Q4 quantization (~18 GB) - Embeddings: search
mxbai-embed-large
- LLM: search
- Click the model and select a quantization (Q4_K_M is a good default).
- Click Download.
Start the local server
- Click the Developer tab (left sidebar,
</>icon). - Select your downloaded model from the dropdown.
- Click Start Server.
The default server address is http://localhost:1234.
To serve an embeddings model simultaneously, scroll down in the Developer panel and load a second model under “Embedding model”.
Advanced profile example
Generate this automatically (with LM Studio server running):
Troubleshooting
CORS errors from the browser
Enable CORS in LM Studio: Developer → Server Settings → Enable CORS.
Model name doesn’t match
LM Studio uses the file name as the model ID. Check the exact name with:
Use the id field from the response as the model name in your profile.

