LM Studio is a desktop application for discovering, downloading, and running GGUF models locally. Its built-in local server exposes an OpenAI-compatible API with full tokenizer support.
unsloth Qwen3.5-35B-A3B and pick a Q4 quantization (~18 GB)mxbai-embed-large</> icon).The default server address is http://localhost:1234.
To serve an embeddings model simultaneously, scroll down in the Developer panel and load a second model under “Embedding model”.
Generate this automatically (with LM Studio server running):
CORS errors from the browser
Enable CORS in LM Studio: Developer → Server Settings → Enable CORS.
Model name doesn’t match
LM Studio uses the file name as the model ID. Check the exact name with:
Use the id field from the response as the model name in your profile.