For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Contact usJoin the Discord
ManualAPI GuideAPI Reference
  • Getting started
    • Introduction
    • Quickstart
    • How it works
  • Installation Options
    • Package Install
    • Docker
    • Development
  • Configuration
    • CLI
    • Settings & Profiles
    • Model Configuration
  • Inference Providers
    • Overview
    • Ollama
    • LM Studio
    • LlamaCPP Server
    • vLLM
  • Integrations
    • Overview
    • Claude Code
    • Claude Desktop
    • Claude for Microsoft 365
    • OpenCode
  • Built-in Tools
    • Web Tools
    • Database Tools
  • Storage Providers
    • Vector Store
    • Object Storage
  • User Interface
    • Workbench
  • Observability
    • Observability
  • Reference
    • Troubleshooting
LogoLogo
Contact usJoin the Discord
On this page
  • What’s next?
Getting started

Quickstart

Was this page helpful?
Previous

How it works

Next
Built with

PrivateGPT connects to any OpenAI-compatible LLM server and exposes a private, self-hosted AI API. This guide gets you from zero to a running server in four steps.

Prerequisites: You need an OpenAI-compatible LLM server running locally. Pick one from the Providers page — Ollama is the easiest way to start.

1

Install PrivateGPT

Windows
Linux
macOS
1# Install uv first
2powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
3
4# Then install PrivateGPT
5uv tool install `
6 --find-links https://zylon-ai.github.io/private-gpt/packages/ `
7 "private-gpt[core]"
2

Start your LLM server

Start your server. PrivateGPT auto-discovers all available models on startup.

Ollama
LM Studio
LlamaCPP Server
vLLM
$# Example: pull a model and start the server
$ollama pull qwen3.5:35b # LLM (~24 GB)
$ollama pull mxbai-embed-large # Embeddings (~670 MB)
$
$# Start the server (runs on port 11434)
$ollama serve

Ollama does not expose a tokenizer endpoint. PrivateGPT falls back to approximate token counting, which may affect context-window management. See Ollama limitations.

3

Run PrivateGPT

Point PrivateGPT at your servers with OPENAI_API_BASE and OPENAI_EMBEDDING_API_BASE. Models are discovered automatically — no config file needed.

macOS / Linux
Windows (PowerShell)
Windows (CMD)
$OPENAI_API_BASE=http://localhost:<llm-port>/v1 \
> OPENAI_EMBEDDING_API_BASE=http://localhost:<embedding-port>/v1 \
> private-gpt serve

If startup succeeds, PrivateGPT will be available on port 8080.

4

Open the UI

Navigate to http://localhost:8080/ui in your browser.

The API is available at http://localhost:8080 and follows the Anthropic API spec. See the API Reference for all endpoints.


What’s next?

Docker install

Run PrivateGPT with Docker for a fully isolated, production-ready setup.

Local with uv

Install from source with core, add extras only when needed, and use detailed model configuration.

Inference Providers

Compare Ollama, LM Studio, LlamaCPP, and vLLM — feature matrix and limitations.

API Reference

Explore all REST endpoints and start building your application.