Installation | PrivateGPT

It is important that you review the Main Concepts section to understand the different components of PrivateGPT and how they interact with each other.

Base requirements to run PrivateGPT

1. Clone the PrivateGPT Repository

Clone the repository and navigate to it:

$ git clone https://github.com/zylon-ai/private-gpt
> cd private-gpt

2. Install Python 3.11

If you do not have Python 3.11 installed, install it using a Python version manager like pyenv. Earlier Python versions are not supported.

macOS/Linux

Install and set Python 3.11 using pyenv:

$ pyenv install 3.11
> pyenv local 3.11

Windows

Install and set Python 3.11 using pyenv-win:

$ pyenv install 3.11
> pyenv local 3.11

3. Install `Poetry`

Install Poetry for dependency management: Follow the instructions on the official Poetry website to install it.

A bug exists in Poetry versions 1.7.0 and earlier. We strongly recommend upgrading to a tested version. To upgrade Poetry to latest tested version, run poetry self update 1.8.3 after installing it.

4. Optional: Install `make`

To run various scripts, you need to install make. Follow the instructions for your operating system:

macOS

(Using Homebrew):

$ brew install make

Windows

(Using Chocolatey):

$ choco install make

Install and Run Your Desired Setup

PrivateGPT allows customization of the setup, from fully local to cloud-based, by deciding the modules to use. To install only the required dependencies, PrivateGPT offers different extras that can be combined during the installation process:

$ poetry install --extras "<extra1> <extra2>..."

Where <extra> can be any of the following options described below.

Available Modules

You need to choose one option per category (LLM, Embeddings, Vector Stores, UI). Below are the tables listing the available options for each category.

LLM

Option	Description	Extra
ollama	Adds support for Ollama LLM, requires Ollama running locally	llms-ollama
llama-cpp	Adds support for local LLM using LlamaCPP	llms-llama-cpp
sagemaker	Adds support for Amazon Sagemaker LLM, requires Sagemaker endpoints	llms-sagemaker
openai	Adds support for OpenAI LLM, requires OpenAI API key	llms-openai
openailike	Adds support for 3rd party LLM providers compatible with OpenAI’s API	llms-openai-like
azopenai	Adds support for Azure OpenAI LLM, requires Azure endpoints	llms-azopenai
gemini	Adds support for Gemini LLM, requires Gemini API key	llms-gemini

Embeddings

Option	Description	Extra
ollama	Adds support for Ollama Embeddings, requires Ollama running locally	embeddings-ollama
huggingface	Adds support for local Embeddings using HuggingFace	embeddings-huggingface
openai	Adds support for OpenAI Embeddings, requires OpenAI API key	embeddings-openai
sagemaker	Adds support for Amazon Sagemaker Embeddings, requires Sagemaker endpoints	embeddings-sagemaker
azopenai	Adds support for Azure OpenAI Embeddings, requires Azure endpoints	embeddings-azopenai
gemini	Adds support for Gemini Embeddings, requires Gemini API key	embeddings-gemini

Vector Stores

Option	Description	Extra
qdrant	Adds support for Qdrant vector store	vector-stores-qdrant
milvus	Adds support for Milvus vector store	vector-stores-milvus
chroma	Adds support for Chroma DB vector store	vector-stores-chroma
postgres	Adds support for Postgres vector store	vector-stores-postgres
clickhouse	Adds support for Clickhouse vector store	vector-stores-clickhouse

UI

Option	Description	Extra
Gradio	Adds support for UI using Gradio	ui

A working Gradio UI client is provided to test the API, together with a set of useful tools such as bulk model download script, ingestion script, documents folder watch, etc. Please refer to the UI alternatives page for more UI alternatives.

Recommended Setups

There are just some examples of recommended setups. You can mix and match the different options to fit your needs. You’ll find more information in the Manual section of the documentation.

Important for Windows: In the examples below or how to run PrivateGPT with make run, PGPT_PROFILES env var is being set inline following Unix command line syntax (works on MacOS and Linux). If you are using Windows, you’ll need to set the env var in a different way, for example:

1 # Powershell
2 $env:PGPT_PROFILES="ollama"
3 make run

1 # CMD
2 set PGPT_PROFILES=ollama
3 make run

Refer to the troubleshooting section for specific issues you might encounter.

Local, Ollama-powered setup - RECOMMENDED

The easiest way to run PrivateGPT fully locally is to depend on Ollama for the LLM. Ollama provides local LLM and Embeddings super easy to install and use, abstracting the complexity of GPU support. It’s the recommended setup for local development.

Go to ollama.ai and follow the instructions to install Ollama on your machine.

After the installation, make sure the Ollama desktop app is closed.

Now, start Ollama service (it will start a local inference server, serving both the LLM and the Embeddings):

$ ollama serve

Install the models to be used, the default settings-ollama.yaml is configured to user llama3.1 8b LLM (~4GB) and nomic-embed-text Embeddings (~275MB)

By default, PGPT will automatically pull models as needed. This behavior can be changed by modifying the ollama.autopull_models property.

In any case, if you want to manually pull models, run the following commands:

$ ollama pull llama3.1
> ollama pull nomic-embed-text

Once done, on a different terminal, you can install PrivateGPT with the following command:

$ poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"

Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.

$ PGPT_PROFILES=ollama make run

PrivateGPT will use the already existing settings-ollama.yaml settings file, which is already configured to use Ollama LLM and Embeddings, and Qdrant. Review it and adapt it to your needs (different models, different Ollama port, etc.)

The UI will be available at http://localhost:8001

Private, Sagemaker-powered setup

If you need more performance, you can run a version of PrivateGPT that relies on powerful AWS Sagemaker machines to serve the LLM and Embeddings.

You need to have access to sagemaker inference endpoints for the LLM and / or the embeddings, and have AWS credentials properly configured.

Edit the settings-sagemaker.yaml file to include the correct Sagemaker endpoints.

Then, install PrivateGPT with the following command:

$ poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"

Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.

$ PGPT_PROFILES=sagemaker make run

PrivateGPT will use the already existing settings-sagemaker.yaml settings file, which is already configured to use Sagemaker LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

Non-Private, OpenAI-powered test setup

If you want to test PrivateGPT with OpenAI’s LLM and Embeddings -taking into account your data is going to OpenAI!- you can run the following command:

You need an OPENAI API key to run this setup.

Edit the settings-openai.yaml file to include the correct API KEY. Never commit it! It’s a secret! As an alternative to editing settings-openai.yaml, you can just set the env var OPENAI_API_KEY.

Then, install PrivateGPT with the following command:

$ poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"

Once installed, you can run PrivateGPT.

$ PGPT_PROFILES=openai make run

PrivateGPT will use the already existing settings-openai.yaml settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

Non-Private, Azure OpenAI-powered test setup

If you want to test PrivateGPT with Azure OpenAI’s LLM and Embeddings -taking into account your data is going to Azure OpenAI!- you can run the following command:

You need to have access to Azure OpenAI inference endpoints for the LLM and / or the embeddings, and have Azure OpenAI credentials properly configured.

Edit the settings-azopenai.yaml file to include the correct Azure OpenAI endpoints.

Then, install PrivateGPT with the following command:

$ poetry install --extras "ui llms-azopenai embeddings-azopenai vector-stores-qdrant"

Once installed, you can run PrivateGPT.

$ PGPT_PROFILES=azopenai make run

PrivateGPT will use the already existing settings-azopenai.yaml settings file, which is already configured to use Azure OpenAI LLM and Embeddings endpoints, and Qdrant.

The UI will be available at http://localhost:8001

Local, Llama-CPP powered setup

If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:

$ poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"

In order for local LLM and embeddings to work, you need to download the models to the models folder. You can do so by running the setup script:

$ poetry run python scripts/setup

Once installed, you can run PrivateGPT with the following command:

$ PGPT_PROFILES=local make run

PrivateGPT will load the already existing settings-local.yaml file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.

The UI will be available at http://localhost:8001

Llama-CPP support

For PrivateGPT to run fully locally without Ollama, Llama.cpp is required and in particular llama-cpp-python is used.

You’ll need to have a valid C++ compiler like gcc installed. See Troubleshooting: C++ Compiler for more details.

It’s highly encouraged that you fully read llama-cpp and llama-cpp-python documentation relevant to your platform. Running into installation issues is very likely, and you’ll need to troubleshoot them yourself.

Llama-CPP OSX GPU support

You will need to build llama.cpp with metal support.

To do that, you need to install llama.cpp python’s binding llama-cpp-python through pip, with the compilation flag that activate METAL: you have to pass -DLLAMA_METAL=on to the CMake command tha pip runs for you (see below).

In other words, one should simply run:

$ CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

The above command will force the re-installation of llama-cpp-python with METAL support by compiling llama.cpp locally with your METAL libraries (shipped by default with your macOS).

More information is available in the documentation of the libraries themselves:

Llama-CPP Windows NVIDIA GPU support

Windows GPU support is done through CUDA. Follow the instructions on the original llama.cpp repo to install the required dependencies.

Some tips to get it working with an NVIDIA card and CUDA (Tested on Windows 10 with CUDA 11.5 RTX 3070):

Install latest VS2022 (and build tools) https://visualstudio.microsoft.com/vs/community/
Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
Verify your installation is correct by running nvcc --version and nvidia-smi, ensure your CUDA version is up to date and your GPU is detected.
[Optional] Install CMake to troubleshoot building issues by compiling llama.cpp directly https://cmake.org/download/

If you have all required dependencies properly configured running the following powershell command should succeed.

1 $env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0

If your installation was correct, you should see a message similar to the following next time you start the server BLAS = 1. If there is some issue, please refer to the troubleshooting section.

1 llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
2 AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |

Note that llama.cpp offloads matrix calculations to the GPU but the performance is still hit heavily due to latency between CPU and GPU communication. You might need to tweak batch sizes and other parameters to get the best performance for your particular system.

Llama-CPP Linux NVIDIA GPU support and Windows-WSL

Linux GPU support is done through CUDA. Follow the instructions on the original llama.cpp repo to install the required external dependencies.

Some tips:

Make sure you have an up-to-date C++ compiler
Install CUDA toolkit https://developer.nvidia.com/cuda-downloads
Verify your installation is correct by running nvcc --version and nvidia-smi, ensure your CUDA version is up to date and your GPU is detected.

After that running the following command in the repository will install llama.cpp with GPU support:

$ CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0

If your installation was correct, you should see a message similar to the following next time you start the server BLAS = 1. If there is some issue, please refer to the troubleshooting section.

llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |

Llama-CPP Linux AMD GPU support

Linux GPU support is done through ROCm. Some tips:

Install ROCm from quick-start install guide
Install PyTorch for ROCm

$ wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.0/torch-2.1.1%2Brocm6.0-cp311-cp311-linux_x86_64.whl
> poetry run pip install --force-reinstall --no-cache-dir torch-2.1.1+rocm6.0-cp311-cp311-linux_x86_64.whl

Install bitsandbytes for ROCm

$ PYTORCH_ROCM_ARCH=gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1101,gfx940,gfx941,gfx942
> BITSANDBYTES_VERSION=62353b0200b8557026c176e74ac48b84b953a854
> git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
> cd bitsandbytes-rocm-5.6
> git checkout ${BITSANDBYTES_VERSION}
> make hip ROCM_TARGET=${PYTORCH_ROCM_ARCH} ROCM_HOME=/opt/rocm/
> pip install . --extra-index-url https://download.pytorch.org/whl/nightly

After that running the following command in the repository will install llama.cpp with GPU support:

$ LLAMA_CPP_PYTHON_VERSION=0.2.56
> DAMDGPU_TARGETS=gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942
> CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DAMDGPU_TARGETS=${DAMDGPU_TARGETS}" poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==${LLAMA_CPP_PYTHON_VERSION}

If your installation was correct, you should see a message similar to the following next time you start the server BLAS = 1.

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |

Llama-CPP Known issues and Troubleshooting

Execution of LLMs locally still has a lot of sharp edges, specially when running on non Linux platforms. You might encounter several issues:

Performance: RAM or VRAM usage is very high, your computer might experience slowdowns or even crashes.
GPU Virtualization on Windows and OSX: Simply not possible with docker desktop, you have to run the server directly on the host.
Building errors: Some of PrivateGPT dependencies need to build native code, and they might fail on some platforms. Most likely you are missing some dev tools in your machine (updated C++ compiler, CUDA is not on PATH, etc.). If you encounter any of these issues, please open an issue and we’ll try to help.

One of the first reflex to adopt is: get more information. If, during your installation, something does not go as planned, retry in verbose mode, and see what goes wrong.

For example, when installing packages with pip install, you can add the option -vvv to show the details of the installation.

Llama-CPP Troubleshooting: C++ Compiler

If you encounter an error while building a wheel during the pip install process, you may need to install a C++ compiler on your computer.

For Windows 10/11

To install a C++ compiler on Windows 10/11, follow these steps:

Install Visual Studio 2022.
Make sure the following components are selected:
- Universal Windows Platform development
- C++ CMake tools for Windows
Download the MinGW installer from the MinGW website.
Run the installer and select the gcc component.

For OSX

Check if you have a C++ compiler installed, Xcode should have done it for you. To install Xcode, go to the App Store and search for Xcode and install it. Or you can install the command line tools by running xcode-select --install.
If not, you can install clang or gcc with homebrew brew install gcc

Llama-CPP Troubleshooting: Mac Running Intel

When running a Mac with Intel hardware (not M1), you may run into clang: error: the clang compiler does not support ’ -march=native’ during pip install.

If so set your archflags during pip install. eg: ARCHFLAGS=“-arch x86_64” pip3 install -r requirements.txt

Base requirements to run PrivateGPT

1. Clone the PrivateGPT Repository

2. Install Python 3.11

macOS/Linux

Windows

3. Install Poetry

4. Optional: Install make

macOS

Windows

Install and Run Your Desired Setup

Available Modules

LLM

Embeddings

Vector Stores

UI

Recommended Setups

Local, Ollama-powered setup - RECOMMENDED

Private, Sagemaker-powered setup

Non-Private, OpenAI-powered test setup

Non-Private, Azure OpenAI-powered test setup

Local, Llama-CPP powered setup

Llama-CPP support

Llama-CPP OSX GPU support

Llama-CPP Windows NVIDIA GPU support

Llama-CPP Linux NVIDIA GPU support and Windows-WSL

Llama-CPP Linux AMD GPU support

Llama-CPP Known issues and Troubleshooting

Llama-CPP Troubleshooting: C++ Compiler

Llama-CPP Troubleshooting: Mac Running Intel

3. Install `Poetry`

4. Optional: Install `make`