# Downloading Gated and Private Models
Many models are gated or private and require you to request access before you can use them. Follow these steps to gain access and set up your environment to use these models.
## Accessing Gated Models
- Request Access: Request access to the gated model on its Hugging Face model page.
- Generate a Token: Once you have access, generate an access token in your Hugging Face account settings (https://huggingface.co/settings/tokens).
- Set the Token: Add the generated token to your `settings.yaml` file, or alternatively set the `HF_TOKEN` environment variable. Both options are sketched below.
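A minimal sketch of both options, assuming the token is read from a `huggingface.access_token` key in `settings.yaml` (the `access_token` setting is referenced again in the tokenizer section below):

```yaml
# settings.yaml -- huggingface.access_token is assumed to be the key
# PrivateGPT reads; replace the placeholder with your generated token.
huggingface:
  access_token: <your-hf-token>
```

```bash
# Alternatively, export the token as an environment variable
# before starting PrivateGPT.
export HF_TOKEN=<your-hf-token>
```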
## Tokenizer Setup
PrivateGPT uses Hugging Face's `AutoTokenizer` class (from the `transformers` library) to tokenize input text accurately. It connects to the Hugging Face Hub to download the appropriate tokenizer for the specified model.
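For illustration, the lookup amounts to roughly the following; the model name here is an assumption, and the `token` argument is only needed for gated models:

```python
from transformers import AutoTokenizer

# Download the tokenizer for the configured model from the Hugging Face Hub.
# "mistralai/Mistral-7B-Instruct-v0.3" is an illustrative model name.
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    token="<your-hf-token>",  # required for gated models
)
print(tokenizer.encode("Hello, PrivateGPT!"))
```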
### Configuring the Tokenizer
- Specify the Model: In your `settings.yaml` file, specify the model whose tokenizer you want to use (see the sketch after this list).
- Set Access Token for Gated Models: If you are using a gated model, ensure the `access_token` is set as described in the previous section.

This configuration ensures that PrivateGPT can download and use the correct tokenizer for the model you are working with.
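A sketch of the corresponding `settings.yaml` entry; the `llm.tokenizer` key and the model name are assumptions, so check them against your own configuration:

```yaml
# settings.yaml -- key and model name are illustrative assumptions.
llm:
  tokenizer: mistralai/Mistral-7B-Instruct-v0.3
huggingface:
  access_token: <your-hf-token>  # only needed for gated models
```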
# Embedding dimensions mismatch
If you encounter an error message like `Embedding dimensions mismatch`, it is likely because the output dimension of your embedding model does not match the dimension of the vectors already stored in your vector database. To resolve this issue, ensure that the model and the stored embeddings use the same vector dimension.
By default, PrivateGPT uses nomic-embed-text embeddings, which have a vector dimension of 768.
If you are using a different embedding model, ensure that the vector dimensions match the model’s output.
In versions prior to 0.6.0, the default embedding model was `BAAI/bge-small-en-v1.5` in the huggingface setup.
If you plan to reuse embeddings generated with that model, update the `settings.yaml` file to point back to it:
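A sketch of that update, assuming the `huggingface.embedding_hf_model_name` and `embedding.embed_dim` keys; `BAAI/bge-small-en-v1.5` produces 384-dimensional vectors:

```yaml
# settings.yaml -- key names are assumptions; adjust to your setup.
embedding:
  embed_dim: 384  # bge-small-en-v1.5 outputs 384-dimensional vectors
huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5
```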
# Building Llama-cpp with NVIDIA GPU support
## Out-of-memory error
If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:
- Set the environment variable shown in the sketch after this list.
- Run PrivateGPT.
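As an illustrative assumption only (the specific variable recommended in the original solution may differ), this sketch uses PyTorch's `PYTORCH_CUDA_ALLOC_CONF` to reduce CUDA memory fragmentation, followed by the standard PrivateGPT run command:

```bash
# PYTORCH_CUDA_ALLOC_CONF is an illustrative assumption, not necessarily
# the variable from the original solution; it tells PyTorch's CUDA
# allocator to use expandable segments, which reduces fragmentation.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Run PrivateGPT with the local profile.
PGPT_PROFILES=local make run
```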
Thanks to MarioRossiGithub for providing this solution.

