# Downloading Gated and Private Models
Many models are gated or private and require you to request access before you can use them. Follow these steps to gain access and set up your environment to use these models.
## Accessing Gated Models
- Request Access: Request access to the gated model on its Hugging Face model page.
- Generate a Token: Once you have access, generate an access token in your Hugging Face account settings (https://huggingface.co/settings/tokens).
- Set the Token: Add the generated token to your `settings.yaml` file, or alternatively set the `HF_TOKEN` environment variable. Both options are sketched below.
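A minimal sketch of both options, assuming the token is read from a `huggingface.access_token` key in `settings.yaml` (the `access_token` setting is referenced again in the tokenizer section below):

```yaml
# settings.yaml -- huggingface.access_token is assumed to be the key
# PrivateGPT reads; replace the placeholder with your generated token.
huggingface:
  access_token: <your-hf-token>
```

```bash
# Alternatively, export the token as an environment variable
# before starting PrivateGPT.
export HF_TOKEN=<your-hf-token>
```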
## Tokenizer Setup
PrivateGPT uses Hugging Face's `AutoTokenizer` class (from the `transformers` library) to tokenize input text accurately. It connects to the Hugging Face Hub to download the appropriate tokenizer for the specified model.
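For illustration, the lookup amounts to roughly the following; the model name here is an assumption, and the `token` argument is only needed for gated models:

```python
from transformers import AutoTokenizer

# Download the tokenizer for the configured model from the Hugging Face Hub.
# "mistralai/Mistral-7B-Instruct-v0.3" is an illustrative model name.
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    token="<your-hf-token>",  # required for gated models
)
print(tokenizer.encode("Hello, PrivateGPT!"))
```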
### Configuring the Tokenizer
- Specify the Model: In your `settings.yaml` file, specify the model whose tokenizer you want to use (see the sketch after this list).
- Set Access Token for Gated Models: If you are using a gated model, ensure the `access_token` is set as described in the previous section.

This configuration ensures that PrivateGPT can download and use the correct tokenizer for the model you are working with.
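A sketch of the corresponding `settings.yaml` entry; the `llm.tokenizer` key and the model name are assumptions, so check them against your own configuration:

```yaml
# settings.yaml -- key and model name are illustrative assumptions.
llm:
  tokenizer: mistralai/Mistral-7B-Instruct-v0.3
huggingface:
  access_token: <your-hf-token>  # only needed for gated models
```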
# Embedding dimensions mismatch
If you encounter an error message like `Embedding dimensions mismatch`, it is likely because the output dimension of your embedding model does not match the dimension of the vectors already stored in your vector database. To resolve this issue, ensure that the model and the stored embeddings use the same vector dimension.
By default, PrivateGPT uses nomic-embed-text embeddings, which have a vector dimension of 768.
If you are using a different embedding model, ensure that the vector dimensions match the model’s output.
In versions prior to 0.6.0, the default embedding model was `BAAI/bge-small-en-v1.5` in the huggingface setup.
If you plan to reuse embeddings generated with that model, update the `settings.yaml` file to point back to it:
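A sketch of that update, assuming the `huggingface.embedding_hf_model_name` and `embedding.embed_dim` keys; `BAAI/bge-small-en-v1.5` produces 384-dimensional vectors:

```yaml
# settings.yaml -- key names are assumptions; adjust to your setup.
embedding:
  embed_dim: 384  # bge-small-en-v1.5 outputs 384-dimensional vectors
huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5
```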
# Building Llama-cpp with NVIDIA GPU support
## Out-of-memory error
If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:
- Set the environment variable shown in the sketch after this list.
- Run PrivateGPT.
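As an illustrative assumption only (the specific variable recommended in the original solution may differ), this sketch uses PyTorch's `PYTORCH_CUDA_ALLOC_CONF` to reduce CUDA memory fragmentation, followed by the standard PrivateGPT run command:

```bash
# PYTORCH_CUDA_ALLOC_CONF is an illustrative assumption, not necessarily
# the variable from the original solution; it tells PyTorch's CUDA
# allocator to use expandable segments, which reduces fragmentation.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Run PrivateGPT with the local profile.
PGPT_PROFILES=local make run
```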
Thanks to MarioRossiGithub for providing this solution.

