Vector Stores

PrivateGPT supports Qdrant, Milvus, Chroma, PGVector and ClickHouse as vectorstore providers, with Qdrant as the default.

To select a provider, set the vectorstore.database property in the settings.yaml file to one of qdrant, milvus, chroma, postgres or clickhouse.

vectorstore:
  database: qdrant

Qdrant configuration

To enable Qdrant, set the vectorstore.database property in the settings.yaml file to qdrant.

Qdrant settings can be configured by setting values to the qdrant property in the settings.yaml file.

The available configuration options are:

Field | Description
------|------------
location | If :memory: - use in-memory Qdrant instance. If str - use it as a url parameter.
url | Either host or str of 'Optional[scheme], host, Optional[port], Optional[prefix]'. E.g. http://localhost:6333
port | Port of the REST API interface. Default: 6333
grpc_port | Port of the gRPC interface. Default: 6334
prefer_grpc | If true - use gRPC interface whenever possible in custom methods.
https | If true - use HTTPS (SSL) protocol.
api_key | API key for authentication in Qdrant Cloud.
prefix | If set, add prefix to the REST URL path. Example: service/v1 will result in http://localhost:6333/service/v1/{qdrant-endpoint} for REST API.
timeout | Timeout for REST and gRPC API requests. Default: 5.0 seconds for REST and unlimited for gRPC
host | Host name of Qdrant service. If url and host are not set, defaults to 'localhost'.
path | Persistence path for QdrantLocal. E.g. local_data/private_gpt/qdrant
force_disable_check_same_thread | Force disable check_same_thread for QdrantLocal sqlite connection, defaults to True.

By default, Qdrant tries to connect to a Qdrant server instance at http://localhost:6333.
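As a minimal sketch of how the options above might be combined to point PrivateGPT at a running Qdrant server, the fragment below sets url and api_key. The url is the example value from the options table, and <API_KEY> is a placeholder that is only relevant when the server requires authentication (for example Qdrant Cloud).

```yaml
vectorstore:
  database: qdrant

qdrant:
  # Example URL from the options table; replace with your server's address.
  url: http://localhost:6333
  # Placeholder; only needed if the server requires an API key.
  api_key: <API_KEY>
  # Set to true to use gRPC where possible (gRPC port defaults to 6334).
  prefer_grpc: false
```

If a path value is also configured, it may need to be removed, since the Qdrant client is expected to accept only one of location, url, host or path at a time.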

To obtain a local setup (disk-based database) without running a Qdrant server, configure the qdrant.path value in settings.yaml:

qdrant:
  path: local_data/private_gpt/qdrant

Milvus configuration

To enable Milvus, set the vectorstore.database property in the settings.yaml file to milvus and install the milvus extra.

$ poetry install --extras vector-stores-milvus

The available configuration options are:

Field | Description
------|------------
uri | Defaults to "local_data/private_gpt/milvus/milvus_local.db", a local file. You can also point it at a more performant Milvus server running on Docker or Kubernetes, e.g. http://localhost:19530. To use Zilliz Cloud, set uri and token to the endpoint and API key from Zilliz Cloud.
token | Used together with uri: the token for a Milvus server on Docker or Kubernetes, or the Zilliz Cloud API key.
collection_name | The name of the collection. Default is "milvus_db".
overwrite | Overwrite the data in the collection if it already exists. Default is True.

To obtain a local setup (disk-based database) without running a Milvus server, set the uri value in settings.yaml to a local file path such as local_data/private_gpt/milvus/milvus_local.db.
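A possible settings.yaml fragment for this local-file setup is sketched below. The top-level milvus key is an assumption made by analogy with the qdrant, postgres and clickhouse properties described elsewhere on this page; the values are the defaults listed in the table.

```yaml
vectorstore:
  database: milvus

# Assumed key name, by analogy with the other providers.
milvus:
  # Local file instead of a Milvus server.
  uri: local_data/private_gpt/milvus/milvus_local.db
  collection_name: milvus_db
  # Default per the table; existing data in the collection is overwritten.
  overwrite: true
```

For a Milvus server, uri would instead take a value such as http://localhost:19530, with token set as required by that server.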

Chroma configuration

To enable Chroma, set the vectorstore.database property in the settings.yaml file to chroma and install the chroma extra.

$ poetry install --extras chroma

By default, Chroma uses a disk-based database stored in local_data_path / "chroma_db", where local_data_path is defined in settings.yaml.

PGVector

To use the PGVector store, a PostgreSQL database with the PGVector extension is required.

To enable PGVector, set the vectorstore.database property in the settings.yaml file to postgres and install the vector-stores-postgres extra.

$ poetry install --extras vector-stores-postgres

PGVector settings can be configured by setting values to the postgres property in the settings.yaml file.

The available configuration options are:

Field | Description
------|------------
host | The server hosting the Postgres database. Default is localhost
port | The port on which the Postgres database is accessible. Default is 5432
database | The specific database to connect to. Default is postgres
user | The username for database access. Default is postgres
password | The password for database access. (Required)
schema_name | The database schema to use. Default is private_gpt

For example:

vectorstore:
  database: postgres

postgres:
  host: localhost
  port: 5432
  database: postgres
  user: postgres
  password: <PASSWORD>
  schema_name: private_gpt

The following table will be created in the database:

postgres=# \d private_gpt.data_embeddings
                                   Table "private_gpt.data_embeddings"
  Column   |       Type        | Collation | Nullable |                         Default
-----------+-------------------+-----------+----------+---------------------------------------------------------
 id        | bigint            |           | not null | nextval('private_gpt.data_embeddings_id_seq'::regclass)
 text      | character varying |           | not null |
 metadata_ | json              |           |          |
 node_id   | character varying |           |          |
 embedding | vector(768)       |           |          |
Indexes:
    "data_embeddings_pkey" PRIMARY KEY, btree (id)
postgres=#

The dimension of the embedding column is set from the embedding.embed_dim value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.
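For reference, the vector(768) column in the listing above would correspond to a setting along these lines. The nesting is inferred from the dotted name embedding.embed_dim, and 768 is simply the value shown in the table listing, not a recommendation.

```yaml
embedding:
  # Should match the output dimension of the configured embedding model.
  embed_dim: 768
```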

ClickHouse

To use ClickHouse as the vector store, a ClickHouse database is required.

To enable ClickHouse, set the vectorstore.database property in the settings.yaml file to clickhouse and install the vector-stores-clickhouse extra.

$ poetry install --extras vector-stores-clickhouse

ClickHouse settings can be configured by setting values to the clickhouse property in the settings.yaml file.

The available configuration options are:

Field | Description
------|------------
host | The server hosting the ClickHouse database. Default is localhost
port | The port on which the ClickHouse database is accessible. Default is 8123
username | The username for database access. Default is default
password | The password for database access. (Optional)
database | The specific database to connect to. Default is __default__
secure | Use https/TLS for secure connection to the server. Default is false
interface | The protocol used for the connection, either 'http' or 'https'. (Optional)
settings | Specific ClickHouse server settings to be used with the session. (Optional)
connect_timeout | Timeout in seconds for establishing a connection. (Optional)
send_receive_timeout | Read timeout in seconds for http connection. (Optional)
verify | Verify the server certificate in secure/https mode. (Optional)
ca_cert | Path to Certificate Authority root certificate (.pem format). (Optional)
client_cert | Path to TLS Client certificate (.pem format). (Optional)
client_cert_key | Path to the private key for the TLS Client certificate. (Optional)
http_proxy | HTTP proxy address. (Optional)
https_proxy | HTTPS proxy address. (Optional)
server_host_name | Server host name to be checked against the TLS certificate. (Optional)

For example:

vectorstore:
  database: clickhouse

clickhouse:
  host: localhost
  port: 8443
  username: admin
  password: <PASSWORD>
  database: embeddings
  secure: false
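When the server is reached over TLS, the secure, verify and ca_cert options from the table might be combined as in the sketch below. The host name and certificate path are hypothetical placeholders, and 8443 is assumed to be the server's HTTPS port.

```yaml
vectorstore:
  database: clickhouse

clickhouse:
  host: clickhouse.example.com   # hypothetical host
  port: 8443                     # assumed HTTPS port of the server
  username: admin
  password: <PASSWORD>
  database: embeddings
  secure: true                   # use https/TLS for the connection
  verify: true                   # verify the server certificate
  ca_cert: /path/to/ca.pem       # hypothetical path to the CA root certificate
```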

The following table will be created in the database:

clickhouse-client
:) \d embeddings.llama_index
Table "llama_index"
№   | name      | type                                                | default_type | default_expression | comment | codec_expression | ttl_expression
----|-----------|-----------------------------------------------------|--------------|--------------------|---------|------------------|---------------
1   | id        | String                                              |              |                    |         |                  |
2   | doc_id    | String                                              |              |                    |         |                  |
3   | text      | String                                              |              |                    |         |                  |
4   | vector    | Array(Float32)                                      |              |                    |         |                  |
5   | node_info | Tuple(start Nullable(UInt64), end Nullable(UInt64)) |              |                    |         |                  |
6   | metadata  | String                                              |              |                    |         |                  |
clickhouse-client

The dimensions of the embeddings columns will be set based on the embedding.embed_dim value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.