Ingestion

Documents are ingested through the /v1/artifacts/ingest endpoint. During ingestion each document is chunked, embedded, and stored in the vector store for retrieval.


Ingest a file

$ curl -X POST http://localhost:8080/v1/artifacts/ingest \
> -F "file=@/path/to/document.pdf"

Target a specific collection:

$ curl -X POST http://localhost:8080/v1/artifacts/ingest \
> -F "file=@/path/to/document.pdf" \
> -F "collection=my-collection"
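
For scripting, the two calls above can be folded into one shell function. This is a minimal sketch and not part of PrivateGPT: `ingest_file`, `PGPT_URL`, and `CURL` are names invented for illustration, and only the endpoint path and the `file` and `collection` form fields come from the commands above. The `--fail`, `--silent`, and `--show-error` options are standard curl flags added here so that an HTTP error response produces a non-zero exit status.

```shell
# Hypothetical helper; PGPT_URL and CURL are variables assumed for this sketch.
PGPT_URL="${PGPT_URL:-http://localhost:8080}"
CURL="${CURL:-curl}"

# Usage: ingest_file FILE [COLLECTION]
# Uploads FILE to the ingest endpoint; adds the collection field only if given.
ingest_file() {
  file="$1"
  collection="${2:-}"
  if [ -n "$collection" ]; then
    "$CURL" --fail --silent --show-error -X POST "$PGPT_URL/v1/artifacts/ingest" \
      -F "file=@$file" -F "collection=$collection"
  else
    "$CURL" --fail --silent --show-error -X POST "$PGPT_URL/v1/artifacts/ingest" \
      -F "file=@$file"
  fi
}

# Example calls (commented out so sourcing this file has no side effects):
# ingest_file /path/to/document.pdf
# ingest_file /path/to/document.pdf my-collection
```

Setting `CURL=echo` before calling the function prints the arguments instead of sending a request, which is a quick way to check what would be sent.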

List ingested documents

$ curl "http://localhost:8080/v1/artifacts/list?collection=my-collection"

Delete a document

$ curl -X POST http://localhost:8080/v1/artifacts/delete \
> -H "Content-Type: application/json" \
> -d '{"collection": "my-collection", "artifact": "<artifact-id>"}'

Wipe all local data

$ make wipe

This deletes everything under PGPT_HOME/local_data/ (default: ~/.local/share/private-gpt/local_data/), including the vector store. The deletion cannot be undone.
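
Because the wipe is irreversible, it can help to check which directory is affected first. The sketch below only resolves and inspects the path described above. It assumes `PGPT_HOME` is read from the environment, and `local_data_dir` is a name made up for this example.

```shell
# Print the directory that the wipe is described as clearing.
# Assumption: PGPT_HOME comes from the environment; the fallback mirrors
# the documented default location.
local_data_dir() {
  printf '%s\n' "${PGPT_HOME:-$HOME/.local/share/private-gpt}/local_data"
}

# Report the size of the directory, or note that it does not exist yet.
dir="$(local_data_dir)"
if [ -d "$dir" ]; then
  du -sh "$dir"
else
  echo "nothing to wipe: $dir does not exist"
fi
```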


Bulk local ingestion

To ingest an entire folder from the command line, enable local ingestion in your settings:

data:
  local_ingestion:
    enabled: true
    allow_ingest_from: ["*"]

Then run:

$ make ingest /path/to/folder

Watch mode (re-ingest on file changes):

$ make ingest /path/to/folder -- --watch

Supported file formats

PrivateGPT handles plain text natively. The following formats are also supported with built-in parsers:

.pdf · .docx · .pptx · .ppt · .pptm · .hwp · .epub · .md · .csv · .json · .ipynb · .mbox · .jpg · .jpeg · .png · .mp3 · .mp4

Any other file type is read as plain text.
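
To see ahead of time which files in a folder would go through a built-in parser and which would be read as plain text, the extension list above can be turned into a small check. This is an illustrative sketch: `has_builtin_parser` is a hypothetical helper, and it matches extensions case-sensitively, which may not match how PrivateGPT itself compares them.

```shell
# Return 0 if the file name ends in one of the extensions listed above.
# Matching is case-sensitive in this sketch (an assumption, not documented behavior).
has_builtin_parser() {
  case "$1" in
    *.pdf|*.docx|*.pptx|*.ppt|*.pptm|*.hwp|*.epub|*.md|*.csv|*.json|*.ipynb|*.mbox|*.jpg|*.jpeg|*.png|*.mp3|*.mp4)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Classify a couple of example file names.
for f in report.pdf notes.txt; do
  if has_builtin_parser "$f"; then
    echo "$f: built-in parser"
  else
    echo "$f: read as plain text"
  fi
done
```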