Sources

Upload files, add web pages, recrawl, replace, and watch the build pipeline turn them into a searchable knowledge base.

Last updated 2026-05-23

Sources are the raw material your assistant reasons over. Any file or web page you add is parsed, chunked, and embedded so the AI fallback can retrieve relevant passages when answering visitor questions.

Upload files

Drop one or more files into the dropzone at the top of the page, or click it to open a file picker.

Accepted file types

.pdf — text-based PDFs (scanned-image PDFs need OCR first).
.docx, .doc — Microsoft Word documents.
.md, .txt — Markdown and plain text.
.csv, .xlsx — tabular data (each row becomes a chunk).

Limits

Max file size: 5 MB per file. Larger files are rejected with an inline error.
Max files per assistant: 30 (warn at 20).
Max total source size: 100 MB across all sources (warn at 60 MB).
Max chunks per assistant: 2 000 (warn at 1 000).

The embed downloads the resulting vectors.json on first visit, so keeping these totals modest also keeps the first load fast.
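A pre-upload check can catch oversized or unsupported files before they reach the dropzone. The sketch below is a hypothetical local helper, not part of the product: the name check_sources is invented for illustration, 1 MB is assumed to mean 1024 × 1024 bytes, and the warning thresholds are assumed to trigger at the stated value. It only sees the files you pass in, so it cannot account for sources already attached to the assistant or for the chunk limit.

```python
from pathlib import Path

# Values copied from the limits listed above.
ACCEPTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".md", ".txt", ".csv", ".xlsx"}
MAX_FILE_BYTES = 5 * 1024 * 1024      # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILES = 30
WARN_FILES = 20                       # assumption: warning starts at exactly 20
MAX_TOTAL_BYTES = 100 * 1024 * 1024
WARN_TOTAL_BYTES = 60 * 1024 * 1024   # assumption: warning starts at exactly 60 MB


def check_sources(files):
    """Check (name, size_in_bytes) pairs against the documented limits.

    Returns (errors, warnings) as two lists of human-readable strings.
    """
    errors, warnings = [], []
    total = 0
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in ACCEPTED_EXTENSIONS:
            errors.append(f"{name}: unsupported file type {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            errors.append(f"{name}: larger than 5 MB")
        total += size

    count = len(files)
    if count > MAX_FILES:
        errors.append(f"{count} files is over the limit of {MAX_FILES}")
    elif count >= WARN_FILES:
        warnings.append(f"{count} files is at or past the warning level of {WARN_FILES}")

    if total > MAX_TOTAL_BYTES:
        errors.append("total size is over 100 MB")
    elif total >= WARN_TOTAL_BYTES:
        warnings.append("total size is at or past the 60 MB warning level")

    return errors, warnings


if __name__ == "__main__":
    # Example: check every file in a local folder named "docs" (hypothetical path).
    folder = Path("docs")
    if folder.is_dir():
        pairs = [(p.name, p.stat().st_size) for p in folder.iterdir() if p.is_file()]
        problems, notes = check_sources(pairs)
        for message in problems + notes:
            print(message)
```

Running a check like this on a folder before uploading gives one list of blocking problems and one list of early warnings, which roughly mirrors the reject and warn levels described above.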

Add a web page

Paste a URL into the URL field and click Add. The crawler fetches the page, strips navigation and chrome, and extracts the main content. From then on, crawled pages behave just like uploaded files: they have chunks and a status, and they can be recrawled.

One URL at a time

The crawler doesn't follow links. To index multiple pages, add each URL separately. Sitemap crawling is on the roadmap.
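Until sitemap crawling is available, one way to build the list of URLs to add is to read them out of a site's sitemap.xml yourself. The sketch below is an illustration only: urls_from_sitemap is a hypothetical helper, the example.com URLs are placeholders, and it assumes a standard sitemaps.org file. It produces a list to paste into the URL field one entry at a time; it does not talk to the product in any way.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return the <loc> URLs from a sitemap document, in order, without duplicates."""
    root = ET.fromstring(xml_text)
    seen, urls = set(), []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder sitemap content for demonstration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/getting-started</loc></url>
  <url><loc>https://example.com/docs/pricing</loc></url>
  <url><loc>https://example.com/docs/getting-started</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE):
        print(url)
```

If the file is a sitemap index, the entries it returns point at further sitemap files rather than pages, so each of those would need the same treatment. Keep the per-assistant limits in mind when deciding how many of the resulting pages to add.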

The sources list

Each row shows:

Name — filename or URL.
Type — file extension or web page.
Status — see below.
Chunks — passages extracted from this source.
Size — original byte count.
Created — relative timestamp.
Actions — view, recrawl, replace, delete.

Per-source statuses

Status	Meaning
`queued`	Waiting in the build queue.
`processing`	Currently being parsed or embedded.
`ready`	Indexed and available to answer queries.
`error`	Failed to parse. Hover the row for details.

Status updates live; you do not need to refresh the page.

Per-source actions

View — opens a modal with the extracted text. Useful when answers aren't what you expected, because it shows exactly the text the assistant has to work with.
Recrawl — re-downloads the URL or re-parses the file with the latest extractor. Use after fixing a malformed PDF or changing a web page.
Replace — keeps the same row but swaps the underlying file. Useful when you publish a new version of a doc and want to keep its history.
Delete — removes the source and triggers a rebuild.

The build pipeline

Adding, replacing, or deleting a source enqueues a build. See Assistant overview for the full stage list. The progress bar at the top of the sources page (and on the overview tab) shows current stage and percent.

Small, focused sources work best

The AI fallback retrieves a handful of chunks per question. Many short, well-titled documents beat one giant PDF, because the retriever can match the right passage more precisely.
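If your content currently lives in one long Markdown file, splitting it at its headings is a simple way to get short, well-titled documents. The following is a minimal sketch under stated assumptions: split_markdown, slugify, and write_sections are hypothetical helper names, the split happens only at level-1 and level-2 headings written with # markers, and sections with no body text are skipped. It runs locally on your own files and is not how the build pipeline chunks content.

```python
import re
from pathlib import Path

# Matches "# Title" and "## Title" but not "### Title" or deeper.
HEADING = re.compile(r"^(#{1,2})\s+(.*\S)\s*$")


def split_markdown(text):
    """Split Markdown into (title, body) sections at level-1 and level-2 headings.

    Text before the first heading is filed under "Introduction".
    Heading-like lines inside fenced code blocks are left alone.
    """
    sections = []
    title, lines = "Introduction", []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if any(ln.strip() for ln in lines):
                sections.append((title, "\n".join(lines).strip()))
            title, lines = match.group(2), []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        sections.append((title, "\n".join(lines).strip()))
    return sections


def slugify(title):
    """Turn a heading into a lowercase, hyphen-separated file name stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def write_sections(text, out_dir):
    """Write each section to its own .md file and return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (title, body) in enumerate(split_markdown(text), start=1):
        path = out / f"{index:02d}-{slugify(title)}.md"
        path.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Each output file keeps its heading as the first line, so the title travels with the passage. Splitting increases the file count, so check the result against the per-assistant file limit before uploading.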