Sources
Upload files, add web pages, recrawl, replace, and watch the build pipeline turn them into a searchable knowledge base.
Last updated 2026-05-23
Sources are the raw material your assistant reasons over. Any file or web page you add gets parsed, chunked, and embedded so the AI fallback can retrieve relevant passages in answer to visitor questions.
Upload files
Drop one or more files into the dropzone at the top of the page, or click it to open a file picker.
Accepted file types
- .pdf — text-based PDFs (scanned-image PDFs need OCR first).
- .docx, .doc — Microsoft Word documents.
- .md, .txt — Markdown and plain text.
- .csv, .xlsx — tabular data (each row becomes a chunk).
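To illustrate what "each row becomes a chunk" can mean for tabular data, here is a minimal sketch. The actual chunk format the pipeline produces is not documented on this page; the function name `csv_rows_to_chunks` and the "header: value" layout are assumptions made for illustration.

```python
import csv
import io


def csv_rows_to_chunks(csv_text: str) -> list[str]:
    """Turn each data row into one text chunk (hypothetical format)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row in reader:
        # Pair every value with its column header so the chunk
        # still makes sense when retrieved on its own.
        parts = [f"{key}: {value}" for key, value in row.items() if value]
        chunks.append("; ".join(parts))
    return chunks


sample = "plan,price\nStarter,9\nPro,29\n"
chunks = csv_rows_to_chunks(sample)
for chunk in chunks:
    print(chunk)
```

A practical consequence of row-level chunking: a spreadsheet with clear column headers and one fact per row is likely to retrieve better than one that relies on merged cells or free-form notes.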
Limits
- Max file size: 5 MB per file. Larger files are rejected with an inline error.
- Max files per assistant: 30 (warn at 20).
- Max total source size: 100 MB across all sources (warn at 60 MB).
- Max chunks per assistant: 2 000 (warn at 1 000).
The embed downloads the resulting vectors.json on a visitor's first visit, so staying well under these limits also keeps the first load fast.
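If you prepare uploads with a script, a pre-flight check along these lines can catch problems before the dropzone does. This is a sketch under stated assumptions, not the product's own validation: `check_upload` is a hypothetical helper, 1 MB is taken to be 1024 * 1024 bytes, and warnings are assumed to start when a threshold is reached rather than exceeded. The chunk limit is left out because chunk counts are only known after parsing.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: the page does not say whether MB is binary or decimal

ACCEPTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".md", ".txt", ".csv", ".xlsx"}
MAX_FILE_BYTES = 5 * MB
MAX_FILES, WARN_FILES = 30, 20
MAX_TOTAL_BYTES, WARN_TOTAL_BYTES = 100 * MB, 60 * MB


def check_upload(filename, size_bytes, existing_files=0, existing_bytes=0):
    """Return messages for one candidate file; 'error' entries mean it would not fit."""
    messages = []

    ext = PurePath(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        messages.append(f"error: unsupported file type {ext or '(none)'}")

    if size_bytes > MAX_FILE_BYTES:
        messages.append("error: file is larger than 5 MB")

    file_count = existing_files + 1
    if file_count > MAX_FILES:
        messages.append("error: more than 30 files")
    elif file_count >= WARN_FILES:
        messages.append("warning: 20 or more files")

    total_bytes = existing_bytes + size_bytes
    if total_bytes > MAX_TOTAL_BYTES:
        messages.append("error: total size is over 100 MB")
    elif total_bytes >= WARN_TOTAL_BYTES:
        messages.append("warning: total size is 60 MB or more")

    return messages


print(check_upload("handbook.pdf", 6 * MB, existing_files=3))
```

An empty list means the file fits within every limit checked here.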
Add a web page
Paste a URL into the URL field and click Add. The crawler fetches the page, strips navigation and chrome, and extracts the main content. From then on, crawled pages behave just like uploaded files: they have chunks and a status, and they can be viewed, recrawled, replaced, or deleted.
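The idea of stripping navigation and chrome can be sketched with Python's standard `html.parser` module. The real crawler's rules are not described here, so the set of skipped tags and the names `MainTextExtractor` and `extract_main_text` are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumption: elements that usually hold page chrome rather than content.
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<main><h1>Pricing</h1><p>Plans start at $9.</p></main>"
    "<footer>Contact us</footer></body></html>"
)
text = extract_main_text(page)
print(text)
```

Whatever the crawler keeps is what the assistant can answer from, so a page whose key content lives in a header, sidebar, or script-rendered widget may yield less text than you expect.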
The sources list
Each row shows:
- Name — filename or URL.
- Type — file extension or web page.
- Status — see below.
- Chunks — passages extracted from this source.
- Size — original byte count.
- Created — relative timestamp.
- Actions — view, recrawl, replace, delete.
Per-source statuses
| Status | Meaning |
|---|---|
| queued | Waiting in the build queue. |
| processing | Currently being parsed or embedded. |
| ready | Indexed and available to answer queries. |
| error | Failed to parse — hover the row for details. |
Status updates live; you do not need to refresh the page.
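If you track sources in your own tooling, the four statuses above map naturally onto an enumeration. The sketch below is illustrative only: `SourceStatus`, `is_settled`, and `summarize` are hypothetical names, and treating ready and error as the two "settled" states is an assumption based on the table, not a documented rule.

```python
from collections import Counter
from enum import Enum


class SourceStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    READY = "ready"
    ERROR = "error"


def is_settled(status: SourceStatus) -> bool:
    """True once a source is no longer waiting or being processed (assumed)."""
    return status in {SourceStatus.READY, SourceStatus.ERROR}


def summarize(statuses: list[str]) -> dict[str, int]:
    """Count sources per status, listing every status even when its count is zero."""
    counts = Counter(SourceStatus(s) for s in statuses)
    return {status.value: counts[status] for status in SourceStatus}


summary = summarize(["ready", "ready", "queued", "error"])
print(summary)
```

Only sources in the ready state contribute passages to answers, so a summary like this is a quick way to see how much of the knowledge base is actually searchable.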
Per-source actions
- View — opens a modal with the extracted text. Useful when results aren't what you expected — you can see exactly what the model has.
- Recrawl — re-downloads the URL or re-parses the file with the latest extractor. Use after fixing a malformed PDF or changing a web page.
- Replace — keeps the same row but swaps the underlying file. Useful when you publish a new version of a doc and want to keep its history.
- Delete — removes the source and triggers a rebuild.
The build pipeline
Adding, replacing, or deleting a source enqueues a build. See Assistant overview for the full stage list. The progress bar at the top of the sources page (and on the overview tab) shows current stage and percent.