Document Storage
Hyphen provides built-in document storage so workflows can operate on uploaded files rather than requiring data to be embedded in API payloads. Upload a CSV, PDF, or JSON file once, get back a document_id, and reference it anywhere in your workflow using the doc: prefix.
This separates data ingestion from workflow execution: no base64-encoded payloads, no external URL fetching at runtime, no SSRF risk.
Uploading Documents
Upload via multipart form data:
```bash
curl -X POST https://apisvr.tryhyphen.com/documents \
  -H "X-Org-Id: acme-corp" \
  -F "file=@transactions_q4.csv" \
  -F 'name=Q4 Transactions' \
  -F 'tags=["finance", "quarterly"]' \
  -F 'metadata={"department": "accounting"}' \
  -F 'ttl_days=90'
```
Response:
```json
{
  "document": {
    "id": "doc_a1b2c3d4e5f6",
    "name": "Q4 Transactions",
    "content_type": "text/csv",
    "size_bytes": 245890,
    "checksum_sha256": "e3b0c44298fc...",
    "current_version": 1,
    "status": "ready",
    "tags": ["finance", "quarterly"],
    "created_at": "2026-02-01T00:00:00Z"
  }
}
```
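Scripted uploads mirror the curl example. A minimal Python sketch of building the non-file form fields — note that tags and metadata travel as JSON strings, ttl_days as a plain string. The helper name is illustrative, not part of any official SDK:

```python
import json

def build_upload_fields(name, tags=None, metadata=None, ttl_days=None):
    """Build the non-file multipart fields for POST /documents.

    Mirrors the curl example: tags and metadata are JSON-encoded strings.
    """
    fields = {"name": name}
    if tags is not None:
        fields["tags"] = json.dumps(tags)
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    if ttl_days is not None:
        fields["ttl_days"] = str(ttl_days)
    return fields

# Usage with the requests library (not executed here):
# import requests
# with open("transactions_q4.csv", "rb") as f:
#     resp = requests.post(
#         "https://apisvr.tryhyphen.com/documents",
#         headers={"X-Org-Id": "acme-corp"},
#         files={"file": f},
#         data=build_upload_fields("Q4 Transactions", tags=["finance", "quarterly"]),
#     )
```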
Supported Content Types
| Content Type | Extensions | Max Size |
|---|---|---|
| CSV | .csv | 50 MB |
| JSON | .json | 50 MB |
| Plain text | .txt | 50 MB |
| PDF | .pdf | 50 MB |
| Excel | .xlsx, .xls | 50 MB |
| Images | .png, .jpg, .jpeg | 50 MB |
Deduplication
If you upload a file with the same SHA-256 checksum as an existing document in your org, Hyphen returns the existing document_id instead of creating a duplicate. This is transparent: you get back a 201 with the existing document, and the deduplicated: true flag indicates what happened.
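Because dedup keys on the SHA-256 of the file bytes, you can compute the same checksum locally and compare it to the checksum_sha256 of documents you already hold before uploading. A standard-library sketch (the function name is illustrative):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its hex SHA-256.

    The result matches the checksum_sha256 field in upload responses,
    which is what deduplication compares against.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```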
Upload from URL
For files already hosted elsewhere:
```bash
curl -X POST https://apisvr.tryhyphen.com/documents/from-url \
  -H "X-Org-Id: acme-corp" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://storage.example.com/reports/q4.csv",
    "name": "Q4 Report",
    "tags": ["finance"]
  }'
```
Hyphen fetches the file, validates it, and stores it internally. The external URL is not accessed at workflow execution time.
Referencing Documents in Workflows
Use the doc: prefix anywhere you would normally pass inline data:
```json
{
  "type": "matcher",
  "properties": {
    "left": "doc:doc_a1b2c3d4e5f6",
    "right": "doc:doc_x7y8z9w0v1u2",
    "matchOn": ["invoice_id"],
    "tolerance": 0.02,
    "outputMatched": "matched",
    "outputUnmatchedLeft": "unmatched_invoices"
  }
}
```
At execution time, the engine resolves doc: references by streaming the file from managed storage and parsing it based on content type.
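Code that assembles step properties programmatically may want to know whether a value will be resolved from managed storage or passed through inline. A tiny illustrative check (the doc: prefix is as documented; the helper name is hypothetical):

```python
def is_doc_reference(value):
    """True when a property value is a doc: reference that the engine
    will resolve from managed storage, rather than inline data."""
    return isinstance(value, str) and value.startswith("doc:")
```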
Resolution by Content Type
| Content Type | Resolves To | Example Use |
|---|---|---|
| text/csv | Array&lt;Object&gt;; header row becomes keys | Matcher left/right, Loop items_path |
| application/json | Object or Array as-is | Any property expecting structured data |
| text/plain | String | LLM template input, text analysis |
| application/pdf | Binary handle: { doc_id, content_type, storage_key, size_bytes } | Agent tool input for document processing |
| image/* | Binary handle | Agent tool input for image analysis |
| Excel (.xlsx) | Array&lt;Object&gt;; first sheet, header row becomes keys | Same as CSV |
CSV parsing is strict. UTF-8 only, comma-delimited, first row as header, consistent column counts. Malformed files fail at upload time with diagnostic errors (e.g., "Row 47 has 6 columns, expected 5 based on header row"). Fix the file and re-upload.
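Since malformed files are rejected at upload time anyway, it is cheap to run the same consistency check locally first. A sketch of a pre-upload validator under the documented rules (UTF-8, comma-delimited, first row is the header, consistent column counts); the function name and exact error wording are illustrative, not Hyphen's implementation:

```python
import csv
import io

def validate_csv_bytes(data):
    """Return a list of diagnostics; an empty list means the file should pass."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"Not valid UTF-8: {exc}"]
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["File is empty; a header row is required"]
    expected = len(rows[0])
    errors = []
    # Row numbering counts the header as row 1, matching the example diagnostic.
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            errors.append(
                f"Row {i} has {len(row)} columns, expected {expected} based on header row"
            )
    return errors
```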
```json
{
  "type": "loop",
  "properties": {
    "mode": "foreach",
    "items_path": "doc:doc_customer_list",
    "item_variable_name": "customer",
    "actions_to_execute": [
      { "type": "send_welcome_email", "properties": { "email": "@customer.email" } }
    ]
  }
}
```
The CSV resolves to an array; each row becomes @customer inside the loop.
Versioning
Upload a new version of an existing document without changing the document_id:
```bash
curl -X POST https://apisvr.tryhyphen.com/documents/doc_a1b2c3d4e5f6/versions \
  -H "X-Org-Id: acme-corp" \
  -F "file=@transactions_q4_updated.csv" \
  -F 'change_note=Updated with December corrections'
```
The current_version increments automatically. Workflows referencing doc:doc_a1b2c3d4e5f6 always get the latest version.
Pinning to a Version
For audit reproducibility, pin a specific version:
```json
{
  "left": "doc:doc_a1b2c3d4e5f6@2"
}
```
The @N suffix locks to version N. This is important for regulated workflows where you need to prove exactly which dataset was used in a given run.
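For tooling that audits which references in a workflow are pinned, the doc:&lt;id&gt;[@N] syntax splits cleanly. A hypothetical parser (the syntax is as documented above; the function itself is illustrative):

```python
def parse_doc_reference(ref):
    """Split a doc: reference into (document_id, version).

    version is None when the reference floats to the latest version.
    Raises ValueError for strings that are not doc: references.
    """
    if not ref.startswith("doc:"):
        raise ValueError(f"not a doc: reference: {ref!r}")
    body = ref[len("doc:"):]
    if "@" in body:
        doc_id, _, version = body.rpartition("@")
        return doc_id, int(version)
    return body, None
```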
Reading Version History
Use the document audit endpoint for version history:
```bash
curl https://apisvr.tryhyphen.com/documents/doc_a1b2c3d4e5f6/audit \
  -H "X-Org-Id: acme-corp"
```
The audit stream includes version_created entries with version metadata.
Webhook Triggers
Register webhooks to trigger your own automation service when documents are uploaded:
```json
{
  "event": "document.uploaded",
  "url": "https://automation.example.com/hyphen/hooks/document-uploaded",
  "filter": {
    "tags": ["invoices"],
    "content_type": "text/csv"
  },
  "auto_payload": {
    "workflow_id": "wfl-123e4567-e89b-12d3-a456-426614174000"
  }
}
```
Your automation service can then call POST /workflows/:id/execute with:
```json
{
  "input": {
    "invoices": "doc:<document_id>"
  }
}
```
This enables the upload-to-execution pattern: data providers drop files, workflows run automatically. See Webhooks for full configuration.
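On the automation-service side, the glue reduces to mapping an uploaded document's id onto the execute request body. A sketch (the helper and the input key are assumptions matching the example above; your workflow's input names will differ):

```python
def execute_body_for_upload(document_id, input_key="invoices"):
    """Build the body for POST /workflows/:id/execute from an uploaded
    document id. input_key names the workflow input that expects the
    doc: reference; "invoices" matches the example above."""
    return {"input": {input_key: f"doc:{document_id}"}}
```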
Audit Trail
Every document action is tracked:
| Event | When |
|---|---|
| uploaded | New document created |
| version_created | New version uploaded |
| downloaded | File content retrieved |
| metadata_updated | Metadata changed |
| deleted | Document soft-deleted |
```bash
curl "https://apisvr.tryhyphen.com/documents/doc_a1b2c3d4e5f6/audit?limit=50" \
  -H "X-Org-Id: acme-corp"
```
The document.referenced event links the document to a specific run_id, creating a bidirectional audit trail: from the document, you can see which workflows used it; from the workflow run, you can see which documents it consumed.
Storage Limits
| Limit | Default |
|---|---|
| Storage per organization | 10 GB |
| Documents per organization | 10,000 |
| Maximum file size | 50 MB |
| Maximum versions per document | 100 |
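The per-file limit can be enforced client-side before any bytes leave your machine. A small sketch against the defaults above (the 50 MB constant assumes binary megabytes; adjust if your org's limits differ):

```python
import os

MAX_FILE_BYTES = 50 * 1024 * 1024  # default 50 MB per-file limit

def check_upload_size(path):
    """Return the file size in bytes, or raise ValueError if it exceeds
    the default per-file limit."""
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(
            f"{path} is {size} bytes, over the {MAX_FILE_BYTES}-byte limit"
        )
    return size
```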
Check current usage:
```bash
curl https://apisvr.tryhyphen.com/documents/storage-usage \
  -H "X-Org-Id: acme-corp"
```
Common Patterns
Reconciliation with uploaded datasets. Upload invoices and payments as CSVs → reference both in a matcher step → agent investigates exceptions → results logged to custom table.
Agent document processing. Upload a PDF → agent receives a binary handle → agent uses an LLM action to extract text → agent reasons about the content → routes to appropriate workflow.
Scheduled reconciliation with versioned data. Upload new data monthly → version the same document → scheduled workflow always runs against the latest version. Pin previous versions for historical comparisons.
Webhook-driven ingestion. External system drops a CSV → document.uploaded webhook fires → agent classifies the document → triggers the correct processing workflow. This is the Agent as Trigger pattern with document storage.
Document storage is infrastructure, not a primitive. It enhances existing primitives: matcher can consume doc: references for its datasets, loop can iterate over uploaded CSVs, and agents can process uploaded PDFs. Documents plug into the same @path context resolution system used everywhere else.