Matcher
The matcher takes two datasets and finds records that correspond to each other. It outputs three arrays: matched pairs, unmatched records from the left side, and unmatched records from the right side.
Basic Usage
```json
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "outputMatched": "matched",
    "outputUnmatchedLeft": "unmatched_invoices",
    "outputUnmatchedRight": "unmatched_payments"
  }
}
```
This matches invoices to payments by exact invoice_id. Records with matching IDs land in @matched. Invoices with no payment go to @unmatched_invoices. Payments with no invoice go to @unmatched_payments.
Properties Reference
| Property | Type | Required | Description |
|---|---|---|---|
| left | array / @path / doc: | Yes | First dataset: inline array, @path reference, or doc: uploaded document |
| right | array / @path / doc: | Yes | Second dataset: inline array, @path reference, or doc: uploaded document |
| matchOn | string[] | Yes | Fields that must match exactly |
| tolerance | number | No | Absolute numeric tolerance applied to the amount field. A tolerance of 0.02 means amounts differing by more than $0.02 won't match |
| dateWindowDays | number | No | Date tolerance in days (±N). Applied to the date field |
| fuzzyThreshold | number | No | Text similarity threshold 0–100. Applied to the field specified by descriptionKey |
| descriptionKey | string | No | Field name for fuzzy text matching |
| rules | array | No | Custom matching rules |
| outputMatched | string | No | Context key for matched pairs (default: "matched") |
| outputUnmatchedLeft | string | No | Context key for unmatched left records (default: "unmatchedLeft") |
| outputUnmatchedRight | string | No | Context key for unmatched right records (default: "unmatchedRight") |
Matching Criteria
Exact Key Matching (matchOn)
Fields listed in matchOn must match exactly. This is the primary matching criterion: two records are compared only if their matchOn fields align.
```json
{
  "matchOn": ["invoice_id"]
}
```
Multiple keys form a composite key; every listed field must match:
```json
{
  "matchOn": ["vendor_id", "invoice_number"]
}
```
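The composite-key behavior can be illustrated with a minimal Python sketch. This is not the matcher's actual implementation: the function name `match_exact` is hypothetical, and the one-to-one, first-come pairing is an assumption, since the text above does not say how duplicate keys are handled.

```python
from typing import Any

Record = dict[str, Any]


def match_exact(left: list[Record], right: list[Record], match_on: list[str]):
    """Pair records whose match_on fields are all equal.

    Assumption: each right record is used at most once (one-to-one pairing).
    """
    def key(rec: Record) -> tuple:
        return tuple(rec.get(field) for field in match_on)

    used_right: set[int] = set()
    matched, unmatched_left = [], []
    for l in left:
        for j, r in enumerate(right):
            if j not in used_right and key(l) == key(r):
                used_right.add(j)
                matched.append({"a": l, "b": r})
                break
        else:
            # No right record shares the full composite key.
            unmatched_left.append(l)
    unmatched_right = [r for j, r in enumerate(right) if j not in used_right]
    return matched, unmatched_left, unmatched_right


invoices = [
    {"vendor_id": "V1", "invoice_number": "100"},
    {"vendor_id": "V1", "invoice_number": "101"},
]
payments = [
    {"vendor_id": "V1", "invoice_number": "100"},
    {"vendor_id": "V2", "invoice_number": "101"},  # same number, different vendor
]
matched, left_only, right_only = match_exact(
    invoices, payments, ["vendor_id", "invoice_number"]
)
```

Here only the first invoice pairs up: the second shares an invoice_number with a payment but not the vendor_id, so both records end up in the unmatched outputs.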
Numeric Tolerance (tolerance)
Allow the amount field to differ by an absolute value. A tolerance of 0.02 means amounts within $0.02 of each other are still considered a match.
```json
{
  "matchOn": ["invoice_id"],
  "tolerance": 0.02
}
```
With this configuration, an invoice for $1,000.00 would match a payment of $999.98–$1,000.02. For larger absolute tolerances, use values like 50 to allow a $50 difference.
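The tolerance check amounts to comparing an absolute difference against a bound. The sketch below assumes the bound is inclusive, which is what the $999.98–$1,000.02 range above implies; `within_tolerance` is a hypothetical helper, and `Decimal` is used only to keep the illustration free of floating-point rounding.

```python
from decimal import Decimal


def within_tolerance(left_amount: str, right_amount: str, tolerance: str) -> bool:
    """True when the absolute difference is at most the tolerance (inclusive)."""
    diff = abs(Decimal(left_amount) - Decimal(right_amount))
    return diff <= Decimal(tolerance)


print(within_tolerance("1000.00", "999.98", "0.02"))   # True
print(within_tolerance("1000.00", "1000.02", "0.02"))  # True
print(within_tolerance("1000.00", "999.97", "0.02"))   # False
```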
Date Window (dateWindowDays)
Allow date fields to differ by up to N days:
```json
{
  "matchOn": ["invoice_id"],
  "dateWindowDays": 3
}
```
An invoice dated January 10 would match a payment dated January 7–13.
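As a rough illustration of the ±N-day window, the following sketch assumes ISO 8601 date strings (the format used in the worked example on this page) and an inclusive bound, consistent with the January 7–13 range above. The helper name `within_date_window` is hypothetical.

```python
from datetime import date


def within_date_window(left_date: str, right_date: str, window_days: int) -> bool:
    """True when the two ISO dates are at most window_days apart, in either direction."""
    delta = date.fromisoformat(left_date) - date.fromisoformat(right_date)
    return abs(delta.days) <= window_days


print(within_date_window("2025-01-10", "2025-01-07", 3))  # True
print(within_date_window("2025-01-10", "2025-01-13", 3))  # True
print(within_date_window("2025-01-10", "2025-01-14", 3))  # False
```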
Fuzzy Text Matching (fuzzyThreshold + descriptionKey)
Compare text fields using fuzzy string similarity. The threshold is 0–100 where 100 is an exact match:
```json
{
  "matchOn": ["vendor_id"],
  "fuzzyThreshold": 85,
  "descriptionKey": "description"
}
```
This matches records where vendor_id is identical and the description fields are at least 85% similar. Useful for matching line-item descriptions that may be worded differently across systems.
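To get a feel for a 0–100 similarity threshold, the sketch below scores two strings with Python's standard `difflib.SequenceMatcher`. This page does not say which similarity algorithm the matcher uses, so the scores here are an assumption and may differ from the matcher's; the helpers `similarity` and `fuzzy_match` are hypothetical, as is the case and whitespace normalization.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale; case and surrounding whitespace are ignored."""
    return 100 * SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_match(left: dict, right: dict, description_key: str, threshold: float) -> bool:
    """True when the two records' description fields meet the threshold."""
    return similarity(left[description_key], right[description_key]) >= threshold


left = {"vendor_id": "V1", "description": "Consulting hours"}
right = {"vendor_id": "V1", "description": "consulting hours "}
print(fuzzy_match(left, right, "description", 85))  # True: identical after normalization
```

Scoring a sample of real descriptions this way before choosing a threshold can show how strict a value such as 85 is for a given dataset.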
Custom Rules (rules)
Define additional matching rules evaluated by the condition engine:
```json
{
  "matchOn": ["invoice_id"],
  "rules": [
    {
      "condition": {
        "lessOrEqual": [
          { "abs": { "subtract": ["@left.amount", "@right.amount"] } },
          50
        ]
      }
    }
  ]
}
```
Custom rules use the same condition operators as workflow conditions, with @left and @right referencing the current pair being compared.
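To make the evaluation order of the rule above concrete, here is a toy evaluator for just the three operators it uses (`lessOrEqual`, `abs`, `subtract`). It is a sketch under assumptions, not the condition engine itself: the functions `resolve` and `evaluate` are hypothetical, only flat `@left.field` / `@right.field` paths are handled, and any other operator raises an error.

```python
from typing import Any


def resolve(path: str, left: dict, right: dict) -> Any:
    """Resolve '@left.field' or '@right.field' against the pair being compared."""
    side, _, field = path[1:].partition(".")
    return {"left": left, "right": right}[side][field]


def evaluate(node: Any, left: dict, right: dict) -> Any:
    """Recursively evaluate a condition tree for one left/right pair."""
    if isinstance(node, str) and node.startswith("@"):
        return resolve(node, left, right)
    if isinstance(node, dict):
        (op, arg), = node.items()  # each node holds exactly one operator
        if op == "subtract":
            x, y = (evaluate(n, left, right) for n in arg)
            return x - y
        if op == "abs":
            return abs(evaluate(arg, left, right))
        if op == "lessOrEqual":
            x, y = (evaluate(n, left, right) for n in arg)
            return x <= y
        raise ValueError(f"unsupported operator: {op}")
    return node  # literal value such as 50


condition = {
    "lessOrEqual": [
        {"abs": {"subtract": ["@left.amount", "@right.amount"]}},
        50,
    ]
}
print(evaluate(condition, {"amount": 2500.0}, {"amount": 2475.0}))  # True
print(evaluate(condition, {"amount": 2500.0}, {"amount": 2400.0}))  # False
```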
Output Format
Matched Pairs
Each matched record contains both the left and right record:
```json
[
  {
    "a": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme" },
    "b": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp" },
    "match_score": 0.95,
    "amount_difference": 0
  }
]
```
The a field is the left record, b is the right record. match_score reflects overall match quality. amount_difference shows numeric deviation when tolerance matching is used.
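A downstream step might use these fields to separate clean matches from ones worth a second look. The sketch below is only an illustration: `needs_review` and its 0.9 cutoff are hypothetical, and it assumes match_score is on a 0–1 scale, as the 0.95 in the example suggests.

```python
matched = [
    {
        "a": {"invoice_id": "INV-001", "amount": 1000, "vendor": "Acme"},
        "b": {"invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp"},
        "match_score": 0.95,
        "amount_difference": 0,
    }
]


def needs_review(pair: dict, min_score: float = 0.9) -> bool:
    """Flag pairs with a low score or any amount deviation (hypothetical policy)."""
    return pair["match_score"] < min_score or pair["amount_difference"] != 0


clean = [p for p in matched if not needs_review(p)]
flagged = [p for p in matched if needs_review(p)]

for pair in clean:
    # "a" is the left record, "b" is the right record.
    print(pair["a"]["invoice_id"], pair["a"]["vendor"], "<->", pair["b"]["vendor"])
```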
Unmatched Records
Unmatched arrays contain the original records with no modifications:
```json
[
  { "invoice_id": "INV-099", "amount": 5000, "vendor": "NewVendor" }
]
```
Worked Example
Input:
```json
{
  "invoices": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-10", "description": "Monthly service fee" },
    { "invoice_id": "INV-002", "amount": 2500.00, "date": "2025-01-15", "description": "Equipment rental" },
    { "invoice_id": "INV-003", "amount": 750.00, "date": "2025-01-20", "description": "Consulting hours" }
  ],
  "payments": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-12", "description": "Monthly service" },
    { "invoice_id": "INV-002", "amount": 2475.00, "date": "2025-01-15", "description": "Equip rental Jan" }
  ]
}
```
Matcher configuration:
```json
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "tolerance": 50,
    "dateWindowDays": 3,
    "fuzzyThreshold": 80,
    "descriptionKey": "description",
    "outputMatched": "reconciled",
    "outputUnmatchedLeft": "exceptions"
  }
}
```
Results:
@reconciled: INV-001 (identical invoice_id and amount; dates 2 days apart, within the 3-day window) and INV-002 (amount difference of $25 within the tolerance of 50; descriptions 80%+ similar)

@exceptions: INV-003 (no matching payment found)
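The key, amount, and date parts of this result can be checked by hand with a short Python sketch. It is an approximation under stated assumptions: `is_match` is a hypothetical helper, the bounds are treated as inclusive, and the fuzzy description check is left out because this page does not specify the similarity algorithm.

```python
from datetime import date

invoices = [
    {"invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-10"},
    {"invoice_id": "INV-002", "amount": 2500.00, "date": "2025-01-15"},
    {"invoice_id": "INV-003", "amount": 750.00, "date": "2025-01-20"},
]
payments = [
    {"invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-12"},
    {"invoice_id": "INV-002", "amount": 2475.00, "date": "2025-01-15"},
]


def is_match(inv: dict, pay: dict, tolerance: float = 50, window_days: int = 3) -> bool:
    """Apply matchOn, tolerance, and dateWindowDays; the fuzzy check is omitted."""
    same_id = inv["invoice_id"] == pay["invoice_id"]
    amount_ok = abs(inv["amount"] - pay["amount"]) <= tolerance
    gap = date.fromisoformat(inv["date"]) - date.fromisoformat(pay["date"])
    return same_id and amount_ok and abs(gap.days) <= window_days


reconciled, exceptions = [], []
for inv in invoices:
    pay = next((p for p in payments if is_match(inv, p)), None)
    (exceptions if pay is None else reconciled).append(inv["invoice_id"])

print(reconciled)  # ['INV-001', 'INV-002']
print(exceptions)  # ['INV-003']
```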
Using Uploaded Documents
Instead of embedding datasets in the execution payload, upload files to Document Storage and reference them with the doc: prefix:
```json
{
  "type": "matcher",
  "properties": {
    "left": "doc:doc_a1b2c3d4e5f6",
    "right": "doc:doc_x7y8z9w0v1u2",
    "matchOn": ["invoice_id"],
    "tolerance": 50,
    "outputMatched": "matched",
    "outputUnmatchedLeft": "exceptions"
  }
}
```
CSV files resolve to Array<Object>, with the header row supplying the keys. JSON files resolve as-is. To pin a specific version for audit reproducibility, append the version number: doc:doc_xxx@2.
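The CSV-to-records shape described above resembles what Python's standard `csv.DictReader` produces, shown here purely as an analogy. The sample CSV content is made up for illustration, and whether the platform converts numeric or date columns is not stated on this page; `DictReader` leaves every value as a string.

```python
import csv
import io

# Hypothetical file content; the first line is the header row.
csv_text = (
    "invoice_id,amount,date\n"
    "INV-001,1000.00,2025-01-10\n"
    "INV-002,2500.00,2025-01-15\n"
)

records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[0])
# {'invoice_id': 'INV-001', 'amount': '1000.00', 'date': '2025-01-10'}
```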
Large Dataset Optimization
For large datasets (10,000+ records per side), the matcher automatically switches to an indexed matching strategy when the infrastructure supports it. Records are pre-indexed by their matchOn keys, so the matcher avoids comparing every left record against every right record, which significantly reduces matching time.
No configuration change is needed; the matcher selects the strategy automatically based on dataset size.
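The difference between pairwise comparison and key-based indexing can be sketched in a few lines of Python. This is a generic illustration of the idea, not the matcher's internals: `build_index` and `candidate_pairs` are hypothetical names, and the dataset is synthetic.

```python
from collections import defaultdict


def build_index(records: list[dict], match_on: list[str]) -> dict:
    """Group records by their matchOn key tuple."""
    index = defaultdict(list)
    for rec in records:
        index[tuple(rec[f] for f in match_on)].append(rec)
    return index


def candidate_pairs(left: list[dict], right: list[dict], match_on: list[str]):
    """Yield only pairs that share a key, via one dictionary lookup per left record."""
    index = build_index(right, match_on)
    for l in left:
        for r in index.get(tuple(l[f] for f in match_on), []):
            yield l, r


left = [{"invoice_id": f"INV-{i}"} for i in range(1000)]
right = [{"invoice_id": f"INV-{i}"} for i in reversed(range(1000))]

pairs = list(candidate_pairs(left, right, ["invoice_id"]))
print(len(pairs))              # 1000 candidate pairs examined
print(len(left) * len(right))  # 1000000 comparisons for a naive pairwise scan
```

Any additional criteria, such as tolerance or date window, would then only need to be evaluated on the candidate pairs rather than on every combination.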
Matcher as the foundation. Most Hyphen workflows start with a matcher step. The matched records flow into deterministic processing, while exceptions route to AI agents or human review. This is the graduated exception handling pattern: deterministic rules for clear cases, AI for ambiguous cases, humans for edge cases.