Matcher
The matcher takes two datasets and finds records that correspond to each other. It outputs three arrays: matched pairs, unmatched records from the left side, and unmatched records from the right side.
Basic Usage
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "outputMatched": "matched",
    "outputUnmatchedLeft": "unmatched_invoices",
    "outputUnmatchedRight": "unmatched_payments"
  }
}
This matches invoices to payments by exact invoice_id. Records with matching IDs land in @matched. Invoices with no payment go to @unmatched_invoices. Payments with no invoice go to @unmatched_payments.
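The three-way split described above can be pictured with a minimal Python sketch. It is an illustration only, not the matcher's implementation: the function name `partition_by_key` is hypothetical, and the sketch assumes each right-side record is paired at most once.

```python
from collections import defaultdict


def partition_by_key(left, right, key):
    """Split two record lists into matched pairs and two unmatched lists."""
    # Group right-side records by key value so each lookup is a dict access.
    by_key = defaultdict(list)
    for record in right:
        by_key[record[key]].append(record)

    matched, unmatched_left = [], []
    for record in left:
        candidates = by_key.get(record[key])
        if candidates:
            # Assumption: pair with the first right-side record not yet used.
            matched.append({"a": record, "b": candidates.pop(0)})
        else:
            unmatched_left.append(record)

    # Records that were never popped have no partner on the left.
    unmatched_right = [r for group in by_key.values() for r in group]
    return matched, unmatched_left, unmatched_right


invoices = [{"invoice_id": "INV-001"}, {"invoice_id": "INV-003"}]
payments = [{"invoice_id": "INV-001"}, {"invoice_id": "INV-777"}]
matched, unmatched_invoices, unmatched_payments = partition_by_key(
    invoices, payments, "invoice_id"
)
```

Here `INV-001` ends up in `matched`, `INV-003` in `unmatched_invoices`, and `INV-777` in `unmatched_payments`.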
Properties Reference
| Property | Type | Required | Description |
|---|---|---|---|
| left | array / @path | Yes | First dataset |
| right | array / @path | Yes | Second dataset |
| matchOn | string[] | Yes | Fields that must match exactly |
| tolerance | number | No | Numeric tolerance as a decimal (0.02 = 2%). Applied to numeric fields not in matchOn |
| dateWindowDays | number | No | Date tolerance in days (±N). Applied to date fields |
| fuzzyThreshold | number | No | Text similarity threshold from 0 to 100. Applied to the field specified by descriptionKey |
| descriptionKey | string | No | Field name for fuzzy text matching |
| rules | array | No | Custom matching rules evaluated via conditionEvaluator |
| outputMatched | string | No | Context key for matched pairs (default: "matched") |
| outputUnmatchedLeft | string | No | Context key for unmatched left records (default: "unmatchedLeft") |
| outputUnmatchedRight | string | No | Context key for unmatched right records (default: "unmatchedRight") |
Matching Criteria
Exact Key Matching (matchOn)
Fields listed in matchOn must match exactly. This is the primary matching criterion: records are only compared if their matchOn fields align.
{
  "matchOn": ["invoice_id"]
}
Multiple keys create a composite match; all listed fields must match:
{
  "matchOn": ["vendor_id", "invoice_number"]
}
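One way to picture a composite match is as a tuple built from every field in matchOn, compared as a whole. The Python sketch below is illustrative only; `composite_key` is a hypothetical helper and the sample records are made up.

```python
def composite_key(record, match_on):
    """Build a hashable key from every field listed in match_on."""
    return tuple(record[field] for field in match_on)


match_on = ["vendor_id", "invoice_number"]
invoice = {"vendor_id": "V-10", "invoice_number": "1001", "amount": 500}
payment = {"vendor_id": "V-10", "invoice_number": "1001", "amount": 495}
other = {"vendor_id": "V-11", "invoice_number": "1001", "amount": 500}

# Both fields agree, so the keys are equal.
keys_match_payment = composite_key(invoice, match_on) == composite_key(payment, match_on)
# invoice_number agrees but vendor_id does not, so the keys differ.
keys_match_other = composite_key(invoice, match_on) == composite_key(other, match_on)
```

Fields outside matchOn, such as `amount` here, play no part in the key.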
Numeric Tolerance (tolerance)
For numeric fields (amounts, quantities), allow a percentage deviation. A tolerance of 0.02 means a 2% difference is still considered a match.
{
  "matchOn": ["invoice_id"],
  "tolerance": 0.02
}
With this configuration, an invoice for $1,000 would match a payment between $980 and $1,020.
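The arithmetic behind that range can be sketched in a few lines of Python. This is a hedged illustration: `within_tolerance` is a hypothetical name, and measuring the deviation relative to the left-side value is an assumption chosen to agree with the $1,000 example.

```python
def within_tolerance(left_value, right_value, tolerance):
    """Return True when right_value deviates from left_value by at most `tolerance`."""
    # Assumption: the allowed deviation is a fraction of the left-side value.
    return abs(left_value - right_value) <= tolerance * abs(left_value)


lower_edge = within_tolerance(1000, 980, 0.02)   # 20 off, limit is 20
upper_edge = within_tolerance(1000, 1020, 0.02)  # 20 off, limit is 20
too_low = within_tolerance(1000, 979, 0.02)      # 21 off, limit is 20
```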
Date Window (dateWindowDays)
Allow date fields to differ by up to N days:
{
  "matchOn": ["invoice_id"],
  "dateWindowDays": 3
}
An invoice dated January 10 would match a payment dated between January 7 and January 13.
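The ±N day window amounts to a bound on the absolute day difference. A minimal Python sketch follows, assuming both values are already parsed into dates; `within_date_window` is a hypothetical name and the year 2025 is arbitrary.

```python
from datetime import date


def within_date_window(left_date, right_date, window_days):
    """Return True when the two dates are at most window_days apart."""
    return abs((left_date - right_date).days) <= window_days


invoice_date = date(2025, 1, 10)
earliest_ok = within_date_window(invoice_date, date(2025, 1, 7), 3)   # 3 days before
latest_ok = within_date_window(invoice_date, date(2025, 1, 13), 3)    # 3 days after
too_late = within_date_window(invoice_date, date(2025, 1, 14), 3)     # 4 days after
```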
Fuzzy Text Matching (fuzzyThreshold + descriptionKey)
Compare text fields using the fuzzball similarity algorithm. The threshold ranges from 0 to 100, where 100 is an exact match:
{
  "matchOn": ["vendor_id"],
  "fuzzyThreshold": 85,
  "descriptionKey": "description"
}
This matches records where vendor_id is identical and the description fields are at least 85% similar. Useful for matching line-item descriptions that may be worded differently across systems.
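To give a feel for a 0 to 100 similarity threshold, the sketch below uses Python's standard-library `difflib.SequenceMatcher` as a stand-in. It is not fuzzball, so the scores it produces will differ from the matcher's; the function names and the case-insensitive comparison are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(left_text, right_text):
    """Return a 0-100 similarity score using difflib (a stand-in for fuzzball)."""
    # Assumption: comparison ignores case and surrounding whitespace.
    a = left_text.strip().lower()
    b = right_text.strip().lower()
    return 100 * SequenceMatcher(None, a, b).ratio()


def passes_threshold(left_text, right_text, threshold):
    """Return True when the similarity score meets the threshold."""
    return similarity(left_text, right_text) >= threshold


close_wording = passes_threshold("Monthly service fee", "Monthly service", 85)
unrelated = passes_threshold("Monthly service fee", "Consulting hours", 85)
```

Different scorers weigh word order and partial overlap differently, so a threshold tuned against one scorer should not be assumed to transfer to another.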
Custom Rules (rules)
Define additional matching rules evaluated by the condition engine:
{
  "matchOn": ["invoice_id"],
  "rules": [
    {
      "condition": {
        "lessOrEqual": [
          { "abs": { "subtract": ["@left.amount", "@right.amount"] } },
          50
        ]
      }
    }
  ]
}
Custom rules use the same condition operators as workflow conditions, with @left and @right referencing the current pair being compared.
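The rule above reads as "the absolute amount difference is at most 50". The toy Python evaluator below shows how such a nested condition could be resolved for one pair. It is a hypothetical sketch that handles only the three operators appearing in this example, and it says nothing about how the actual condition engine is built.

```python
def resolve(operand, left, right):
    """Resolve a literal, an @left/@right path, or a nested operator node."""
    if isinstance(operand, str) and operand.startswith("@"):
        side, field = operand[1:].split(".", 1)
        return {"left": left, "right": right}[side][field]
    if isinstance(operand, dict):
        return evaluate(operand, left, right)
    return operand


def evaluate(node, left, right):
    """Evaluate a single-operator node such as {"abs": ...}."""
    (operator, args), = node.items()
    if operator == "abs":
        return abs(resolve(args, left, right))
    values = [resolve(arg, left, right) for arg in args]
    if operator == "subtract":
        return values[0] - values[1]
    if operator == "lessOrEqual":
        return values[0] <= values[1]
    raise ValueError(f"unsupported operator: {operator}")


condition = {
    "lessOrEqual": [
        {"abs": {"subtract": ["@left.amount", "@right.amount"]}},
        50,
    ]
}
small_gap = evaluate(condition, {"amount": 1000}, {"amount": 960})  # gap of 40
large_gap = evaluate(condition, {"amount": 1000}, {"amount": 900})  # gap of 100
```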
Output Format
Matched Pairs
Each matched record contains both the left and right record:
[
  {
    "a": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme" },
    "b": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp" },
    "match_score": 0.95,
    "amount_difference": 0
  }
]
The a field is the left record, b is the right record. match_score reflects overall match quality. amount_difference shows numeric deviation when tolerance matching is used.
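A downstream step might read these fields to decide which pairs deserve a second look. The Python sketch below is one hypothetical way to do that: `needs_review` and its `min_score` default are made up, and the 0 to 1 scale for match_score is inferred from the example value 0.95 rather than stated anywhere.

```python
def needs_review(pair, min_score=0.9):
    """Flag a matched pair whose score is low or whose amounts differ."""
    # Assumption: match_score is on a 0-1 scale, as the example value suggests.
    low_score = pair["match_score"] < min_score
    amounts_differ = pair.get("amount_difference", 0) != 0
    return low_score or amounts_differ


clean_pair = {
    "a": {"invoice_id": "INV-001", "amount": 1000, "vendor": "Acme"},
    "b": {"invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp"},
    "match_score": 0.95,
    "amount_difference": 0,
}
# Same pair, but with a nonzero deviation recorded by tolerance matching.
off_by_25 = dict(clean_pair, amount_difference=25)

clean_flag = needs_review(clean_pair)
deviation_flag = needs_review(off_by_25)
```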
Unmatched Records
Unmatched arrays contain the original records with no modifications:
[
  { "invoice_id": "INV-099", "amount": 5000, "vendor": "NewVendor" }
]
Worked Example
Input:
{
  "invoices": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-10", "description": "Monthly service fee" },
    { "invoice_id": "INV-002", "amount": 2500.00, "date": "2025-01-15", "description": "Equipment rental" },
    { "invoice_id": "INV-003", "amount": 750.00, "date": "2025-01-20", "description": "Consulting hours" }
  ],
  "payments": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-12", "description": "Monthly service" },
    { "invoice_id": "INV-002", "amount": 2475.00, "date": "2025-01-15", "description": "Equip rental Jan" }
  ]
}
Matcher configuration:
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "tolerance": 0.02,
    "dateWindowDays": 3,
    "fuzzyThreshold": 80,
    "descriptionKey": "description",
    "outputMatched": "reconciled",
    "outputUnmatchedLeft": "exceptions"
  }
}
Results:
@reconciled: INV-001 (exact match), INV-002 (amount within 2% tolerance, descriptions 80%+ similar)
@exceptions: INV-003 (no matching payment found)
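The key, amount, and date checks in this example can be traced by hand with a short Python sketch. It deliberately leaves out the fuzzy description check, since a stand-in scorer would not reproduce fuzzball's scores, and `is_match` is a hypothetical helper rather than the matcher's own logic.

```python
from datetime import date

invoices = [
    {"invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-10"},
    {"invoice_id": "INV-002", "amount": 2500.00, "date": "2025-01-15"},
    {"invoice_id": "INV-003", "amount": 750.00, "date": "2025-01-20"},
]
payments = [
    {"invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-12"},
    {"invoice_id": "INV-002", "amount": 2475.00, "date": "2025-01-15"},
]


def is_match(invoice, payment, tolerance=0.02, window_days=3):
    """Check ID equality, relative amount tolerance, and the date window."""
    same_id = invoice["invoice_id"] == payment["invoice_id"]
    # Assumption: tolerance is measured relative to the invoice amount.
    amount_gap = abs(invoice["amount"] - payment["amount"])
    amount_ok = amount_gap <= tolerance * invoice["amount"]
    day_gap = date.fromisoformat(invoice["date"]) - date.fromisoformat(payment["date"])
    return same_id and amount_ok and abs(day_gap.days) <= window_days


reconciled, exceptions = [], []
for invoice in invoices:
    partner = next((p for p in payments if is_match(invoice, p)), None)
    if partner is None:
        exceptions.append(invoice["invoice_id"])
    else:
        reconciled.append(invoice["invoice_id"])
```

INV-001 passes with a 2-day date gap, INV-002 passes with a $25 difference against a $50 allowance, and INV-003 has no payment with the same ID.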
Redis Optimization
For large datasets (10,000+ records per side), the matcher automatically uses Redis for indexing when available. This provides significant performance improvements by pre-indexing records by their matchOn keys rather than performing pairwise comparison.
No configuration change is needed: the matcher detects Redis availability and dataset size automatically.
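The benefit of pre-indexing over pairwise comparison can be illustrated with an in-memory Python dictionary standing in for Redis. This is a conceptual sketch only: how the matcher actually lays out its Redis index is not described here, and `build_index` and `candidates_for` are hypothetical names.

```python
from collections import defaultdict


def build_index(records, match_on):
    """Group records by their matchOn key so lookups avoid a full scan."""
    index = defaultdict(list)
    for record in records:
        index[tuple(record[field] for field in match_on)].append(record)
    return index


def candidates_for(record, index, match_on):
    """Return only the records that share this record's matchOn key."""
    return index.get(tuple(record[field] for field in match_on), [])


# 1,000 synthetic payments: INV-001 through INV-1000.
payments = [{"invoice_id": f"INV-{n:03d}"} for n in range(1, 1001)]
index = build_index(payments, ["invoice_id"])

# One dictionary lookup replaces a scan over all 1,000 payments.
hits = candidates_for({"invoice_id": "INV-042"}, index, ["invoice_id"])
misses = candidates_for({"invoice_id": "INV-9999"}, index, ["invoice_id"])
```

Building the index costs one pass over the right side; each left-side record then needs a single lookup instead of a comparison against every right-side record.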
Matcher as the foundation. Most Hyphen workflows start with a matcher step. The matched records flow into deterministic processing, while exceptions route to AI agents or human review. This is the graduated exception handling pattern: deterministic rules for clear cases, AI for ambiguous cases, humans for edge cases.