Matcher

The matcher takes two datasets and finds records that correspond to each other. It outputs three arrays: matched pairs, unmatched records from the left side, and unmatched records from the right side.

flowchart LR
    L["Left Dataset<br/>(e.g. invoices)"] --> M{"Matcher"}
    R["Right Dataset<br/>(e.g. payments)"] --> M
    M --> Matched["Matched Pairs"]
    M --> UL["Unmatched Left"]
    M --> UR["Unmatched Right"]

Basic Usage

json
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "outputMatched": "matched",
    "outputUnmatchedLeft": "unmatched_invoices",
    "outputUnmatchedRight": "unmatched_payments"
  }
}

This matches invoices to payments by exact invoice_id. Records with matching IDs land in @matched. Invoices with no payment go to @unmatched_invoices. Payments with no invoice go to @unmatched_payments.


Properties Reference

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| left | array / @path / doc: | Yes | First dataset: inline array, @path reference, or doc: uploaded document |
| right | array / @path / doc: | Yes | Second dataset: inline array, @path reference, or doc: uploaded document |
| matchOn | string[] | Yes | Fields that must match exactly |
| tolerance | number | No | Absolute numeric tolerance applied to the amount field. A tolerance of 0.02 means amounts differing by more than $0.02 won't match |
| dateWindowDays | number | No | Date tolerance in days (±N). Applied to the date field |
| fuzzyThreshold | number | No | Text similarity threshold 0–100. Applied to the field specified by descriptionKey |
| descriptionKey | string | No | Field name for fuzzy text matching |
| rules | array | No | Custom matching rules |
| outputMatched | string | No | Context key for matched pairs (default: "matched") |
| outputUnmatchedLeft | string | No | Context key for unmatched left records (default: "unmatchedLeft") |
| outputUnmatchedRight | string | No | Context key for unmatched right records (default: "unmatchedRight") |

Matching Criteria

Exact Key Matching (matchOn)

Fields listed in matchOn must match exactly. This is the primary matching criterion: records are only compared if their matchOn fields align.

json
{
  "matchOn": ["invoice_id"]
}

Multiple keys create a composite match — all must match:

json
{
  "matchOn": ["vendor_id", "invoice_number"]
}
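
To make the composite-key behavior concrete, here is a minimal Python sketch of exact matching that produces the three output arrays. It is an illustration only, not the matcher's actual implementation: the function name match_exact is hypothetical, and it assumes one-to-one pairing (each right record is used at most once), which the documentation does not state.

```python
def match_exact(left, right, match_on):
    """Pair records whose match_on fields are all equal (illustrative sketch)."""
    def key(record):
        # A composite key is the tuple of every matchOn field, in order.
        return tuple(record.get(field) for field in match_on)

    remaining = list(right)          # right records not yet paired
    matched, unmatched_left = [], []
    for l in left:
        for i, r in enumerate(remaining):
            if key(r) == key(l):
                # Assumption: one-to-one pairing, so the right record is consumed.
                matched.append({"a": l, "b": remaining.pop(i)})
                break
        else:
            unmatched_left.append(l)
    return matched, unmatched_left, remaining


left = [
    {"vendor_id": "V1", "invoice_number": "100"},
    {"vendor_id": "V1", "invoice_number": "101"},
]
right = [
    {"vendor_id": "V1", "invoice_number": "100"},
    {"vendor_id": "V2", "invoice_number": "101"},
]
matched, unmatched_left, unmatched_right = match_exact(
    left, right, ["vendor_id", "invoice_number"]
)
```

In this sample, the second pair shares invoice_number but not vendor_id, so it does not match: both fields of the composite key must agree.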

Numeric Tolerance (tolerance)

Allow the amount field to differ by up to a fixed absolute value. A tolerance of 0.02 means amounts within $0.02 of each other are still considered a match.

json
{
  "matchOn": ["invoice_id"],
  "tolerance": 0.02
}

With this configuration, an invoice for $1,000.00 would match a payment of $999.98–$1,000.02. For larger absolute tolerances, use values like 50 to allow a $50 difference.
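
The tolerance check described above can be sketched in a few lines of Python. This is a hedged illustration: the helper name within_tolerance is hypothetical, and the use of Decimal is a choice made here to keep the boundary case ($0.02 exactly) from being affected by binary floating-point rounding. How the matcher itself handles rounding is not documented.

```python
from decimal import Decimal


def within_tolerance(left_amount, right_amount, tolerance):
    """Return True when the amounts differ by no more than `tolerance`."""
    # Converting through str() gives exact decimal values such as 999.98,
    # so the inclusive boundary comparison behaves as a reader would expect.
    difference = abs(Decimal(str(left_amount)) - Decimal(str(right_amount)))
    return difference <= Decimal(str(tolerance))


# The $1,000.00 invoice from the text, with tolerance 0.02:
lower_edge = within_tolerance(1000.00, 999.98, 0.02)    # inside the range
upper_edge = within_tolerance(1000.00, 1000.02, 0.02)   # inside the range
too_far = within_tolerance(1000.00, 1000.03, 0.02)      # outside the range
```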

Date Window (dateWindowDays)

Allow date fields to differ by up to N days:

json
{
  "matchOn": ["invoice_id"],
  "dateWindowDays": 3
}

An invoice dated January 10 would match a payment dated January 7–13.
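
As a rough illustration of the ±N day window, the following Python sketch compares two ISO-formatted dates. The helper name within_date_window is hypothetical, and the sketch assumes dates arrive as YYYY-MM-DD strings, as in the examples on this page. Other date formats and time zones are not covered.

```python
from datetime import date


def within_date_window(left_date, right_date, window_days):
    """Return True when the two dates are at most `window_days` apart."""
    left = date.fromisoformat(left_date)
    right = date.fromisoformat(right_date)
    # The window is symmetric: the right date may be earlier or later.
    return abs((left - right).days) <= window_days


# An invoice dated January 10 with dateWindowDays = 3:
earliest = within_date_window("2025-01-10", "2025-01-07", 3)
latest = within_date_window("2025-01-10", "2025-01-13", 3)
outside = within_date_window("2025-01-10", "2025-01-14", 3)
```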

Fuzzy Text Matching (fuzzyThreshold + descriptionKey)

Compare text fields using fuzzy string similarity. The threshold is 0–100 where 100 is an exact match:

json
{
  "matchOn": ["vendor_id"],
  "fuzzyThreshold": 85,
  "descriptionKey": "description"
}

This matches records where vendor_id is identical and the description fields are at least 85% similar. Useful for matching line-item descriptions that may be worded differently across systems.
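
The documentation does not say which similarity algorithm the matcher uses, so the sketch below is only a stand-in. It uses Python's standard-library difflib.SequenceMatcher to produce a 0–100 score and compares it against the threshold. The function names similarity and passes_fuzzy are hypothetical, and scores from the real matcher may differ.

```python
from difflib import SequenceMatcher


def similarity(left_text, right_text):
    """Return a 0-100 score; 100 means the two strings are identical."""
    # Assumption: SequenceMatcher stands in for the matcher's unspecified
    # fuzzy algorithm, so absolute scores are illustrative only.
    return SequenceMatcher(None, left_text, right_text).ratio() * 100


def passes_fuzzy(left, right, description_key, threshold):
    """Apply the threshold to the field named by description_key."""
    return similarity(left[description_key], right[description_key]) >= threshold


invoice = {"vendor_id": "V1", "description": "Monthly service fee"}
payment = {"vendor_id": "V1", "description": "Monthly service"}
is_match = passes_fuzzy(invoice, payment, "description", 85)
```

Because different algorithms score the same pair of strings differently, it is worth testing a threshold against real data before relying on it.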

Custom Rules (rules)

Define additional matching rules evaluated by the condition engine:

json
{
  "matchOn": ["invoice_id"],
  "rules": [
    {
      "condition": {
        "lessOrEqual": [
          { "abs": { "subtract": ["@left.amount", "@right.amount"] } },
          50
        ]
      }
    }
  ]
}

Custom rules use the same condition operators as workflow conditions, with @left and @right referencing the current pair being compared.
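
To show how a rule like the one above could be evaluated against a single pair, here is a small Python sketch. It supports only the three operators that appear in the example (lessOrEqual, abs, subtract) and the @left / @right references. It is not the condition engine itself, and the function name evaluate is hypothetical.

```python
def evaluate(node, left, right):
    """Evaluate a literal, an @left/@right reference, or a nested operator."""
    if isinstance(node, str) and node.startswith("@left."):
        return left[node[len("@left."):]]
    if isinstance(node, str) and node.startswith("@right."):
        return right[node[len("@right."):]]
    if isinstance(node, dict):
        # Each operator node is a single-key object: {"operator": operand(s)}.
        op, arg = next(iter(node.items()))
        if op == "subtract":
            a, b = (evaluate(x, left, right) for x in arg)
            return a - b
        if op == "abs":
            return abs(evaluate(arg, left, right))
        if op == "lessOrEqual":
            a, b = (evaluate(x, left, right) for x in arg)
            return a <= b
        raise ValueError(f"unsupported operator in this sketch: {op}")
    return node  # plain literal such as 50


condition = {
    "lessOrEqual": [
        {"abs": {"subtract": ["@left.amount", "@right.amount"]}},
        50,
    ]
}
rule_passes = evaluate(condition, {"amount": 2500}, {"amount": 2475})
```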


Output Format

Matched Pairs

Each matched record contains both the left and right record:

json
[
  {
    "a": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme" },
    "b": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp" },
    "match_score": 0.95,
    "amount_difference": 0
  }
]

The a field is the left record, b is the right record. match_score reflects overall match quality. amount_difference shows numeric deviation when tolerance matching is used.
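
A downstream step might read these fields to decide which matched pairs deserve a second look. The Python sketch below is one possible approach, not part of the matcher: the function name pairs_needing_review and the min_score cutoff of 0.9 are assumptions chosen for illustration.

```python
def pairs_needing_review(matched, min_score=0.9):
    """Return pairs with a low match_score or a non-zero amount_difference."""
    # min_score is a hypothetical cutoff; pick one that suits your data.
    return [
        pair
        for pair in matched
        if pair["match_score"] < min_score or pair["amount_difference"] != 0
    ]


matched = [
    {
        "a": {"invoice_id": "INV-001", "amount": 1000, "vendor": "Acme"},
        "b": {"invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp"},
        "match_score": 0.95,
        "amount_difference": 0,
    }
]
flagged = pairs_needing_review(matched)
```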

Unmatched Records

Unmatched arrays contain the original records with no modifications:

json
[
  { "invoice_id": "INV-099", "amount": 5000, "vendor": "NewVendor" }
]

Worked Example

Input:

json
{
  "invoices": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-10", "description": "Monthly service fee" },
    { "invoice_id": "INV-002", "amount": 2500.00, "date": "2025-01-15", "description": "Equipment rental" },
    { "invoice_id": "INV-003", "amount": 750.00, "date": "2025-01-20", "description": "Consulting hours" }
  ],
  "payments": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-12", "description": "Monthly service" },
    { "invoice_id": "INV-002", "amount": 2475.00, "date": "2025-01-15", "description": "Equip rental Jan" }
  ]
}

Matcher configuration:

json
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "tolerance": 50,
    "dateWindowDays": 3,
    "fuzzyThreshold": 80,
    "descriptionKey": "description",
    "outputMatched": "reconciled",
    "outputUnmatchedLeft": "exceptions"
  }
}

Results:

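  • @reconciled: INV-001 (same invoice_id and amount; payment dated 2 days after the invoice, within the 3-day window), INV-002 (amount difference $25 within tolerance, descriptions 80%+ similar)
  • @exceptions: INV-003 (no matching payment found)
  • @unmatchedRight: empty. outputUnmatchedRight is not set, so the default key is used, and both payments were matched.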

Using Uploaded Documents

Instead of embedding datasets in the execution payload, upload files to Document Storage and reference them with the doc: prefix:

json
{
  "type": "matcher",
  "properties": {
    "left": "doc:doc_a1b2c3d4e5f6",
    "right": "doc:doc_x7y8z9w0v1u2",
    "matchOn": ["invoice_id"],
    "tolerance": 50,
    "outputMatched": "matched",
    "outputUnmatchedLeft": "exceptions"
  }
}

CSV files resolve to Array<Object> with header rows as keys. JSON files resolve as-is. Pin a specific version with doc:doc_xxx@2 for audit reproducibility.
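
The CSV-to-records shape described above is the same one Python's csv.DictReader produces, which makes it a convenient way to preview what a file will look like to the matcher. This is only an approximation: the helper name csv_to_records is hypothetical, and DictReader leaves every value as a string. Whether Document Storage converts numeric columns is not stated here.

```python
import csv
import io


def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


sample_csv = "invoice_id,amount\nINV-001,1000.00\nINV-002,2500.00\n"
records = csv_to_records(sample_csv)
```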


Large Dataset Optimization

For large datasets (10,000+ records per side), the matcher automatically switches to an indexed matching strategy when the infrastructure supports it. Records are pre-indexed by their matchOn keys, so each record is compared only against candidates that share its key rather than against every record on the other side.

No configuration change is needed — the matcher detects the optimal strategy based on dataset size automatically.
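
The general idea behind key-based indexing can be sketched as follows. This illustrates the technique only, not the matcher's internals, and the function names build_index and candidate_pairs are hypothetical. One side is grouped by its matchOn key in a dictionary, so finding candidates for a record becomes a single lookup instead of a scan.

```python
from collections import defaultdict


def build_index(records, match_on):
    """Group records by the tuple of their matchOn field values."""
    index = defaultdict(list)
    for record in records:
        index[tuple(record[field] for field in match_on)].append(record)
    return index


def candidate_pairs(left, right, match_on):
    """Yield (left, right) pairs that share a key, using one lookup per left record."""
    index = build_index(right, match_on)
    for l in left:
        key = tuple(l[field] for field in match_on)
        # .get avoids inserting empty lists for keys that have no candidates.
        for r in index.get(key, []):
            yield l, r


left = [{"invoice_id": "A"}, {"invoice_id": "B"}]
right = [{"invoice_id": "B", "amount": 10}, {"invoice_id": "C", "amount": 5}]
pairs = list(candidate_pairs(left, right, ["invoice_id"]))
```

With n left records and m right records, a pairwise scan performs on the order of n × m comparisons, while the indexed version does roughly n + m dictionary operations plus the comparisons among records that actually share a key.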

Matcher as the foundation. Most Hyphen workflows start with a matcher step. The matched records flow into deterministic processing, while exceptions route to AI agents or human review. This is the graduated exception handling pattern: deterministic rules for clear cases, AI for ambiguous cases, humans for edge cases.

→ Next: [Loop](/primitives/loop)