Matcher

The matcher takes two datasets and finds records that correspond to each other. It outputs three arrays: matched pairs, unmatched records from the left side, and unmatched records from the right side.

mermaid
flowchart LR
  L["Left Dataset<br/>(e.g. invoices)"] --> M{"Matcher"}
  R["Right Dataset<br/>(e.g. payments)"] --> M
  M --> Matched["Matched Pairs"]
  M --> UL["Unmatched Left"]
  M --> UR["Unmatched Right"]

Basic Usage

json
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "outputMatched": "matched",
    "outputUnmatchedLeft": "unmatched_invoices",
    "outputUnmatchedRight": "unmatched_payments"
  }
}

This matches invoices to payments by exact invoice_id. Records with matching IDs land in @matched. Invoices with no payment go to @unmatched_invoices. Payments with no invoice go to @unmatched_payments.
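
To make the three outputs concrete, here is a minimal Python sketch of that partition for a single exact key. The function name `match_exact` and the greedy one-to-one pairing are illustrative assumptions, not the matcher's actual implementation.

```python
def match_exact(left, right, key):
    """Partition two record lists into matched pairs and leftovers.

    Assumption: each right record is paired at most once, first come first served.
    """
    # Index right-side records by key value, preserving input order.
    pending = {}
    for rec in right:
        pending.setdefault(rec.get(key), []).append(rec)

    matched, unmatched_left = [], []
    for rec in left:
        value = rec.get(key)
        candidates = pending.get(value) if value is not None else None
        if candidates:
            # Pair shape mirrors the "a"/"b" output format described on this page.
            matched.append({"a": rec, "b": candidates.pop(0)})
        else:
            unmatched_left.append(rec)

    # Right-side records that were never claimed are unmatched.
    unmatched_right = [r for recs in pending.values() for r in recs]
    return matched, unmatched_left, unmatched_right


invoices = [{"invoice_id": "INV-001"}, {"invoice_id": "INV-003"}]
payments = [{"invoice_id": "INV-001"}, {"invoice_id": "INV-777"}]
matched, unmatched_invoices, unmatched_payments = match_exact(
    invoices, payments, "invoice_id"
)
```

Here INV-001 ends up in the matched list, INV-003 in the unmatched invoices, and INV-777 in the unmatched payments.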


Properties Reference

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| left | array / @path | Yes | First dataset |
| right | array / @path | Yes | Second dataset |
| matchOn | string[] | Yes | Fields that must match exactly |
| tolerance | number | No | Numeric tolerance as a decimal (0.02 = 2%). Applied to numeric fields not in matchOn |
| dateWindowDays | number | No | Date tolerance in days (±N). Applied to date fields |
| fuzzyThreshold | number | No | Text similarity threshold 0–100. Applied to the field specified by descriptionKey |
| descriptionKey | string | No | Field name for fuzzy text matching |
| rules | array | No | Custom matching rules evaluated via conditionEvaluator |
| outputMatched | string | No | Context key for matched pairs (default: "matched") |
| outputUnmatchedLeft | string | No | Context key for unmatched left records (default: "unmatchedLeft") |
| outputUnmatchedRight | string | No | Context key for unmatched right records (default: "unmatchedRight") |

Matching Criteria

Exact Key Matching (matchOn)

Fields listed in matchOn must match exactly. This is the primary matching criterion: records are only compared if their matchOn fields align.

json
{
  "matchOn": ["invoice_id"]
}

Multiple keys form a composite key, and every listed field must match:

json
{
  "matchOn": ["vendor_id", "invoice_number"]
}
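
One way to picture a composite match is as a comparison of key tuples. The sketch below is only an illustration; `composite_key` and the sample records are hypothetical names, not part of the matcher.

```python
def composite_key(record, fields):
    """Return the record's values for the given fields as a tuple."""
    return tuple(record.get(f) for f in fields)


fields = ["vendor_id", "invoice_number"]
left = {"vendor_id": "V-10", "invoice_number": "1001", "amount": 250}
same = {"vendor_id": "V-10", "invoice_number": "1001", "amount": 245}
other_vendor = {"vendor_id": "V-11", "invoice_number": "1001", "amount": 250}

# Both fields agree, so the keys are equal even though the amounts differ.
keys_align = composite_key(left, fields) == composite_key(same, fields)
# Only invoice_number agrees, so the keys differ and the records are not compared.
keys_differ = composite_key(left, fields) != composite_key(other_vendor, fields)
```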

Numeric Tolerance (tolerance)

For numeric fields (amounts, quantities), allow a percentage deviation. A tolerance of 0.02 means a 2% difference is still considered a match.

json
{
  "matchOn": ["invoice_id"],
  "tolerance": 0.02
}

With this configuration, an invoice for $1,000 would match a payment of $980–$1,020.
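
The range above follows from a simple relative-difference check. The sketch below assumes the deviation is measured against the left-side value, which is consistent with the $980–$1,020 example but is not confirmed here as the matcher's exact formula. `within_tolerance` is a hypothetical helper name.

```python
def within_tolerance(left_amount, right_amount, tolerance):
    """True when right_amount deviates from left_amount by at most `tolerance`.

    `tolerance` is a fraction (0.02 means 2%).
    Assumption: the deviation is relative to the left-side value.
    """
    return abs(left_amount - right_amount) <= tolerance * abs(left_amount)


lower_edge = within_tolerance(1000, 980, 0.02)      # 20 <= 20, inside the range
upper_edge = within_tolerance(1000, 1020, 0.02)     # 20 <= 20, inside the range
just_outside = within_tolerance(1000, 979.99, 0.02) # 20.01 > 20, outside the range
```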

Date Window (dateWindowDays)

Allow date fields to differ by up to N days:

json
{
  "matchOn": ["invoice_id"],
  "dateWindowDays": 3
}

An invoice dated January 10 would match a payment dated January 7–13.
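
As a rough illustration of the ±N-day window, the sketch below compares two ISO-formatted date strings by calendar days. It assumes dates arrive as YYYY-MM-DD strings, as in the worked example on this page, and it ignores time zones. `within_date_window` is a hypothetical helper name.

```python
from datetime import date


def within_date_window(left_date, right_date, window_days):
    """True when two ISO dates (YYYY-MM-DD) are at most window_days apart."""
    delta = date.fromisoformat(left_date) - date.fromisoformat(right_date)
    return abs(delta.days) <= window_days


earliest = within_date_window("2025-01-10", "2025-01-07", 3)  # 3 days before
latest = within_date_window("2025-01-10", "2025-01-13", 3)    # 3 days after
too_late = within_date_window("2025-01-10", "2025-01-14", 3)  # 4 days after
```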

Fuzzy Text Matching (fuzzyThreshold + descriptionKey)

Compare text fields using fuzzball similarity scoring. The threshold ranges from 0 to 100, where 100 is an exact match:

json
{
  "matchOn": ["vendor_id"],
  "fuzzyThreshold": 85,
  "descriptionKey": "description"
}

This matches records where vendor_id is identical and the description fields are at least 85% similar. Useful for matching line-item descriptions that may be worded differently across systems.
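
To show how a 0–100 threshold gates a pair, the sketch below uses Python's standard-library `difflib.SequenceMatcher` as a stand-in scorer. This is an assumption for illustration only: the matcher uses fuzzball, which offers several scorers, so the actual scores may differ from the ones computed here. `similarity` and `passes_fuzzy` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case.

    Stand-in scorer: difflib's ratio, not fuzzball's.
    """
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def passes_fuzzy(a, b, threshold):
    """True when the similarity score meets or exceeds the threshold."""
    return similarity(a, b) >= threshold


score = similarity("Monthly service fee", "Monthly service")
accepted = passes_fuzzy("Monthly service fee", "Monthly service", 85)
```

With this stand-in scorer the two descriptions score 88, so they pass a threshold of 85.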

Custom Rules (rules)

Define additional matching rules evaluated by the condition engine:

json
{
  "matchOn": ["invoice_id"],
  "rules": [
    {
      "condition": {
        "lessOrEqual": [
          { "abs": { "subtract": ["@left.amount", "@right.amount"] } },
          50
        ]
      }
    }
  ]
}

Custom rules use the same condition operators as workflow conditions, with @left and @right referencing the current pair being compared.
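
The rule above reads as "the absolute difference between the two amounts is at most 50". The sketch below is a toy evaluator that handles only the three operators in that example, with @left and @right resolved against plain dictionaries. It is a hypothetical illustration of the semantics, not the condition engine itself, and the `evaluate` function is an assumed name.

```python
def evaluate(node, left, right):
    """Evaluate a small subset of a condition tree for one candidate pair.

    Supports literals, "@left.<field>" / "@right.<field>" references, and the
    operators abs, subtract, and lessOrEqual (assumed semantics).
    """
    if isinstance(node, str) and node.startswith("@left."):
        return left[node[len("@left."):]]
    if isinstance(node, str) and node.startswith("@right."):
        return right[node[len("@right."):]]
    if isinstance(node, dict):
        op, arg = next(iter(node.items()))
        if op == "abs":
            return abs(evaluate(arg, left, right))
        if op in ("subtract", "lessOrEqual"):
            a, b = (evaluate(x, left, right) for x in arg)
            return a - b if op == "subtract" else a <= b
        raise ValueError(f"unsupported operator: {op}")
    return node  # numeric or other literal


condition = {
    "lessOrEqual": [
        {"abs": {"subtract": ["@left.amount", "@right.amount"]}},
        50,
    ]
}
close_enough = evaluate(condition, {"amount": 2500}, {"amount": 2475})  # |25| <= 50
too_far = evaluate(condition, {"amount": 2500}, {"amount": 2400})       # |100| > 50
```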


Output Format

Matched Pairs

Each matched record contains both the left and right record:

json
[
  {
    "a": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme" },
    "b": { "invoice_id": "INV-001", "amount": 1000, "vendor": "Acme Corp" },
    "match_score": 0.95,
    "amount_difference": 0
  }
]

The a field holds the left record and b holds the right record. match_score reflects overall match quality. amount_difference shows the numeric deviation when tolerance matching is used.

Unmatched Records

Unmatched arrays contain the original records with no modifications:

json
[
  { "invoice_id": "INV-099", "amount": 5000, "vendor": "NewVendor" }
]

Worked Example

Input:

json
{
  "invoices": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-10", "description": "Monthly service fee" },
    { "invoice_id": "INV-002", "amount": 2500.00, "date": "2025-01-15", "description": "Equipment rental" },
    { "invoice_id": "INV-003", "amount": 750.00, "date": "2025-01-20", "description": "Consulting hours" }
  ],
  "payments": [
    { "invoice_id": "INV-001", "amount": 1000.00, "date": "2025-01-12", "description": "Monthly service" },
    { "invoice_id": "INV-002", "amount": 2475.00, "date": "2025-01-15", "description": "Equip rental Jan" }
  ]
}

Matcher configuration:

json
{
  "type": "matcher",
  "properties": {
    "left": "@input.invoices",
    "right": "@input.payments",
    "matchOn": ["invoice_id"],
    "tolerance": 0.02,
    "dateWindowDays": 3,
    "fuzzyThreshold": 80,
    "descriptionKey": "description",
    "outputMatched": "reconciled",
    "outputUnmatchedLeft": "exceptions"
  }
}

Results:

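  • @reconciled: INV-001 (identical amounts, dates 2 days apart and within the 3-day window), INV-002 (payment 1% below the invoice amount and within the 2% tolerance, same date, descriptions 80%+ similar)
  • @exceptions: INV-003 (no matching payment found)
  • @unmatchedRight (the default key, since outputUnmatchedRight is not set): empty, because both payments were matched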

Redis Optimization

For large datasets (10,000+ records per side), the matcher automatically uses Redis for indexing when available. Pre-indexing records by their matchOn keys means each record is compared only against candidates that share its key, which avoids a full pairwise comparison of the two datasets.

No configuration change is needed — the matcher detects Redis availability and dataset size automatically.
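
The benefit of key-based indexing can be seen with an in-memory dictionary standing in for the Redis index. The sketch below is an assumption-laden illustration: it makes no Redis calls and does not reflect how the matcher stores its index. `build_index` and `candidate_pairs` are hypothetical names.

```python
from collections import defaultdict


def build_index(records, match_on):
    """Group records by their matchOn key tuple."""
    index = defaultdict(list)
    for rec in records:
        index[tuple(rec.get(f) for f in match_on)].append(rec)
    return index


def candidate_pairs(left, right, match_on):
    """Yield only the (left, right) pairs whose matchOn keys agree."""
    index = build_index(right, match_on)
    for l in left:
        # .get avoids creating empty entries for keys missing on the right.
        for r in index.get(tuple(l.get(f) for f in match_on), []):
            yield l, r


left = [{"invoice_id": f"INV-{i:03d}"} for i in range(1000)]
right = [{"invoice_id": f"INV-{i:03d}"} for i in range(500, 1500)]
pairs = list(candidate_pairs(left, right, ["invoice_id"]))
pairwise_comparisons = len(left) * len(right)
```

For these two lists of 1,000 records, a pairwise scan would examine 1,000,000 combinations, while the index yields only the 500 pairs that share an invoice_id.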

Matcher as the foundation. Most Hyphen workflows start with a matcher step. The matched records flow into deterministic processing, while exceptions route to AI agents or human review. This is the graduated exception handling pattern: deterministic rules for clear cases, AI for ambiguous cases, humans for edge cases.
