

Claro maintains a similarity graph across records using a combination of embedding similarity and rule-based matching. Duplicate detection, supplier match, variant grouping, and entity resolution all read from this graph.
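The following Python sketch shows how an embedding score and a rule might be blended into a single edge weight. It illustrates the idea under stated assumptions and is not Claro's implementation. The cosine measure, the exact-`gtin` rule, and the `alpha` blend weight are hypothetical choices.

```python
import math
from typing import Optional


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def gtin_rule(rec_a: dict, rec_b: dict) -> Optional[float]:
    """Hypothetical rule: equal gtins -> 1.0, conflicting gtins -> 0.0, otherwise abstain (None)."""
    ga, gb = rec_a.get("gtin"), rec_b.get("gtin")
    if ga and gb:
        return 1.0 if ga == gb else 0.0
    return None


def combined_score(rec_a: dict, rec_b: dict,
                   emb_a: list[float], emb_b: list[float],
                   alpha: float = 0.6) -> float:
    """Blend rule and embedding scores; fall back to the embedding score when the rule abstains."""
    emb = cosine(emb_a, emb_b)
    rule = gtin_rule(rec_a, rec_b)
    if rule is None:
        return emb
    return alpha * rule + (1 - alpha) * emb
```

In this sketch, two records with identical embeddings but conflicting gtins score about 0.4, so the rule can override a misleading embedding match.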

What the graph contains

Every record in a catalogue is embedded using the attributes you select. The graph stores:
  • Pairwise similarity scores — between records in the same catalogue, and across catalogues you’ve connected.
  • Cluster memberships — high-confidence groups of records likely to be the same entity or close variants.
  • Match decisions — every approve/reject decision is remembered and used as positive or negative training signal for future matching.
The graph is updated continuously as records change, are added, or are removed.
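A small in-memory model can show how these pieces fit together. This is a toy sketch, not Claro's storage layer. The `SimilarityGraph` class, its method names, and the use of union-find to derive clusters from edges above a score cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimilarityGraph:
    """Toy model of the graph: pairwise scores plus remembered approve/reject decisions."""
    scores: dict[frozenset, float] = field(default_factory=dict)
    decisions: dict[frozenset, bool] = field(default_factory=dict)  # True = approved

    def set_score(self, a: str, b: str, score: float) -> None:
        if a == b:
            raise ValueError("a record is not compared with itself")
        self.scores[frozenset((a, b))] = score  # unordered pair as the key

    def record_decision(self, a: str, b: str, approved: bool) -> None:
        self.decisions[frozenset((a, b))] = approved

    def remove_record(self, rec: str) -> None:
        """Drop every edge that touches a removed record."""
        self.scores = {p: s for p, s in self.scores.items() if rec not in p}

    def clusters(self, min_score: float) -> list[set[str]]:
        """Connected components over edges scoring >= min_score, skipping rejected pairs."""
        parent: dict[str, str] = {}

        def find(x: str) -> str:
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for pair, score in self.scores.items():
            if score >= min_score and self.decisions.get(pair) is not False:
                a, b = tuple(pair)
                parent[find(a)] = find(b)

        groups: dict[str, set[str]] = {}
        for x in list(parent):
            groups.setdefault(find(x), set()).add(x)
        return list(groups.values())
```

Because clusters are recomputed from the current edges, adding, changing, or removing a record changes the clusters on the next call, which loosely mirrors the continuous updates described above.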

The Similarity & Duplicate surface

The page is a single workspace for inspecting and acting on the graph.
  • Cluster browser — proposed clusters of likely duplicates or variants, sortable by size, average similarity, and confidence.
  • Pair view — for any two records, a side-by-side diff with attribute-level similarity scores.
  • Threshold controls — set the score above which proposals auto-merge, and the score below which they’re discarded.
  • Field weights — configure which attributes drive similarity (e.g. weight gtin and mpn higher than description).
  • Reasoning — for every proposed merge, the contributing fields and their individual scores.
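These controls can be sketched as a weighted per-field score that is routed through two thresholds, with the per-field breakdown kept as the reasoning. The weights, threshold values, field names, and the use of `difflib.SequenceMatcher` as a stand-in field comparator are illustrative assumptions, not Claro's defaults.

```python
from difflib import SequenceMatcher

# Hypothetical field weights: identifiers dominate free text.
WEIGHTS = {"gtin": 5.0, "mpn": 3.0, "description": 1.0}
AUTO_MERGE = 0.90  # at or above: merge automatically (illustrative value)
REVIEW = 0.60      # at or above, but below AUTO_MERGE: queue for review; below: discard


def field_score(a: str, b: str) -> float:
    """Attribute-level similarity in [0, 1]; difflib's ratio stands in for a real comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_pair(rec_a: dict, rec_b: dict) -> tuple[float, dict[str, float]]:
    """Weighted mean over fields present in both records, plus the per-field breakdown."""
    per_field = {f: field_score(rec_a[f], rec_b[f])
                 for f in WEIGHTS if rec_a.get(f) and rec_b.get(f)}
    if not per_field:
        return 0.0, {}
    total_weight = sum(WEIGHTS[f] for f in per_field)
    score = sum(WEIGHTS[f] * s for f, s in per_field.items()) / total_weight
    return score, per_field


def route(score: float) -> str:
    """Apply the two thresholds to decide what happens to a proposal."""
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= REVIEW:
        return "review"
    return "discard"
```

With these weights, two records that share a gtin and description but differ on mpn score 6/9 (about 0.67) and land in review. The returned breakdown shows that mpn was the field that pulled the score down.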

How operations use the graph

  • Find Duplicates — surfaces clusters of likely duplicate records and produces merge proposals you can approve in bulk.
  • Find Similarities — surfaces broader similarity clusters useful for variant grouping, supplier match, and cross-catalogue linking.
  • Data Source Mapping — when an uploaded file lands as a Data Source, the graph is queried to match each row to existing records before merge.
  • Bulk Enrichment can pull values from similar records when filling gaps.
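Data Source Mapping and Bulk Enrichment can both be thought of as queries against the graph. The following minimal sketch uses a simple string-similarity function as a stand-in for the real graph lookup. The `sim`, `map_rows`, and `fill_gaps` functions, the 0.85 cutoff, and the field names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def sim(a: dict, b: dict, fields: tuple[str, ...] = ("gtin", "mpn", "title")) -> float:
    """Stand-in for a graph lookup: mean string similarity over fields both records have."""
    shared = [f for f in fields if a.get(f) and b.get(f)]
    if not shared:
        return 0.0
    return sum(SequenceMatcher(None, str(a[f]), str(b[f])).ratio() for f in shared) / len(shared)


def map_rows(rows: list[dict], existing: dict[str, dict],
             min_score: float = 0.85) -> list[tuple[int, Optional[str], float]]:
    """Match each incoming row to its best existing record; None means 'treat as new'."""
    results = []
    for i, row in enumerate(rows):
        best_id, best = None, 0.0
        for rec_id, rec in existing.items():
            s = sim(row, rec)
            if s > best:
                best_id, best = rec_id, s
        results.append((i, best_id if best >= min_score else None, best))
    return results


def fill_gaps(record: dict, neighbours: list[dict], fields: tuple[str, ...]) -> dict:
    """Fill empty `fields` from neighbours, trying the most similar neighbour first."""
    filled = dict(record)
    for other in sorted(neighbours, key=lambda other: sim(record, other), reverse=True):
        for f in fields:
            if not filled.get(f) and other.get(f):
                filled[f] = other[f]
    return filled
```

Restricting `fill_gaps` to an explicit list of fields keeps enrichment from copying identifiers such as gtin between records that are merely similar.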

Merging and rejecting

When you approve a merge proposal:
  • The records are combined into a single record.
  • Conflicting attribute values are resolved by your configured rules (most recent, highest confidence, source priority, or manual choice).
  • All references and history from both records are preserved on the surviving record.
  • The merge is reversible from the record’s history.
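The merge steps can be sketched as follows, assuming each attribute value carries a timestamp, a confidence, and a source. The `Value` type, the rule names, the `SOURCE_PRIORITY` order, and the snapshot-based undo are all hypothetical. A rule of `"manual"` simply raises an error so that a person can make the choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    value: str
    updated_at: int     # e.g. epoch seconds
    confidence: float   # 0.0 to 1.0
    source: str


SOURCE_PRIORITY = ["manual_entry", "supplier_feed", "scraped"]  # hypothetical, highest first


def resolve(a: Value, b: Value, rule: str) -> Value:
    """Pick one of two conflicting values according to a configured rule."""
    if rule == "most_recent":
        return a if a.updated_at >= b.updated_at else b
    if rule == "highest_confidence":
        return a if a.confidence >= b.confidence else b
    if rule == "source_priority":
        rank = SOURCE_PRIORITY.index
        return a if rank(a.source) <= rank(b.source) else b
    raise ValueError(f"conflict needs a manual choice (rule={rule!r})")


def merge(survivor: dict, other: dict, rules: dict[str, str], history: list) -> dict:
    """Merge `other` into `survivor`; snapshot both originals so the merge can be reversed."""
    merged = dict(survivor)
    for name, val in other.items():
        if name not in merged:
            merged[name] = val  # no conflict: take the value
        elif merged[name].value != val.value:
            merged[name] = resolve(merged[name], val, rules.get(name, "manual"))
    history.append((dict(survivor), dict(other)))  # logged only once the merge succeeds
    return merged


def undo_last_merge(history: list) -> tuple[dict, dict]:
    """Reverse the most recent merge by returning both original records."""
    return history.pop()
```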
When you reject a pair:
  • The pair is added to a negative-examples set used to suppress similar proposals in the future.
  • The negative-examples set is workspace-scoped, so each rejection improves match precision across the workspace over time.
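One way to model such a set is to keep each rejected pair together with a profile of its per-field scores. A new proposal is then suppressed if it repeats a rejected pair or has a nearly identical profile. This is a sketch, not Claro's actual mechanism. The `NegativeExamples` class, the Euclidean distance, and the `radius` value are assumptions. Keeping one instance per workspace mirrors the workspace scoping.

```python
import math


class NegativeExamples:
    """Hypothetical store of rejected pairs; keep one instance per workspace."""

    def __init__(self, radius: float = 0.05) -> None:
        self.radius = radius                        # how close a profile must be to be suppressed
        self.pairs: set[frozenset] = set()          # exact rejected pairs (order-independent)
        self.profiles: list[dict[str, float]] = []  # per-field scores of each rejected pair

    def reject(self, a: str, b: str, profile: dict[str, float]) -> None:
        self.pairs.add(frozenset((a, b)))
        self.profiles.append(profile)

    def suppresses(self, a: str, b: str, profile: dict[str, float]) -> bool:
        """True if this proposal repeats a rejected pair or closely resembles one."""
        if frozenset((a, b)) in self.pairs:
            return True
        for rejected in self.profiles:
            keys = rejected.keys() | profile.keys()
            dist = math.sqrt(sum((rejected.get(k, 0.0) - profile.get(k, 0.0)) ** 2 for k in keys))
            if dist <= self.radius:
                return True
        return False
```

A production system would more likely learn from these examples than compare against each one in turn. The linear scan here just keeps the idea visible.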

Tuning matching

You’ll typically tune three things per catalogue:
  1. Field weights — which attributes matter most for identity. gtin should dominate description.
  2. Auto-merge threshold — high enough that auto-applied merges are virtually always correct.
  3. Review threshold — the floor for proposals that queue in Notifications. Below this, proposals are discarded.
Defaults are conservative. Adjust them once you’ve reviewed a few hundred merges and have a sense of where the score distribution lands for your data.
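Once you have a few hundred reviewed decisions, you can read both thresholds off the score distribution. The following minimal sketch assumes the decisions are available as `(score, approved)` pairs. The function names, the precision target, and the `min_support` and `max_missed` parameters are hypothetical.

```python
from typing import Optional


def auto_merge_threshold(decisions: list[tuple[float, bool]],
                         target: float = 0.99, min_support: int = 50) -> Optional[float]:
    """Scan thresholds from high to low and return the last one whose approval rate
    (among reviewed pairs scoring at or above it) meets `target`.

    Stops at the first threshold with enough support that falls short, which is a
    deliberately conservative choice.
    """
    best = None
    for t in sorted({s for s, _ in decisions}, reverse=True):
        above = [ok for s, ok in decisions if s >= t]
        if len(above) < min_support:
            continue  # too few reviewed pairs to trust the estimate
        if sum(above) / len(above) >= target:
            best = t
        else:
            break
    return best


def review_threshold(decisions: list[tuple[float, bool]],
                     max_missed: float = 0.01) -> Optional[float]:
    """Floor below which at most a `max_missed` fraction of approved pairs would be discarded."""
    approved = sorted(s for s, ok in decisions if ok)
    if not approved:
        return None
    k = min(int(max_missed * len(approved)), len(approved) - 1)
    return approved[k]
```

Requiring a minimum number of reviewed pairs above a candidate threshold avoids setting the auto-merge threshold from a handful of lucky approvals.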