Claro maintains a similarity graph across records using a combination of embedding similarity and rule-based matching. Duplicate detection, supplier match, variant grouping, and entity resolution all read from this graph.
What the graph contains
Every record in a catalogue is embedded across the attributes you select. The graph stores:
- Pairwise similarity scores: between records in the same catalogue, and across catalogues you’ve connected.
- Cluster memberships: high-confidence groups of records likely to be the same entity or close variants.
- Match decisions: every approve/reject decision is remembered and used as a positive or negative training signal for future matching.
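
To make the relationship between pairwise scores and cluster memberships concrete, here is a minimal sketch that derives clusters from scored pairs with a union-find structure. How Claro actually forms clusters is not documented on this page; the function name `cluster_pairs`, the `min_score` cut-off of 0.9, and the single-linkage approach are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_pairs(pair_scores, min_score=0.9):
    """Group record IDs by linking every pair that scores >= min_score.

    pair_scores maps (record_id_a, record_id_b) -> similarity in [0, 1].
    Hypothetical helper, not part of any Claro API.
    """
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)
        if score >= min_score and root_a != root_b:
            parent[root_a] = root_b  # union the two groups

    groups = defaultdict(set)
    for rec in list(parent):
        groups[find(rec)].add(rec)
    # Singletons are not clusters; keep only groups of two or more records.
    return [members for members in groups.values() if len(members) > 1]


clusters = cluster_pairs({("a", "b"): 0.95, ("b", "c"): 0.93, ("c", "d"): 0.40})
```

Single linkage can chain loosely related records together through intermediate pairs, so a production system would likely add a stricter cohesion check before calling a group high-confidence.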
The Similarity & Duplicate surface
The page is a single workspace for inspecting and acting on the graph.
- Cluster browser: proposed clusters of likely duplicates or variants, sortable by size, average similarity, and confidence.
- Pair view: for any two records, a side-by-side diff with attribute-level similarity scores.
- Threshold controls: set the score above which proposals auto-merge, and the score below which they’re discarded.
- Field weights: configure which attributes drive similarity (e.g. weight `gtin` and `mpn` higher than `description`).
- Reasoning: for every proposed merge, the contributing fields and their individual scores.
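
As an illustration of how field weights and per-field reasoning might fit together, the sketch below computes a weighted average of attribute similarities and returns the individual scores alongside the overall one. It is not Claro's scoring code: the weights are made up, and `difflib.SequenceMatcher` from the Python standard library stands in for the embedding similarity described above.

```python
from difflib import SequenceMatcher

# Hypothetical weights: identifiers count for more than free text.
FIELD_WEIGHTS = {"gtin": 5.0, "mpn": 3.0, "description": 1.0}


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def score_pair(rec_a, rec_b, weights=FIELD_WEIGHTS):
    """Return (overall_score, per_field_scores) for two record dicts."""
    per_field = {}
    weighted_total = 0.0
    weight_sum = 0.0
    for field, weight in weights.items():
        sim = field_similarity(rec_a.get(field), rec_b.get(field))
        if sim is None:
            continue  # a field missing on either side does not vote
        per_field[field] = sim
        weighted_total += weight * sim
        weight_sum += weight
    overall = weighted_total / weight_sum if weight_sum else 0.0
    return overall, per_field


overall, reasoning = score_pair(
    {"gtin": "0123", "mpn": "AB-1", "description": "Blue mug"},
    {"gtin": "0123", "mpn": "AB-1", "description": "Red mug"},
)
```

Here `reasoning` plays the role of the Reasoning panel: matching identifiers pull the overall score well above the description similarity on its own.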
How operations use the graph
- Find Duplicates: surfaces clusters of likely duplicate records and produces merge proposals you can approve in bulk.
- Find Similarities: surfaces broader similarity clusters useful for variant grouping, supplier match, and cross-catalogue linking.
- Data Source Mapping: when an uploaded file lands as a Data Source, the graph is queried to match each row to existing records before merge.
- Bulk Enrichment: can pull values from similar records when filling gaps.
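
The row-matching step in Data Source Mapping can be pictured as a rule-based pass followed by a fuzzy pass. The sketch below is only an assumption about the general shape: `match_row`, the exact-`gtin` rule, and the 0.8 floor are hypothetical, and `difflib.SequenceMatcher` again stands in for embedding similarity.

```python
from difflib import SequenceMatcher


def match_row(row, existing, min_score=0.8):
    """Return (record_id, score) for the best match, or (None, score) if none qualifies.

    row is a dict for one uploaded row; existing maps record_id -> record dict.
    """
    # Rule-based pass: an identical, non-empty GTIN is treated as a certain match.
    row_gtin = row.get("gtin")
    if row_gtin:
        for rec_id, rec in existing.items():
            if rec.get("gtin") == row_gtin:
                return rec_id, 1.0

    # Fuzzy pass: fall back to the closest description.
    row_desc = (row.get("description") or "").lower()
    best_id, best_score = None, 0.0
    for rec_id, rec in existing.items():
        rec_desc = (rec.get("description") or "").lower()
        score = SequenceMatcher(None, row_desc, rec_desc).ratio()
        if score > best_score:
            best_id, best_score = rec_id, score

    if best_score >= min_score:
        return best_id, best_score
    return None, best_score  # no confident match: the row would become a new record


existing = {
    "r1": {"gtin": "111", "description": "Steel bolt M8"},
    "r2": {"gtin": "222", "description": "Copper pipe 15mm"},
}
match = match_row({"gtin": "", "description": "copper pipe 15 mm"}, existing)
```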
Merging and rejecting
When you approve a merge proposal:
- The records are combined into a single record.
- Conflicting attribute values resolve via configured rules (most-recent, highest-confidence, source priority, manual choice).
- All references and history from both records are preserved on the survivor.
- The merge is reversible from the record’s history.
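
The conflict rules named above can be sketched as a small resolver. Everything here is illustrative: the `Value` shape, the rule strings, and the `SOURCE_PRIORITY` order are assumptions, not Claro's configuration format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Value:
    value: str
    updated_at: datetime
    confidence: float
    source: str


# Hypothetical ordering: earlier entries win under the source-priority rule.
SOURCE_PRIORITY = ["erp", "supplier_feed", "manual_upload"]


def resolve(a: Value, b: Value, rule: str) -> Optional[Value]:
    """Pick the surviving value for one attribute, or None if a person must choose."""
    if a.value == b.value:
        return a  # nothing to resolve
    if rule == "most-recent":
        return max(a, b, key=lambda v: v.updated_at)
    if rule == "highest-confidence":
        return max(a, b, key=lambda v: v.confidence)
    if rule == "source-priority":
        return min(a, b, key=lambda v: SOURCE_PRIORITY.index(v.source))
    if rule == "manual":
        return None  # leave the conflict for a reviewer
    raise ValueError(f"unknown rule: {rule!r}")


old = Value("Blue", datetime(2024, 1, 1), 0.9, "supplier_feed")
new = Value("Navy", datetime(2024, 6, 1), 0.7, "erp")
survivor = resolve(old, new, "most-recent")  # picks the "Navy" value
```

The same pair resolves differently under each rule, which is why the rule is worth choosing per attribute rather than once for the whole record.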
When you reject a merge proposal:
- The pair is added to a negative-examples set used to suppress similar proposals in the future.
- The set is workspace-scoped and improves match precision over time.
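
A minimal sketch of the negative-examples idea follows, assuming proposals are `(record_a, record_b, score)` tuples. The `RejectionMemory` class is hypothetical, and it only suppresses the exact pair that was rejected; generalising to similar pairs, as described above, would need the rejected pairs to feed back into scoring.

```python
class RejectionMemory:
    """Remembers rejected pairs for one workspace (hypothetical helper)."""

    def __init__(self):
        self._rejected = set()

    def reject(self, record_a, record_b):
        # frozenset makes the pair order-independent: (a, b) equals (b, a).
        self._rejected.add(frozenset((record_a, record_b)))

    def filter(self, proposals):
        """Drop proposals whose pair has already been rejected."""
        return [
            proposal
            for proposal in proposals
            if frozenset(proposal[:2]) not in self._rejected
        ]


memory = RejectionMemory()
memory.reject("sku-1", "sku-2")
remaining = memory.filter([("sku-2", "sku-1", 0.91), ("sku-3", "sku-4", 0.88)])
```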
Tuning matching
You’ll typically tune three things per catalogue:
- Field weights: which attributes matter most for identity. `gtin` should dominate `description`.
- Auto-merge threshold: high enough that auto-applied merges are virtually always correct.
- Review threshold: the floor for proposals that queue in Notifications. Below this, proposals are discarded.
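
The two thresholds split scores into three bands, which can be sketched as below. The default values and the choice to treat a score exactly at a threshold as inside the higher band are assumptions for illustration.

```python
def triage(score, auto_merge=0.97, review=0.80):
    """Map a similarity score to 'auto-merge', 'review', or 'discard'."""
    if review > auto_merge:
        raise ValueError("review threshold must not exceed auto-merge threshold")
    if score >= auto_merge:
        return "auto-merge"  # applied without a human decision
    if score >= review:
        return "review"  # queued for approval or rejection
    return "discard"


decisions = [triage(s) for s in (0.99, 0.85, 0.50)]
# -> ["auto-merge", "review", "discard"]
```

Raising `auto_merge` trades fewer automatic merges for fewer wrong ones; lowering `review` sends more borderline proposals to the queue.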