

Data flows into Claro through inbound connectors. For catalogues, inbound is configured per catalogue on the Data Source tab. For Research Agents, inbound is configured per agent run. This page covers every supported source, when to use it, and the limits and trade-offs to know about.

Inbound for Catalogues

Connect feeds on each catalogue’s Data Source tab. Every source carries its own attribute mapping, schedule, and conflict policy.

File upload (CSV, XLSX)

  • Use for — initial loads, supplier files, periodic dumps from systems without an API.
  • Mapping — column-to-attribute mapping is saved on first upload and reused on subsequent uploads of the same shape.
  • Limits — large files are split into chunks server-side. For millions of rows, use a database connector or S3 instead.
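
The server-side splitting described above can be pictured with a small sketch. The chunk size and the `chunk_rows` helper are illustrative, not Claro's actual internals:

```python
import csv
import io

def chunk_rows(csv_text, chunk_size):
    """Split parsed CSV rows into fixed-size chunks, mirroring the
    server-side splitting described above (sizes are illustrative)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

sample = "sku,price\nA1,9.99\nA2,19.50\nA3,5.00\n"
chunks = chunk_rows(sample, chunk_size=2)  # three rows -> chunks of 2 and 1
```

Chunking keeps memory bounded per chunk, which is why very large recurring feeds are better served by a database or S3 connector that streams incrementally.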

Supplier Portal

  • Use for — suppliers without API access who need a self-serve way to send updates.
  • Behavior — submissions land as Data Sources with the supplier’s identity attached; they arrive pre-mapped if the supplier has uploaded before.
  • Detail — see Onboard → Supplier Portal.

Scheduled scrape

  • Use for — public catalog pages, marketplace listings, competitor sources.
  • Configuration — URL templates, target schema, cadence, throttling.
  • Outputs — scrape runs land as a Data Source and feed any chained operations.
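
The four configuration knobs above can be sketched as a config object. The field names here are assumptions for illustration, not Claro's actual schema:

```python
# Illustrative shape of a scheduled-scrape configuration.
scrape_config = {
    "url_template": "https://example.com/catalog?page={page}",
    "target_schema": ["sku", "title", "price"],
    "cadence": "daily",
    "throttle": {"requests_per_minute": 30},
}

# Expanding the URL template for the first three pages of a listing.
urls = [scrape_config["url_template"].format(page=p) for p in range(1, 4)]
```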

HTTPS pull

  • Use for — partner APIs, internal systems with REST endpoints.
  • Configuration — endpoint, auth, request schedule, response parsing (JSON / CSV).
  • Auth — supported modes: bearer, basic, OAuth, signed requests.
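
A minimal sketch of what an HTTPS pull involves, using the stdlib only. The endpoint and token are placeholders, and bearer auth stands in for the other supported modes:

```python
import csv
import io
import json
import urllib.request

def build_request(endpoint, token):
    """Build an authenticated pull request (bearer auth shown here;
    basic, OAuth, and signed requests slot in the same way)."""
    return urllib.request.Request(
        endpoint, headers={"Authorization": f"Bearer {token}"}
    )

def parse_response(body, content_type):
    """Parse a JSON or CSV response body into a list of row dicts."""
    if content_type == "application/json":
        return json.loads(body)
    return list(csv.DictReader(io.StringIO(body)))

req = build_request("https://partner.example.com/products", "TOKEN")
json_rows = parse_response('[{"sku": "A1", "price": 9.99}]', "application/json")
csv_rows = parse_response("sku,price\nA1,9.99\n", "text/csv")
```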

Database connectors

  • BigQuery — read tables or query results on a schedule.
  • Postgres — read tables or query results; per-row CDC where available.
  • Supabase — read tables and views via the managed REST API.
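
The "read on a schedule" pattern these connectors implement can be sketched generically as an incremental pull against a watermark column. This is an illustration of the technique, not Claro's implementation; sqlite3 stands in for Postgres or BigQuery:

```python
import sqlite3

# Each scheduled run reads only rows changed since the last watermark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, price REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("A1", 9.99, "2024-01-01"), ("A2", 19.5, "2024-02-01")],
)

def pull_since(conn, watermark):
    """Return rows updated after the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT sku, price, updated_at FROM products WHERE updated_at > ? "
        "ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

rows, wm = pull_since(conn, "2024-01-15")  # only A2 changed since then
```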

Cloud storage

  • S3 — pull files matching a prefix on a schedule.
  • Google Drive — watch a folder for new files.
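
"Watch a folder for new files" reduces to diffing a listing against what was already seen. A minimal local stand-in for the cloud behaviour, with hypothetical file names:

```python
import tempfile
from pathlib import Path

def new_files(folder, seen):
    """Return files in `folder` not yet in `seen`, and update `seen`."""
    current = {p.name for p in Path(folder).iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen |= current
    return fresh

with tempfile.TemporaryDirectory() as d:
    seen = set()
    (Path(d) / "supplier_a.csv").write_text("sku,price\n")
    first = new_files(d, seen)   # picks up supplier_a.csv
    (Path(d) / "supplier_b.csv").write_text("sku,price\n")
    second = new_files(d, seen)  # picks up only the newly arrived file
```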

Email-as-source

  • Use for — suppliers who only send updates by email.
  • Behavior — emails sent to a workspace address are parsed; attachments become Data Source uploads, body content can populate a target schema.
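
The parsing step can be sketched with the stdlib email package: attachments are extracted (and would become Data Source uploads), while the body text remains available for schema population. The addresses and file names are illustrative; Claro's actual parser runs server-side:

```python
from email.message import EmailMessage

# Simulate an inbound supplier email with a CSV attachment.
msg = EmailMessage()
msg["From"] = "supplier@example.com"
msg["Subject"] = "Weekly price update"
msg.set_content("New prices attached.")
msg.add_attachment(
    b"sku,price\nA1,9.99\n",
    maintype="text", subtype="csv", filename="prices.csv",
)

# Extract attachments by filename, decoding the transfer encoding.
attachments = {
    part.get_filename(): part.get_payload(decode=True)
    for part in msg.iter_attachments()
}
```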

Inbound for Research Agents

Research Agents accept inputs per agent type. See Research Agents for details.
  • Find your perfect list — natural-language brief and seed criteria.
  • Turn documents into structured data — PDFs, scanned docs, datasheets, target schema.
  • Analyze & enrich spreadsheets — CSV / XLSX file, enrichment goal.
  • Scrape data from URLs — list of URLs or base URL plus crawl rules.

Outputs land in Generated Datasets and can be promoted into a Catalogue at any time.

Mapping and conflict resolution

For every inbound source, you configure how it interacts with existing records.

Mapping

Map source columns to catalogue attributes. Mappings include:
  • Type coercion — convert strings to numbers, normalize dates and currencies.
  • Computed fields — derive an attribute from one or more source columns.
  • Constants — fill an attribute with a fixed value (e.g. source = supplier_x).
  • Lookups — translate enum-like source values to your canonical enum.
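
The four mapping features above can be sketched for a single source row. Column and attribute names here are illustrative:

```python
def apply_mapping(row):
    """Map one source row to catalogue attributes (names illustrative)."""
    status_lookup = {"IN STOCK": "in_stock", "OOS": "out_of_stock"}
    return {
        # Type coercion: the source price arrives as a string.
        "price": float(row["price"]),
        # Computed field: derived from two source columns.
        "display_name": f'{row["brand"]} {row["model"]}',
        # Constant: fixed provenance value.
        "source": "supplier_x",
        # Lookup: translate the supplier enum to the canonical enum.
        "availability": status_lookup[row["status"]],
    }

mapped = apply_mapping(
    {"price": "9.99", "brand": "Acme", "model": "X1", "status": "OOS"}
)
```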

Conflict policy

When an inbound row matches an existing record:
  • Overwrite — replace existing values (default for trusted sources).
  • Append — for multi-value attributes only.
  • Write-if-empty — only fill blanks, never replace.
  • Custom rule — most-recent, highest-confidence, or attribute-specific policy.
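
The first three policies can be expressed as a small merge function; custom rules (most-recent, highest-confidence) would plug in the same way. A minimal sketch, not Claro's merge engine:

```python
def resolve(existing, incoming, policy):
    """Merge one incoming value into an existing attribute value."""
    if policy == "overwrite":
        return incoming
    if policy == "append":  # multi-value attributes only
        return existing + [incoming]
    if policy == "write_if_empty":
        return existing if existing not in (None, "", []) else incoming
    raise ValueError(f"unknown policy: {policy}")

a = resolve("old", "new", "overwrite")          # trusted source wins
b = resolve(["red"], "blue", "append")          # multi-value grows
c = resolve("", "filled", "write_if_empty")     # blank gets filled
d = resolve("kept", "ignored", "write_if_empty")  # non-blank untouched
```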

Identity matching

Every inbound row needs to resolve to a record. Data Source Mapping runs first, using the similarity graph and configured key fields. Unmatched rows are flagged for review before they land in the catalogue.
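
The key-field part of this resolution can be sketched as an exact-key match with unmatched rows flagged; the similarity graph adds fuzzy matching on top, which this illustration omits:

```python
def match_rows(rows, catalogue, key_fields):
    """Resolve inbound rows to existing records by configured key
    fields; rows with no match are flagged for review."""
    index = {tuple(rec[k] for k in key_fields): rec for rec in catalogue}
    matched, flagged = [], []
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        if key in index:
            matched.append((row, index[key]))
        else:
            flagged.append(row)
    return matched, flagged

catalogue = [{"sku": "A1", "title": "Widget"}]
rows = [{"sku": "A1", "price": 9.99}, {"sku": "ZZ", "price": 1.0}]
matched, flagged = match_rows(rows, catalogue, key_fields=["sku"])
```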

Limits and best practices

  • Test with a small sample first — 50 to 200 rows — before connecting a recurring source.
  • Validate the schema upstream when possible (Supplier Portal does this for you).
  • Prefer column-level mappings to ad-hoc fixes; mappings are reused, ad-hoc fixes are not.
  • For very high volume, prefer database or S3 connectors over file upload.

Outbound

Distributing data downstream is handled by Distribute — Unified Catalog and Sync & Export.