Data flows into Claro through inbound connectors. For catalogues, inbound is configured per catalogue on the Data Source tab. For Research Agents, inbound is configured per agent run.
This page covers every supported source, when to use it, and the limits and trade-offs to know about.
Inbound for Catalogues
Each catalogue’s Data Source tab is where you connect feeds. Each source has its own attribute mapping, schedule, and conflict policy.
File upload (CSV, XLSX)
- Use for — initial loads, supplier files, periodic dumps from systems without an API.
- Mapping — column-to-attribute mapping is saved on first upload and reused on subsequent uploads of the same shape.
- Limits — large files are split into chunks server-side. For millions of rows, use a database connector or S3 instead.
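The chunking described above happens server-side, but the same idea can be sketched client-side. This is an illustrative sketch only, not Claro's actual upload API; the function name and chunk size are assumptions.

```python
import csv
import io

def chunk_rows(csv_text, chunk_size):
    """Yield batches of row dicts, at most chunk_size rows each.

    Illustrative only: mirrors the server-side splitting described
    above so very large files could be sent in smaller batches.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

sample = "sku,price\nA1,9.99\nA2,4.50\nA3,2.00\n"
chunks = list(chunk_rows(sample, 2))
# → two batches: [A1, A2] and [A3]
```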
Supplier Portal
- Use for — suppliers without API access who need a self-serve way to send updates.
- Behavior — submissions land as Data Sources with the supplier’s identity attached; the mapping is reused if the supplier has uploaded before.
- Detail — see Onboard → Supplier Portal.
Scheduled scrape
- Use for — public catalog pages, marketplace listings, competitor sources.
- Configuration — URL templates, target schema, cadence, throttling.
- Outputs — scrape runs land as a Data Source and feed any chained operations.
HTTPS pull
- Use for — partner APIs, internal systems with REST endpoints.
- Configuration — endpoint, auth, request schedule, response parsing (JSON / CSV).
- Auth — supported modes: bearer token, basic auth, OAuth, and signed requests.
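Response parsing for a pull normalizes whatever the endpoint returns into rows. A minimal sketch of that step, assuming JSON and CSV bodies only (the function name is illustrative, not part of Claro's API):

```python
import csv
import io
import json

def parse_response(body: str, fmt: str) -> list:
    """Normalize a pulled response body into a list of row dicts.

    fmt is the configured response format ("json" or "csv").
    """
    if fmt == "json":
        data = json.loads(body)
        # A single object becomes a one-row result.
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(body)))
    raise ValueError(f"unsupported format: {fmt}")
```

Both formats yield the same shape, so downstream mapping does not need to know which one the endpoint spoke.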
Database connectors
| Source | Modes |
|---|---|
| BigQuery | Read tables or query results on a schedule. |
| Postgres | Read tables or query results; per-row CDC where available. |
| Supabase | Read tables and views via the managed REST API. |
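Scheduled reads from a database typically fetch only rows changed since the last run, using a watermark column. A hedged sketch of that pattern (table and column names are illustrative; a real connector would use parameterized queries, not string formatting):

```python
def incremental_query(table: str, watermark_col: str, last_seen: str) -> str:
    """Build a pull query that fetches only rows changed since the last run.

    Sketch only: real connectors should bind last_seen as a query
    parameter rather than interpolating it into SQL.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} > '{last_seen}' "
        f"ORDER BY {watermark_col}"
    )

incremental_query("products", "updated_at", "2024-06-01")
```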
Cloud storage
| Source | Modes |
|---|---|
| S3 | Pull files matching a prefix on a schedule. |
| Google Drive | Watch a folder for new files. |
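A prefix pull amounts to filtering an object listing by key prefix and last-modified time. A minimal sketch of that selection logic, independent of any cloud SDK (key names and the listing shape are assumptions):

```python
from datetime import datetime

def new_keys(listing, prefix, since):
    """Pick object keys under the configured prefix modified after the last pull.

    listing is a sequence of (key, last_modified) pairs.
    """
    return [
        key for key, modified in listing
        if key.startswith(prefix) and modified > since
    ]

listing = [
    ("feeds/supplier_x/2024-06-01.csv", datetime(2024, 6, 1)),
    ("feeds/supplier_x/2024-06-08.csv", datetime(2024, 6, 8)),
    ("logs/app.log", datetime(2024, 6, 8)),
]
new_keys(listing, "feeds/supplier_x/", datetime(2024, 6, 2))
# → ["feeds/supplier_x/2024-06-08.csv"]
```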
Email-as-source
- Use for — suppliers who only send updates by email.
- Behavior — emails sent to a workspace address are parsed; attachments become Data Source uploads, and body content can populate a target schema.
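Splitting an inbound message into body text and attachments can be done with the standard library's `email` package. A sketch, assuming plain-text bodies (the function name is illustrative):

```python
from email.message import EmailMessage

def split_inbound(msg: EmailMessage):
    """Return (body_text, attachments) from an inbound message.

    attachments is a list of (filename, content) pairs; the first
    text/plain part that is not an attachment is taken as the body.
    """
    body = None
    attachments = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            attachments.append((part.get_filename(), part.get_content()))
        elif part.get_content_type() == "text/plain" and body is None:
            body = part.get_content()
    return body, attachments

# Build a sample supplier email to exercise the function.
msg = EmailMessage()
msg.set_content("Price update attached.")
msg.add_attachment("sku,price\nA1,9.99\n", subtype="csv",
                   filename="update.csv")

body, attachments = split_inbound(msg)
```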
Inbound for Research Agents
Research Agents accept inputs per agent type. See Research Agents for details.
| Agent | Inputs |
|---|---|
| Find your perfect list | Natural-language brief and seed criteria. |
| Turn documents into structured data | PDFs, scanned docs, datasheets, target schema. |
| Analyze & enrich spreadsheets | CSV / XLSX file, enrichment goal. |
| Scrape data from URLs | List of URLs or base URL plus crawl rules. |
Outputs land in Generated Datasets and can be promoted into a Catalogue at any time.
Mapping and conflict resolution
For every inbound source, you configure how it interacts with existing records.
Mapping
Map source columns to catalogue attributes. Mappings include:
- Type coercion — convert strings to numbers, normalize dates and currencies.
- Computed fields — derive an attribute from one or more source columns.
- Constants — fill an attribute with a fixed value (e.g. source = supplier_x).
- Lookups — translate enum-like source values to your canonical enum.
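The four mapping kinds above can be sketched as one row-transform function. This is an illustrative model, not Claro's mapping engine; the mapping format and all names are assumptions:

```python
def apply_mapping(row: dict, mapping: dict) -> dict:
    """Apply a column-to-attribute mapping to one source row.

    mapping values are (kind, spec) pairs covering the four kinds
    described above: coerce, computed, constant, lookup.
    """
    out = {}
    for attr, (kind, spec) in mapping.items():
        if kind == "coerce":          # type coercion
            col, cast = spec
            out[attr] = cast(row[col])
        elif kind == "computed":      # derive from one or more columns
            out[attr] = spec(row)
        elif kind == "constant":      # fixed value
            out[attr] = spec
        elif kind == "lookup":        # translate to canonical enum
            col, table = spec
            out[attr] = table[row[col]]
    return out

mapping = {
    "price": ("coerce", ("unit_price", float)),
    "name": ("computed", lambda r: f"{r['brand']} {r['model']}"),
    "source": ("constant", "supplier_x"),
    "status": ("lookup", ("state", {"A": "active", "D": "discontinued"})),
}
row = {"unit_price": "9.99", "brand": "Acme", "model": "X1", "state": "A"}
mapped = apply_mapping(row, mapping)
# → {"price": 9.99, "name": "Acme X1", "source": "supplier_x", "status": "active"}
```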
Conflict policy
When an inbound row matches an existing record:
- Overwrite — replace existing values (default for trusted sources).
- Append — for multi-value attributes only.
- Write-if-empty — only fill blanks, never replace.
- Custom rule — most-recent, highest-confidence, or attribute-specific policy.
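The first three policies can be expressed as a single per-attribute merge function. A hedged sketch (policy names mirror the list above; the function itself is illustrative):

```python
def resolve(existing, incoming, policy):
    """Merge one incoming attribute value against the existing value."""
    if policy == "overwrite":       # replace existing values
        return incoming
    if policy == "append":          # multi-value attributes only
        return (existing or []) + [incoming]
    if policy == "write_if_empty":  # only fill blanks, never replace
        return incoming if existing in (None, "", []) else existing
    raise ValueError(f"unknown policy: {policy}")
```

A custom rule (most-recent, highest-confidence) would take the same shape but compare metadata on both values rather than the values alone.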
Identity matching
Every inbound row needs to resolve to a record. Data Source Mapping runs first, using the similarity graph and configured key fields. Unmatched rows are flagged for review before they land in the catalogue.
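The key-field half of that resolution can be sketched as an exact match over configured keys, with unmatched rows returned for review. This omits the similarity graph entirely and is an assumption-laden illustration, not the actual matcher:

```python
def match_record(row, records, keys):
    """Resolve a row to an existing record by exact key-field match.

    Returns the matching record, or None to flag the row for review.
    Sketch only: the real pipeline also consults the similarity graph.
    """
    for rec in records:
        if all(row.get(k) == rec.get(k) for k in keys):
            return rec
    return None

records = [{"sku": "A1", "name": "Widget"}, {"sku": "A2", "name": "Gadget"}]
match_record({"sku": "A2", "price": "4.50"}, records, keys=["sku"])
# → {"sku": "A2", "name": "Gadget"}
```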
Limits and best practices
- Test with a small sample first — 50 to 200 rows — before connecting a recurring source.
- Validate the schema upstream when possible (Supplier Portal does this for you).
- Prefer column-level mappings to ad-hoc fixes; mappings are reused, ad-hoc fixes are not.
- For very high volume, prefer database or S3 connectors over file upload.
Outbound
Distributing data downstream is handled by Distribute — Unified Catalog and Sync & Export.