Catalog & data
Q: What’s the difference between a Catalogue and a dataset?
A: A Catalogue is persistent: it has a defined schema (Attributes), versioned records, data sources, and a full history with provenance. Operations write back to it on a schedule or in a pipeline. A dataset (from a Research Agent) is a one-off output; it exists until you promote it into a Catalogue.
Q: Can I have multiple Catalogues in one workspace?
A: Yes. Catalogues can also reference each other, for example Product.supplier → Supplier. Cross-catalogue references enable filtering, enrichment, and denormalization across related objects.
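To make the Product.supplier → Supplier reference more concrete, here is a minimal sketch in plain Python of what filtering and denormalization across two related Catalogues could look like. It does not use the product's API: the `Supplier` and `Product` classes, their fields, and the helper functions are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shapes; the real Attributes are defined per Catalogue.
@dataclass(frozen=True)
class Supplier:
    id: str
    name: str
    country: str


@dataclass(frozen=True)
class Product:
    sku: str
    title: str
    supplier: str  # holds a Supplier id, i.e. the Product.supplier -> Supplier reference


def denormalize(products: list[Product], suppliers: list[Supplier]) -> list[dict]:
    """Copy selected Supplier fields onto each Product row (enrichment)."""
    by_id = {s.id: s for s in suppliers}
    rows = []
    for p in products:
        s: Optional[Supplier] = by_id.get(p.supplier)  # None if the reference is dangling
        rows.append({
            "sku": p.sku,
            "title": p.title,
            "supplier_name": s.name if s else None,
            "supplier_country": s.country if s else None,
        })
    return rows


def filter_by_supplier_country(products, suppliers, country: str) -> list[dict]:
    """Filter Products on an attribute that lives in the referenced Catalogue."""
    return [r for r in denormalize(products, suppliers)
            if r["supplier_country"] == country]


suppliers = [Supplier("s1", "Acme", "DE"), Supplier("s2", "Globex", "US")]
products = [
    Product("p1", "Widget", "s1"),
    Product("p2", "Gadget", "s2"),
    Product("p3", "Orphan", "s9"),  # references a Supplier that does not exist
]

rows = denormalize(products, suppliers)
us_rows = filter_by_supplier_country(products, suppliers, "US")
```

In this sketch a dangling reference yields empty supplier fields rather than an error; how the platform itself treats unresolved references is not covered here.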
Q: How do I handle large Catalogues (millions of records)?
A: Use database connectors (BigQuery, Postgres, Supabase) or S3 as Data Sources rather than file uploads. Operations run in parallel batches server-side. Contact us for guidance on sharding and scheduling for very large volumes.
Q: What file formats are supported for import?
A: CSV and XLSX for file upload. Structured JSON and CSV via HTTPS pull and database connectors. PDFs for Knowledge Bases (Word, PowerPoint, and plain text coming soon).