All operations require authentication using Bearer tokens. Make sure you have
your API credentials ready.
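As a rough illustration of Bearer authentication, the sketch below attaches a token to a request using the standard HTTP Authorization header. The endpoint URL and the CLARO_API_TOKEN environment variable name are placeholders invented for this example, not documented Claro values, and the request is only constructed, never sent.

```python
import os
import urllib.request

# Hypothetical endpoint, used only to make the example concrete.
API_URL = "https://api.example.com/v1/data-sources"


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Return a request carrying the token as an HTTP Bearer credential."""
    if not token:
        raise ValueError("API token is empty")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# CLARO_API_TOKEN is an assumed variable name; "demo-token" is a stand-in
# so the example runs without real credentials.
token = os.environ.get("CLARO_API_TOKEN") or "demo-token"
request = build_authenticated_request(API_URL, token)
print(request.get_header("Authorization"))
```

Reading the token from the environment keeps credentials out of source files.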
File and Format Requirements
- Supported formats: CSV, TSV, JSON, JSONL, PDF, and raw text files
- Column names:
  - For CSV/TSV: Files must include a header row specifying column names
  - For JSON/JSONL: Each object must use consistent field names
  - For PDF: Text content will be extracted and organized into structured tables
- Content: Any raw data that needs cleaning, extraction, or organization
- Size limits:
  - Minimum: 10 rows for meaningful processing
  - Maximum: 1 million rows per data source (free tier: 10,000 rows)
  - File size: Up to 100MB per file, 1GB total per data source
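The requirements above can be checked locally before uploading. The following is a minimal sketch for CSV input, not Claro's own validation: the function and constant names are made up for this example, 100MB is assumed to mean binary megabytes, and the code treats the first row as the header because it cannot tell a header from a data row.

```python
import csv
import io
import os

MAX_FILE_BYTES = 100 * 1024 * 1024  # "100MB per file"; binary megabytes assumed
MIN_ROWS = 10
MAX_ROWS_PAID = 1_000_000
MAX_ROWS_FREE = 10_000


def check_file_size(path):
    """Return True if the file on disk is within the per-file size limit."""
    return os.path.getsize(path) <= MAX_FILE_BYTES


def check_rows(handle, max_rows=MAX_ROWS_PAID, delimiter=","):
    """Return a list of problems found in an open CSV/TSV text stream."""
    problems = []
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)  # the first row is assumed to be the header
    if not header or any(not name.strip() for name in header):
        problems.append("header row is missing or contains blank column names")
    row_count = sum(1 for row in reader if row)  # skip fully blank lines
    if row_count < MIN_ROWS:
        problems.append(f"only {row_count} data rows; at least {MIN_ROWS} are expected")
    if row_count > max_rows:
        problems.append(f"{row_count} data rows exceed the limit of {max_rows}")
    return problems


# Usage: a two-row sample falls below the 10-row minimum.
sample = io.StringIO("id,name\n1,widget\n2,gadget\n")
print(check_rows(sample, max_rows=MAX_ROWS_FREE))
```

For TSV files, pass delimiter="\t". An empty list means none of these particular checks failed; it does not guarantee that the upload will be accepted.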
Best Practices
Data Quality
When uploading data that contains missing values, use proper formatting:
- JSON/JSONL: Use the null keyword (without quotes) for missing values
- CSV/TSV: Leave cells empty to represent NULL values
- Mixed data types: Ensure numeric columns contain only numbers and properly formatted NULL values for optimal AI processing
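To make the two missing-value conventions concrete, here is a small sketch that uses only Python's standard library. The record fields (sku, price, rating) are invented sample data. json.dumps serializes None as the bare null keyword, and the csv module writes None as an empty cell.

```python
import csv
import io
import json

# Invented sample records; None marks a missing value.
records = [
    {"sku": "A-100", "price": 19.99, "rating": 4},
    {"sku": "A-101", "price": None, "rating": None},
]

# JSONL: one JSON object per line, with None written as null (no quotes).
jsonl_text = "\n".join(json.dumps(record) for record in records)

# CSV: None is written as an empty cell.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["sku", "price", "rating"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(jsonl_text)
print(csv_text)
```

Keeping missing numeric values as None, rather than placeholder strings such as "N/A", also keeps numeric columns free of non-numeric text.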
Date and Time Formatting
For consistent handling of temporal information, use ISO 8601 format:
- Dates: YYYY-MM-DD (e.g., 2024-03-14)
- Timestamps: YYYY-MM-DDThh:mm:ssZ (e.g., 2024-03-14T15:30:00Z)
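One way to produce these two formats is with Python's datetime module, as sketched below. The helper names are illustrative. Timestamps are converted to UTC before formatting so that the trailing Z is accurate, and naive datetimes are rejected because their offset is unknown.

```python
from datetime import date, datetime, timezone


def iso_date(value: date) -> str:
    """Format a date as YYYY-MM-DD."""
    return value.strftime("%Y-%m-%d")


def iso_timestamp(value: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if value.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(iso_date(date(2024, 3, 14)))  # 2024-03-14
print(iso_timestamp(datetime(2024, 3, 14, 15, 30, tzinfo=timezone.utc)))  # 2024-03-14T15:30:00Z
```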
Include Rich Metadata
When preparing your dataset, include all relevant metadata fields since Claro can efficiently handle datasets with many columns. These additional columns provide valuable context for AI processing and enable richer analysis. Consider including:
- Timestamps and version information
- Categories and classifications
- Source information and identifiers
- Status fields and flags
- Ratings and quality scores
- Any other contextual fields
Flatten Nested Data
If your data contains nested structures (such as JSON objects), flatten them into separate columns before uploading to Claro. For example, instead of a single column containing {"status": "active", "priority": 3}, use two separate columns: status with the value "active" and priority with the value 3. A flat structure allows for better AI processing and analysis.
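Flattening can be done with a short recursive helper, sketched below. The function name and the sample record are illustrative. Prefixing each nested key with its parent column name is a choice made in this sketch to avoid name collisions; it is not something Claro requires.

```python
def flatten(record, parent_key="", sep="_"):
    """Return a copy of record with nested dicts expanded into flat columns."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so that deeper levels are flattened as well.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


row = {"id": 42, "details": {"status": "active", "priority": 3}}
print(flatten(row))
# {'id': 42, 'details_status': 'active', 'details_priority': 3}
```

If the nested keys are already unique across the dataset, dropping the prefix gives the bare status and priority columns described above.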
Upload Methods
API Upload
Use the Claro API to upload raw data sources programmatically.
Platform Dashboard Upload
- Visit Platform Dashboard: Go to https://app.prod.getclaro.ai/dashboard/ and sign in
- Upload Data Sources: Drag and drop your files or use the file picker to upload CSV, PDF, or other supported formats
- Review Processing: Claro automatically extracts, cleans, and organizes your data into structured tables
Data Cleaning Pipeline
Once uploaded, Claro automatically:
- Extracts content from PDFs and organizes into structured tables
- Cleans CSV data, handles missing values, and standardizes formats
- Validates file structure and detects data patterns
- Organizes unstructured data into consistent table formats
- Indexes cleaned data for fast retrieval and selection
Next Steps
After uploading your data sources:
- Review Cleaned Data: Check the organized tables in your dashboard
- Create Dataset: Select from your cleaned data sources and choose a dataset type/template
- Configure Dataset: Set up the dataset for specific use cases (product catalog, supplier data, etc.)
- Start AI Workflows: Begin classification, enrichment, or extraction tasks on your structured dataset