All operations require authentication using Bearer tokens. Make sure you have your API credentials ready.

File and Format Requirements

  • Supported formats: CSV, TSV, JSON, JSONL, PDF, and raw text files
  • Column names:
    • For CSV/TSV: Files must include a header row specifying column names
    • For JSON/JSONL: Each object must use consistent field names
    • For PDF: Text content will be extracted and organized into structured tables
  • Content: Any raw data that needs cleaning, extraction, or organization
  • Size limits:
    • Minimum: 10 rows for meaningful processing
    • Maximum: 1 million rows per data source (free tier: 10,000 rows)
    • File size: Up to 100MB per file, 1GB total per data source

Best Practices

Data Quality

When uploading data that contains missing values, use proper formatting:
  • JSON/JSONL: Use the null keyword (without quotes) for missing values
  • CSV/TSV: Leave cells empty to represent NULL values
  • Mixed data types: Ensure numeric columns contain only numbers (plus NULL values formatted as above) so the AI can process them reliably
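The NULL conventions above can be produced with the standard library; a minimal Python sketch (the column names are just examples):

```python
import csv
import io
import json

# JSON/JSONL: Python's None serializes as the unquoted null keyword.
record = {"product": "Widget", "price": None}
json_line = json.dumps(record)  # {"product": "Widget", "price": null}

# CSV/TSV: an empty cell represents NULL.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["product", "price"])  # header row is required
writer.writerow(["Widget", ""])        # empty cell = NULL
csv_text = buffer.getvalue()
```

Avoid quoted strings like "null" or "N/A" in either format; they arrive as text, not as missing values.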

Date and Time Formatting

For consistent handling of temporal information, use ISO 8601 format:
  • Dates: YYYY-MM-DD (e.g., 2024-03-14)
  • Timestamps: YYYY-MM-DDThh:mm:ssZ (e.g., 2024-03-14T15:30:00Z)
Our data parser will attempt to detect other date formats, but using the ISO standard ensures the most reliable processing and transformation.
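Both ISO 8601 shapes are straightforward to emit from Python's datetime module; a small sketch using the example values above:

```python
from datetime import datetime, timezone

# Date: YYYY-MM-DD
d = datetime(2024, 3, 14).date().isoformat()  # '2024-03-14'

# Timestamp: YYYY-MM-DDThh:mm:ssZ (UTC, with a literal Z suffix)
ts = datetime(2024, 3, 14, 15, 30, 0, tzinfo=timezone.utc)
stamp = ts.strftime("%Y-%m-%dT%H:%M:%SZ")  # '2024-03-14T15:30:00Z'
```

Note that `isoformat()` on an aware datetime emits a `+00:00` offset rather than `Z`, which is why the timestamp uses an explicit `strftime` pattern.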

Include Rich Metadata

When preparing your dataset, include all relevant metadata fields since Claro can efficiently handle datasets with many columns. These additional columns provide valuable context for AI processing and enable richer analysis. Consider including:
  • Timestamps and version information
  • Categories and classifications
  • Source information and identifiers
  • Status fields and flags
  • Ratings and quality scores
  • Any other contextual fields

Flatten Nested Data

If your data contains nested structures or objects (like JSON objects), flatten them into separate columns before uploading to Claro. For example, instead of having a single column containing {"status": "active", "priority": 3}, split it into two separate columns: status with value “active” and priority with value 3. This flat structure allows for better AI processing and analysis.
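A recursive flattener along these lines handles arbitrarily deep objects before upload; the separator and key names here are just one possible convention:

```python
def flatten(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict,
    joining keys with sep (e.g. details.status -> details_status)."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

row = {"id": 7, "details": {"status": "active", "priority": 3}}
flat_row = flatten(row)
# {'id': 7, 'details_status': 'active', 'details_priority': 3}
```

Lists inside objects need a separate decision (explode into rows, or index into columns); this sketch only covers nested dicts.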

Upload Methods

API Upload

Use the Claro API to upload raw data sources programmatically:
curl -X POST https://secure-api.getclaro.ai/api/v2/datasources/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files[]=@your-data.csv"

A successful upload returns a JSON response:
{
  "message": "Data source uploaded successfully",
  "datasourceId": "550e8400-e29b-41d4-a716-446655440000",
  "fileName": "your-data.csv",
  "contentType": "text/csv",
  "fileSize": 245760,
  "status": "queued",
  "estimatedProcessingTime": "2-5 minutes"
}
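From a script, you will typically parse this response and keep the datasourceId for later use. A minimal Python sketch (the body below is the sample response, not a live API call):

```python
import json

# Sample response body of the shape shown above (not a live call).
response_body = """{
  "message": "Data source uploaded successfully",
  "datasourceId": "550e8400-e29b-41d4-a716-446655440000",
  "fileName": "your-data.csv",
  "contentType": "text/csv",
  "fileSize": 245760,
  "status": "queued",
  "estimatedProcessingTime": "2-5 minutes"
}"""

response = json.loads(response_body)
datasource_id = None
if response["status"] == "queued":
    # Keep the ID to track this data source once processing finishes.
    datasource_id = response["datasourceId"]
```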

Platform Dashboard Upload

  1. Visit Platform Dashboard: Go to https://app.prod.getclaro.ai/dashboard/ and sign in
  2. Upload Data Sources: Drag and drop your files or use the file picker to upload CSV, PDF, or other supported formats
  3. Review Processing: Claro automatically extracts, cleans, and organizes your data into structured tables

Data Cleaning Pipeline

Once uploaded, Claro automatically:
  1. Extracts content from PDFs and organizes it into structured tables
  2. Cleans CSV data, handles missing values, and standardizes formats
  3. Validates file structure and detects data patterns
  4. Organizes unstructured data into consistent table formats
  5. Indexes cleaned data for fast retrieval and selection
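To illustrate what step 2 means in practice (this is a toy sketch of the idea, not Claro's actual pipeline), here is how empty CSV cells become NULLs and a non-ISO date gets standardized; the column names and US-style date format are assumptions:

```python
import csv
import io

raw = "product,price,updated\nWidget,9.99,03/14/2024\nGadget,,2024-03-15\n"

def standardize_date(value):
    """Normalize a US-style MM/DD/YYYY date to ISO 8601; pass others through."""
    parts = value.split("/")
    if len(parts) == 3:
        month, day, year = parts
        return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
    return value

cleaned = []
for row in csv.DictReader(io.StringIO(raw)):
    # Empty cell -> NULL (None); otherwise parse as a number.
    row["price"] = float(row["price"]) if row["price"] else None
    row["updated"] = standardize_date(row["updated"])
    cleaned.append(row)
```

Claro performs this kind of normalization automatically, but files that already follow the conventions above need less guessing and process more predictably.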

Next Steps

After uploading your data sources:
  1. Review Cleaned Data: Check the organized tables in your dashboard
  2. Create Dataset: Select from your cleaned data sources and choose a dataset type/template
  3. Configure Dataset: Set up the dataset for specific use cases (product catalog, supplier data, etc.)
  4. Start AI Workflows: Begin classification, enrichment, or extraction tasks on your structured dataset