# Upload Data Source

> Upload raw data files to Claro for cleaning and organization. Claro extracts content from PDFs, cleans CSV data, and organizes unstructured data into structured tables.

<Note>
  All operations require authentication using Bearer tokens. Make sure you have
  your API credentials ready.
</Note>

## File and Format Requirements

* **Supported formats**: CSV, TSV, JSON, JSONL, PDF, and raw text files
* **Column names**:
  * For CSV/TSV: Files must include a header row specifying column names
  * For JSON/JSONL: Each object must use consistent field names
  * For PDF: Text content will be extracted and organized into structured tables
* **Content**: Any raw data that needs cleaning, extraction, or organization
* **Size limits**:
  * Minimum: 10 rows for meaningful processing
  * Maximum: 1 million rows per data source (free tier: 10,000 rows)
  * File size: Up to 100MB per file, 1GB total per data source
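
A quick local check before uploading can catch the most common rejections early. The sketch below is illustrative only: `check_upload` is a hypothetical helper, not part of any Claro SDK, the `.txt` extension is an assumption standing in for "raw text files", and 100MB is assumed to mean 100 × 1024 × 1024 bytes.

```python
import os

# Extensions taken from the supported-formats list above.
# ".txt" is an assumption for "raw text files".
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".json", ".jsonl", ".pdf", ".txt"}

# Per-file limit from the list above, assuming 1MB = 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

For a file on disk, this could be called as `check_upload(path, os.path.getsize(path))`. Row counts and the 1GB per-data-source total are not checked here.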

## Best Practices

### Data Quality

Represent missing values in the way each format expects:

* **JSON/JSONL**: Use the `null` keyword (without quotes) for missing values
* **CSV/TSV**: Leave cells empty to represent NULL values
* **Mixed data types**: Keep numeric columns strictly numeric. Use only numbers and properly formatted NULL values so the AI processing can treat the column consistently
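
As a minimal sketch of these conventions, Python's standard `json` and `csv` modules already produce the recommended output when a missing value is represented as `None`. The column names and sample rows are made up for illustration.

```python
import csv
import io
import json

# Sample rows (hypothetical data); None marks a missing value.
rows = [
    {"sku": "A-1", "price": 9.99, "rating": 4},
    {"sku": "B-2", "price": None, "rating": None},
]

# JSONL: json.dumps serializes None as the bare keyword null (no quotes).
jsonl_text = "\n".join(json.dumps(row) for row in rows)

# CSV: csv.DictWriter writes None as an empty cell.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["sku", "price", "rating"], lineterminator="\n"
)
writer.writeheader()  # header row with column names, as required for CSV/TSV
writer.writerows(rows)
csv_text = buffer.getvalue()

print(jsonl_text)
print(csv_text)
```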

### Date and Time Formatting

For consistent handling of temporal information, use ISO 8601 format:

* **Dates**: `YYYY-MM-DD` (e.g., `2024-03-14`)
* **Timestamps**: `YYYY-MM-DDThh:mm:ssZ` (e.g., `2024-03-14T15:30:00Z`)

Claro's data parser attempts to detect other date formats, but ISO 8601 gives the most reliable processing and transformation.
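
If your source data uses another date format, you can normalize it before uploading. The following is a rough sketch using only the Python standard library; the helper names `to_iso_date` and `to_iso_timestamp` are hypothetical, and you need to supply the input format your data actually uses.

```python
from datetime import datetime, timedelta, timezone


def to_iso_date(value: str, fmt: str) -> str:
    """Parse a date string with a known format and return YYYY-MM-DD."""
    return datetime.strptime(value, fmt).date().isoformat()


def to_iso_timestamp(dt: datetime) -> str:
    """Convert a timezone-aware datetime to YYYY-MM-DDThh:mm:ssZ in UTC."""
    if dt.tzinfo is None:
        # A naive datetime has no offset, so converting it to UTC would be a guess.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A US-style date becomes an ISO 8601 date.
print(to_iso_date("03/14/2024", "%m/%d/%Y"))

# A local time at UTC-05:00 becomes a UTC timestamp with the Z suffix.
local = datetime(2024, 3, 14, 10, 30, tzinfo=timezone(timedelta(hours=-5)))
print(to_iso_timestamp(local))
```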

### Include Rich Metadata

Include all relevant metadata fields when preparing your dataset; Claro handles datasets with many columns efficiently. These additional columns give the AI more context and enable richer analysis. Consider including:

* Timestamps and version information
* Categories and classifications
* Source information and identifiers
* Status fields and flags
* Ratings and quality scores
* Any other contextual fields

### Flatten Nested Data

If your data contains nested structures (such as a JSON object stored in a single column), flatten them into separate columns before uploading to Claro. For example, instead of a single column containing `{"status": "active", "priority": 3}`, use two columns: `status` with the value `"active"` and `priority` with the value `3`. A flat structure allows for better AI processing and analysis.
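
One way to do this flattening, sketched with plain Python dictionaries: the `flatten` helper below is hypothetical, and prefixing each new column with its parent key (for example `meta_status`) is a design choice made here to avoid name collisions, not something Claro requires. If the nested keys are already unique, you can rename the columns afterwards.

```python
def flatten(record: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Recursively move nested dict values into top-level columns."""
    flat = {}
    for key, value in record.items():
        # Build the column name, prefixing with the parent key when nested.
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical record with one nested column named "meta".
row = {"id": 1, "meta": {"status": "active", "priority": 3}}
print(flatten(row))
```

Lists are left untouched by this sketch; how to split them into columns depends on your data.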

## Upload Methods

### API Upload

Use the Claro API to upload raw data sources programmatically:

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST https://secure-api.getclaro.ai/api/v2/datasources/upload \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -F "files[]=@your-data.csv"
  ```

  ```python Python theme={null}
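  import requests

  headers = {"Authorization": "Bearer YOUR_API_KEY"}

  # Open the file in a context manager so the handle is closed after the upload
  with open("your-data.csv", "rb") as f:
      response = requests.post(
          "https://secure-api.getclaro.ai/api/v2/datasources/upload",
          headers=headers,
          files={"files[]": f},
      )

  # Inspect the HTTP status and the JSON body returned by the API
  print(response.status_code, response.json())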
  ```

  ```javascript JavaScript theme={null}
  // fileInput is assumed to be a file input element on your page
  const formData = new FormData();
  formData.append("files[]", fileInput.files[0]);

  const response = await fetch(
    "https://secure-api.getclaro.ai/api/v2/datasources/upload",
    {
      method: "POST",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
      },
      body: formData,
    }
  );
  ```
</CodeGroup>

<CodeGroup>
  ```json Success Response theme={null}
  {
    "message": "Data source uploaded successfully",
    "datasourceId": "550e8400-e29b-41d4-a716-446655440000",
    "fileName": "products.csv",
    "contentType": "text/csv",
    "fileSize": 245760,
    "status": "queued",
    "estimatedProcessingTime": "2-5 minutes"
  }
  ```

  ```json Invalid Format theme={null}
  {
    "error": "Invalid file format",
    "code": "INVALID_FILE_FORMAT",
    "details": {
      "supported_formats": ["csv", "tsv", "json", "jsonl", "pdf"],
      "received_format": "xlsx"
    }
  }
  ```

  ```json File Too Large theme={null}
  {
    "error": "File size exceeds limit",
    "code": "FILE_TOO_LARGE",
    "details": {
      "max_size": "100MB",
      "received_size": "150MB"
    }
  }
  ```

  ```json Authentication Required theme={null}
  {
    "error": "Authentication required",
    "code": "UNAUTHORIZED",
    "details": {
      "message": "Bearer token missing or invalid"
    }
  }
  ```
</CodeGroup>
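
The response bodies above can be handled with a small helper. This is a sketch under stated assumptions: `describe_upload_result` is a hypothetical name, the field names are taken from the example responses, and the helper branches on the presence of the `error` key because HTTP status codes are not documented here.

```python
def describe_upload_result(body: dict) -> str:
    """Turn an upload response body into a short human-readable message."""
    if "error" not in body:
        # Success shape: datasourceId and status as in the example above.
        return f"Data source {body['datasourceId']} is {body['status']}"

    code = body.get("code", "UNKNOWN")
    details = body.get("details", {})
    if code == "INVALID_FILE_FORMAT":
        supported = ", ".join(details.get("supported_formats", []))
        return f"Unsupported format {details.get('received_format')}; supported: {supported}"
    if code == "FILE_TOO_LARGE":
        return f"File is {details.get('received_size')}; limit is {details.get('max_size')}"
    if code == "UNAUTHORIZED":
        return "Check your API key: " + details.get("message", body["error"])
    # Any other error code: fall back to the generic error text.
    return f"{code}: {body['error']}"
```

With the `requests` example above, this could be called as `print(describe_upload_result(response.json()))`.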

### Platform Dashboard Upload

1. **Visit Platform Dashboard**: Go to [https://app.prod.getclaro.ai/dashboard/](https://app.prod.getclaro.ai/dashboard/) and sign in
2. **Upload Data Sources**: Drag and drop your files or use the file picker to upload CSV, PDF, or other supported formats
3. **Review Processing**: Claro automatically extracts, cleans, and organizes your data into structured tables

## Data Cleaning Pipeline

Once uploaded, Claro automatically:

1. **Extracts** content from PDFs and organizes it into structured tables
2. **Cleans** CSV data, handles missing values, and standardizes formats
3. **Validates** file structure and detects data patterns
4. **Organizes** unstructured data into consistent table formats
5. **Indexes** cleaned data for fast retrieval and selection

## Next Steps

After uploading your data sources:

1. **Review Cleaned Data**: Check the organized tables in your dashboard
2. **Create Dataset**: Select from your cleaned data sources and choose a dataset type/template
3. **Configure Dataset**: Set up the dataset for specific use cases (product catalog, supplier data, etc.)
4. **Start AI Workflows**: Begin classification, enrichment, or extraction tasks on your structured dataset

<CardGroup cols={2}>
  <Card title="Manage Data Sources" icon="folder" href="/api-reference/manage-datasource">
    View, access, and manage your uploaded data sources
  </Card>

  <Card title="Create Dataset" icon="database" href="/api-reference/create-dataset">
    Create datasets from your cleaned data sources
  </Card>
</CardGroup>
