# Manage Dataset

> Manage your existing datasets: list them, view details, read data, update cells and columns, change task types, link datasources, export, and delete.

<Note>
  All operations require authentication using Bearer tokens. Make sure you have
  your API credentials ready.
</Note>

## List All Datasets

Retrieve a paginated list of all your datasets.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X GET "https://secure-api.getclaro.ai/api/v2/datasets?page=1&limit=20" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python theme={null}
  import requests

  headers = {"Authorization": "Bearer YOUR_API_KEY"}
  params = {"page": 1, "limit": 20}

  response = requests.get(
      "https://secure-api.getclaro.ai/api/v2/datasets",
      headers=headers,
      params=params
  )
  ```

  ```javascript JavaScript theme={null}
  const params = new URLSearchParams({
    page: 1,
    limit: 20,
  });

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets?${params}`,
    {
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
      },
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "datasets": [
      {
        "datasetId": "550e8400-e29b-41d4-a716-446655440000",
        "name": "Product Analysis Dataset",
        "description": "Analysis of product data with AI-generated insights",
        "type": "extraction",
        "status": "completed",
        "createdAt": "2024-03-14T15:30:00Z",
        "updatedAt": "2024-03-14T16:45:00Z",
        "rowCount": 150,
        "columnCount": 8,
        "linkedDatasources": ["datasource-1", "datasource-2"]
      },
      {
        "datasetId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "name": "Customer Feedback Analysis",
        "description": "Sentiment analysis of customer reviews",
        "type": "analysis",
        "status": "processing",
        "createdAt": "2024-03-14T16:00:00Z",
        "updatedAt": "2024-03-14T16:30:00Z",
        "rowCount": 500,
        "columnCount": 6,
        "linkedDatasources": ["datasource-3"]
      }
    ],
    "pagination": {
      "page": 1,
      "limit": 20,
      "total": 12,
      "totalPages": 1
    }
  }
  ```
</CodeGroup>
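
When a listing spans several pages, the `pagination.totalPages` field in the response tells you when to stop. The sketch below walks every page and yields each dataset. The `fetch_page` callable is a hypothetical hook that you supply (for example, a thin wrapper around the `requests.get` call shown above that returns the parsed JSON body); it is not part of the API.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_all_datasets(
    fetch_page: Callable[[int, int], Page], limit: int = 20
) -> Iterator[Dict[str, Any]]:
    """Yield every dataset by requesting pages until `totalPages` is reached.

    `fetch_page(page, limit)` is a caller-supplied function (hypothetical)
    that returns the parsed JSON body of one List All Datasets response.
    """
    page = 1
    while True:
        body = fetch_page(page, limit)
        yield from body.get("datasets", [])
        total_pages = body.get("pagination", {}).get("totalPages", 1)
        if page >= total_pages:
            break
        page += 1
```

Keeping the HTTP call behind `fetch_page` lets you reuse the loop with any client and exercise it without network access.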

## Get Dataset Details

Retrieve detailed information about a specific dataset including column metadata and task types.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X GET "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python theme={null}
  import requests

  headers = {"Authorization": "Bearer YOUR_API_KEY"}
  dataset_id = "your-dataset-id"  # Replace with your dataset ID

  response = requests.get(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}",
      headers=headers
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id"; // Replace with your dataset ID

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}`,
    {
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
      },
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "datasetId": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Product Analysis Dataset",
    "description": "Analysis of product data with AI-generated insights",
    "type": "extraction",
    "status": "completed",
    "createdAt": "2024-03-14T15:30:00Z",
    "updatedAt": "2024-03-14T16:45:00Z",
    "columns": [
      {
        "columnId": "col_1",
        "name": "product_name",
        "type": "string",
        "taskType": "doc_extraction",
        "isRawData": false,
        "metadata": {}
      },
      {
        "columnId": "col_2",
        "name": "raw_description",
        "type": "text",
        "taskType": "raw",
        "isRawData": true,
        "metadata": {}
      },
      {
        "columnId": "col_3",
        "name": "enriched_description",
        "type": "text",
        "taskType": "ai_enrich",
        "isRawData": false,
        "metadata": {
          "prompt": "Create a marketing description based on the product name and features"
        }
      }
    ],
    "rowCount": 150,
    "columnCount": 8,
    "linkedDatasources": ["datasource-1", "datasource-2"]
  }
  ```
</CodeGroup>
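
The `columns` array in this response is often the starting point for later calls, for example to find which columns hold raw data before linking a datasource. A minimal sketch, assuming the response has already been parsed into a dictionary; the helper names are illustrative and not part of the API:

```python
from collections import defaultdict
from typing import Any, Dict, List


def columns_by_task_type(details: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group column IDs by their `taskType`."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for column in details.get("columns", []):
        grouped[column["taskType"]].append(column["columnId"])
    return dict(grouped)


def raw_column_ids(details: Dict[str, Any]) -> List[str]:
    """Return the IDs of columns flagged with `isRawData`."""
    return [
        column["columnId"]
        for column in details.get("columns", [])
        if column.get("isRawData")
    ]
```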

## Get Dataset Data

Retrieve paginated data from a specific dataset as a 2D array with column information.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X GET "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/data?page=1&limit=50" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python theme={null}
  import requests

  headers = {"Authorization": "Bearer YOUR_API_KEY"}
  params = {"page": 1, "limit": 50}
  dataset_id = "your-dataset-id"  # Replace with your dataset ID

  response = requests.get(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/data",
      headers=headers,
      params=params
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id"; // Replace with your dataset ID
  const params = new URLSearchParams({
    page: 1,
    limit: 50,
  });

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/data?${params}`,
    {
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
      },
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "rows": [
      [
        {
          "cellId": "cell_1_1",
          "rowId": "row_1",
          "columnId": "col_1",
          "value": "Widget A",
          "metadata": {
            "confidence": 0.95,
            "extractedAt": "2024-03-14T15:30:00Z"
          }
        },
        {
          "cellId": "cell_1_2",
          "rowId": "row_1",
          "columnId": "col_2",
          "value": "High-quality electronic widget",
          "linkedDatasource": "datasource-1",
          "metadata": {
            "sourceDocument": "product_catalog_page_1.pdf",
            "pageNumber": 1
          }
        },
        {
          "cellId": "cell_1_3",
          "rowId": "row_1",
          "columnId": "col_3",
          "value": 29.99,
          "metadata": {
            "confidence": 0.92
          }
        },
        {
          "cellId": "cell_1_4",
          "rowId": "row_1",
          "columnId": "col_4",
          "value": "Electronics",
          "metadata": {
            "confidence": 0.88
          }
        }
      ],
      [
        {
          "cellId": "cell_2_1",
          "rowId": "row_2",
          "columnId": "col_1",
          "value": "Widget B",
          "metadata": {
            "confidence": 0.93
          }
        },
        {
          "cellId": "cell_2_2",
          "rowId": "row_2",
          "columnId": "col_2",
          "value": "Advanced electronic component",
          "linkedDatasource": "datasource-1",
          "metadata": {
            "sourceDocument": "product_catalog_page_1.pdf",
            "pageNumber": 1
          }
        },
        {
          "cellId": "cell_2_3",
          "rowId": "row_2",
          "columnId": "col_3",
          "value": 39.99,
          "metadata": {
            "confidence": 0.9
          }
        },
        {
          "cellId": "cell_2_4",
          "rowId": "row_2",
          "columnId": "col_4",
          "value": "Electronics",
          "metadata": {
            "confidence": 0.85
          }
        }
      ]
    ],
    "columns": [
      {
        "columnId": "col_1",
        "name": "product_name",
        "type": "string",
        "taskType": "doc_extraction"
      },
      {
        "columnId": "col_2",
        "name": "description",
        "type": "text",
        "taskType": "raw"
      },
      {
        "columnId": "col_3",
        "name": "price",
        "type": "number",
        "taskType": "doc_extraction"
      },
      {
        "columnId": "col_4",
        "name": "category",
        "type": "string",
        "taskType": "classification"
      }
    ],
    "pagination": {
      "page": 1,
      "limit": 50,
      "total": 150,
      "totalPages": 3
    }
  }
  ```
</CodeGroup>
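
Because each row is an array of cell objects, it is often convenient to flatten the response into one dictionary per row, keyed by column name. The following is a minimal sketch that assumes the response shape shown above; `rows_to_records` is an illustrative helper, not part of the API.

```python
from typing import Any, Dict, List


def rows_to_records(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert the 2D `rows` array into a list of {column name: value} dicts."""
    # Map column IDs to names using the `columns` array from the same response.
    names = {column["columnId"]: column["name"] for column in data.get("columns", [])}
    records = []
    for row in data.get("rows", []):
        # Fall back to the column ID if a cell refers to an unlisted column.
        records.append(
            {names.get(cell["columnId"], cell["columnId"]): cell["value"] for cell in row}
        )
    return records
```

With the sample response above, the first record would map `product_name` to `"Widget A"` and `price` to `29.99`. Cell metadata is dropped in this sketch; keep the original cell objects if you need `confidence` or `cellId`.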

## Update Cell Value

Update the value of a specific cell using its cell ID.

<Note>
  Cell metadata cannot be updated directly by users and will be reset when the
  cell value is changed.
</Note>

<CodeGroup>
  ```bash cURL theme={null}
  curl -X PATCH "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/cells/$CELL_ID" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "value": "Updated Widget Name"
    }'
  ```

  ```python Python theme={null}
  import requests

  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }
  data = {
      "value": "Updated Widget Name"
  }
  dataset_id = "your-dataset-id"
  cell_id = "cell_1_1"  # In the sample data: cell_<row>_<column>

  response = requests.patch(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/cells/{cell_id}",
      headers=headers,
      json=data
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id";
  const cellId = "cell_1_1"; // In the sample data: cell_<row>_<column>

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/cells/${cellId}`,
    {
      method: "PATCH",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        value: "Updated Widget Name",
      }),
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "cellId": "cell_1_1",
    "value": "Updated Widget Name",
    "previousValue": "Widget A",
    "metadataReset": true,
    "updatedAt": "2024-03-14T17:30:00Z",
    "updatedBy": "user"
  }
  ```
</CodeGroup>
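
The cell ID needed for this call can be read from a Get Dataset Data response rather than constructed by hand. A minimal sketch, assuming the response shape shown in that section; both helper names are illustrative:

```python
from typing import Any, Dict, Optional

BASE_URL = "https://secure-api.getclaro.ai/api/v2"


def find_cell_id(data: Dict[str, Any], row_id: str, column_id: str) -> Optional[str]:
    """Return the `cellId` at the given row and column, or None if absent."""
    for row in data.get("rows", []):
        for cell in row:
            if cell["rowId"] == row_id and cell["columnId"] == column_id:
                return cell["cellId"]
    return None


def cell_update_url(dataset_id: str, cell_id: str) -> str:
    """Build the PATCH URL for updating a single cell."""
    return f"{BASE_URL}/datasets/{dataset_id}/cells/{cell_id}"
```

Looking up the ID avoids relying on any particular naming pattern for cell IDs.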

## Update Column Metadata

Update column information and metadata without changing the task type.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X PATCH "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/columns/$COLUMN_ID" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "updated_product_name",
      "metadata": {
        "extractionPrompt": "Extract the full product name including brand",
        "confidence": 0.95
      }
    }'
  ```

  ```python Python theme={null}
  import requests

  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }
  data = {
      "name": "updated_product_name",
      "metadata": {
          "extractionPrompt": "Extract the full product name including brand",
          "confidence": 0.95
      }
  }
  dataset_id = "your-dataset-id"
  column_id = "col_1"

  response = requests.patch(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/columns/{column_id}",
      headers=headers,
      json=data
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id";
  const columnId = "col_1";

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/columns/${columnId}`,
    {
      method: "PATCH",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        name: "updated_product_name",
        metadata: {
          extractionPrompt: "Extract the full product name including brand",
          confidence: 0.95,
        },
      }),
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "columnId": "col_1",
    "name": "updated_product_name",
    "type": "string",
    "taskType": "doc_extraction",
    "metadata": {
      "extractionPrompt": "Extract the full product name including brand",
      "confidence": 0.95
    },
    "updatedAt": "2024-03-14T17:30:00Z"
  }
  ```
</CodeGroup>

## Update Column Task Type

<Warning>
  This operation will reset all values in the specified column. This action
  cannot be undone.
</Warning>

Change the task type of a column. Include any metadata the new task type requires, such as `prompt` for `ai_enrich`.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X PUT "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/columns/$COLUMN_ID/task-type" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "taskType": "ai_enrich",
      "metadata": {
        "prompt": "Generate a detailed product description based on the product name and category"
      }
    }'
  ```

  ```python Python theme={null}
  import requests

  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }
  data = {
      "taskType": "ai_enrich",
      "metadata": {
          "prompt": "Generate a detailed product description based on the product name and category"
      }
  }
  dataset_id = "your-dataset-id"
  column_id = "col_1"

  response = requests.put(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/columns/{column_id}/task-type",
      headers=headers,
      json=data
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id";
  const columnId = "col_1";

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/columns/${columnId}/task-type`,
    {
      method: "PUT",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        taskType: "ai_enrich",
        metadata: {
          prompt:
            "Generate a detailed product description based on the product name and category",
        },
      }),
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "columnId": "col_1",
    "taskType": "ai_enrich",
    "previousTaskType": "doc_extraction",
    "valuesReset": true,
    "affectedCells": 150,
    "metadata": {
      "prompt": "Generate a detailed product description based on the product name and category"
    },
    "updatedAt": "2024-03-14T17:30:00Z"
  }
  ```
</CodeGroup>

## Link Datasource to Cell

For extraction datasets, link an existing datasource or upload a new file to a specific cell in a raw data column.

### Link Existing Datasource

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/cells/$CELL_ID/link-datasource" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "datasourceId": "existing-datasource-id"
    }'
  ```

  ```python Python theme={null}
  import requests

  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }
  data = {
      "datasourceId": "existing-datasource-id"
  }
  dataset_id = "your-dataset-id"
  cell_id = "cell_2_1"  # Must be in a raw data column

  response = requests.post(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/cells/{cell_id}/link-datasource",
      headers=headers,
      json=data
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id";
  const cellId = "cell_2_1"; // Must be in a raw data column

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/cells/${cellId}/link-datasource`,
    {
      method: "POST",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        datasourceId: "existing-datasource-id",
      }),
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "cellId": "cell_2_1",
    "datasourceId": "existing-datasource-id",
    "linkedAt": "2024-03-14T17:30:00Z",
    "previousDatasource": null
  }
  ```
</CodeGroup>

### Upload and Link New File

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/cells/$CELL_ID/upload-datasource" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -F "file=@document.pdf"
  ```

  ```python Python theme={null}
  import requests

  headers = {"Authorization": "Bearer YOUR_API_KEY"}
  dataset_id = "your-dataset-id"
  cell_id = "cell_2_1"  # Must be in a raw data column

  # Open the file in a context manager so the handle is closed after the upload
  with open("document.pdf", "rb") as f:
      response = requests.post(
          f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/cells/{cell_id}/upload-datasource",
          headers=headers,
          files={"file": f}
      )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id";
  const cellId = "cell_2_1"; // Must be in a raw data column
  const fileInput = document.getElementById("fileInput");
  const formData = new FormData();

  formData.append("file", fileInput.files[0]);

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/cells/${cellId}/upload-datasource`,
    {
      method: "POST",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
      },
      body: formData,
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "cellId": "cell_2_1",
    "datasourceId": "newly-created-datasource-id",
    "fileName": "document.pdf",
    "fileSize": 245760,
    "uploadedAt": "2024-03-14T17:30:00Z",
    "linkedAt": "2024-03-14T17:30:00Z"
  }
  ```
</CodeGroup>

## Delete Dataset

Permanently delete a dataset and all its associated data.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X DELETE "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID" \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python theme={null}
  import requests

  headers = {"Authorization": "Bearer YOUR_API_KEY"}
  dataset_id = "your-dataset-id"  # Replace with your dataset ID

  response = requests.delete(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}",
      headers=headers
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id"; // Replace with your dataset ID

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}`,
    {
      method: "DELETE",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
      },
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "message": "Dataset deleted successfully",
    "datasetId": "550e8400-e29b-41d4-a716-446655440000",
    "deletedAt": "2024-03-14T17:45:00Z"
  }
  ```
</CodeGroup>

## Export Dataset

Generate a time-limited download URL for the dataset in `csv`, `json`, or `xlsx` format.

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/$DATASET_ID/export" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "format": "csv",
      "rowCount": 1000,
      "columnIds": ["col_1", "col_3", "col_4"],
      "skip": 0,
      "includeMetadata": true,
      "expiresIn": 3600
    }'
  ```

  ```python Python theme={null}
  import requests

  headers = {
      "Authorization": "Bearer YOUR_API_KEY",
      "Content-Type": "application/json"
  }
  data = {
      "format": "csv",
      "rowCount": 1000,
      "columnIds": ["col_1", "col_3", "col_4"],
      "skip": 0,
      "includeMetadata": True,
      "expiresIn": 3600
  }
  dataset_id = "your-dataset-id"

  response = requests.post(
      f"https://secure-api.getclaro.ai/api/v2/datasets/{dataset_id}/export",
      headers=headers,
      json=data
  )
  ```

  ```javascript JavaScript theme={null}
  const datasetId = "your-dataset-id";

  const response = await fetch(
    `https://secure-api.getclaro.ai/api/v2/datasets/${datasetId}/export`,
    {
      method: "POST",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        format: "csv",
        rowCount: 1000,
        columnIds: ["col_1", "col_3", "col_4"],
        skip: 0,
        includeMetadata: true,
        expiresIn: 3600,
      }),
    }
  );
  ```

  ```json Success Response theme={null}
  {
    "downloadUrl": "https://secure-api.getclaro.ai/export/550e8400-e29b-41d4-a716-446655440000.csv?token=xyz789...",
    "fileName": "product_analysis_dataset.csv",
    "format": "csv",
    "expiresAt": "2024-03-14T18:45:00Z",
    "fileSize": 125440
  }
  ```
</CodeGroup>

## Query Parameters

### List Datasets

* `page` (integer): Page number (default: 1)
* `limit` (integer): Items per page (default: 20, max: 100)
* `type` (string): Filter by dataset type (`extraction`, `analysis`, `classification`)
* `status` (string): Filter by status (`processing`, `completed`, `failed`)

### Get Dataset Data

* `page` (integer): Page number (default: 1)
* `limit` (integer): Rows per page (default: 50, max: 1000)
* `columns` (string): Comma-separated column IDs to include
* `includeMetadata` (boolean): Include row/column metadata (default: false)
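
As an illustration of how these parameters combine, the sketch below builds a Get Dataset Data URL with the standard library. The helper name is hypothetical, and sending `includeMetadata` as the lowercase string `true` is an assumption about how the API expects booleans in the query string.

```python
from typing import Dict, Iterable, Optional, Union
from urllib.parse import urlencode

BASE_URL = "https://secure-api.getclaro.ai/api/v2"


def dataset_data_url(
    dataset_id: str,
    page: int = 1,
    limit: int = 50,
    columns: Optional[Iterable[str]] = None,
    include_metadata: bool = False,
) -> str:
    """Build the Get Dataset Data URL from the documented query parameters."""
    if limit > 1000:
        raise ValueError("limit must not exceed 1000")
    params: Dict[str, Union[int, str]] = {"page": page, "limit": limit}
    if columns:
        # The API takes column IDs as one comma-separated string.
        params["columns"] = ",".join(columns)
    if include_metadata:
        # Assumption: booleans are passed as lowercase strings.
        params["includeMetadata"] = "true"
    return f"{BASE_URL}/datasets/{dataset_id}/data?{urlencode(params)}"
```

Note that `urlencode` percent-encodes the commas in `columns` (as `%2C`), which a standards-compliant server decodes back to the comma-separated list.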

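### Export Dataset

Unlike the parameters above, these fields are sent in the JSON request body rather than the query string.
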
* `format` (string): Export format (`csv`, `json`, `xlsx`)
* `rowCount` (integer): Number of rows to export (required, max: 100000)
* `columnIds` (array): Array of column IDs to include (optional, includes all if omitted)
* `skip` (integer): Number of initial rows to skip (optional, default: 0)
* `includeMetadata` (boolean): Include metadata in export
* `expiresIn` (integer): URL expiration time in seconds (default: 3600, max: 86400)
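
Checking the documented limits on the client side can catch mistakes before the request is sent. The sketch below builds an export request body and validates it against the maximums listed above. The helper name is illustrative, and the lower bounds (at least one row, a non-negative `skip`, a positive `expiresIn`) are assumptions, since only the maximums are documented.

```python
from typing import Any, Dict, Iterable, Optional

ALLOWED_FORMATS = {"csv", "json", "xlsx"}
MAX_ROW_COUNT = 100_000
MAX_EXPIRES_IN = 86_400  # seconds


def build_export_payload(
    row_count: int,
    fmt: str = "csv",
    column_ids: Optional[Iterable[str]] = None,
    skip: int = 0,
    include_metadata: bool = False,
    expires_in: int = 3600,
) -> Dict[str, Any]:
    """Return a JSON-serializable body for the Export Dataset request."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    # Lower bounds below are assumptions; the maximums come from the docs.
    if not 1 <= row_count <= MAX_ROW_COUNT:
        raise ValueError(f"rowCount must be between 1 and {MAX_ROW_COUNT}")
    if skip < 0:
        raise ValueError("skip must not be negative")
    if not 1 <= expires_in <= MAX_EXPIRES_IN:
        raise ValueError(f"expiresIn must be between 1 and {MAX_EXPIRES_IN}")
    payload: Dict[str, Any] = {
        "format": fmt,
        "rowCount": row_count,
        "skip": skip,
        "includeMetadata": include_metadata,
        "expiresIn": expires_in,
    }
    if column_ids is not None:
        # Omitting columnIds exports all columns.
        payload["columnIds"] = list(column_ids)
    return payload
```

The returned dictionary can be passed directly as `json=` to `requests.post`.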

## Task Types

| Task Type        | Description                                                                                  | Dataset Requirement | Metadata Requirements |
| ---------------- | -------------------------------------------------------------------------------------------- | ------------------- | --------------------- |
| `raw`            | Original unprocessed data from datasources or user input                                     | Any                 | None                  |
| `doc_extraction` | Extract information from corresponding cell in same row                                      | Extraction only     | None                  |
| `web_enrich`     | Response from web scraping to enrich data                                                    | Any                 | None                  |
| `classification` | Yes/no or categorical classification of data in same row                                     | Any                 | None                  |
| `ai_enrich`      | AI-generated content based on row data using custom prompts                                  | Any                 | `prompt` required     |
| `places_count`   | Extract places count from maps using coordinates or nearby area with radius from other cells | Any                 | `prompt` required     |
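
Because changing a task type resets every value in the column, it can be worth checking a request against this table before sending it. The following sketch encodes the table as data and reports any problems it finds. The function name is hypothetical, and treating "Extraction only" as "the dataset's `type` is `extraction`" is an assumption.

```python
from typing import Any, Dict, List, Optional

# Requirements transcribed from the Task Types table.
TASK_TYPE_RULES: Dict[str, Dict[str, bool]] = {
    "raw": {"extraction_only": False, "requires_prompt": False},
    "doc_extraction": {"extraction_only": True, "requires_prompt": False},
    "web_enrich": {"extraction_only": False, "requires_prompt": False},
    "classification": {"extraction_only": False, "requires_prompt": False},
    "ai_enrich": {"extraction_only": False, "requires_prompt": True},
    "places_count": {"extraction_only": False, "requires_prompt": True},
}


def check_task_type_change(
    task_type: str,
    metadata: Optional[Dict[str, Any]] = None,
    dataset_type: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    rules = TASK_TYPE_RULES.get(task_type)
    if rules is None:
        return [f"unknown task type: {task_type!r}"]
    problems = []
    if rules["requires_prompt"] and not (metadata or {}).get("prompt"):
        problems.append(f"{task_type} requires a non-empty metadata.prompt")
    # Assumption: "Extraction only" means the dataset's type is "extraction".
    if rules["extraction_only"] and dataset_type not in (None, "extraction"):
        problems.append(f"{task_type} is only available on extraction datasets")
    return problems
```

A client-side check like this does not replace the server's own validation; it only avoids a round trip for obvious mistakes.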

## Error Codes

| Code                     | Description                        |
| ------------------------ | ---------------------------------- |
| `UNAUTHORIZED`           | Authentication required            |
| `DATASET_NOT_FOUND`      | Dataset doesn't exist              |
| `CELL_NOT_FOUND`         | Cell ID doesn't exist              |
| `COLUMN_NOT_FOUND`       | Column ID doesn't exist            |
| `ROW_NOT_FOUND`          | Row ID doesn't exist               |
| `INVALID_TASK_TYPE`      | Task type not supported            |
| `RAW_COLUMN_REQUIRED`    | Operation requires raw data column |
| `PROCESSING_IN_PROGRESS` | Dataset still being processed      |
| `INVALID_PARAMETERS`     | Invalid query parameters           |
| `ACCESS_DENIED`          | Insufficient permissions           |
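
One way to avoid `PROCESSING_IN_PROGRESS` is to wait until the dataset's `status` leaves `processing` before modifying it. The sketch below polls a caller-supplied `get_status` function, which is a hypothetical hook that would typically call Get Dataset Details and return its `status` field. The timeout and interval defaults are arbitrary choices, not documented values.

```python
import time
from typing import Callable


def wait_until_processed(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the status is `completed` or `failed`, then return it.

    Raises TimeoutError if the dataset is still processing after `timeout`
    seconds. `sleep` and `clock` are injectable to make the loop testable.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"dataset still {status!r} after {timeout} seconds")
        sleep(interval)
```

The caller should still check the returned value, since `failed` ends the wait just as `completed` does.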

## Next Steps

<CardGroup cols={2}>
  <Card title="Dataset Tasks" icon="tasks" href="/api-reference/tasks">
    Run AI tasks on your dataset for processing and analysis
  </Card>

  <Card title="Create Dataset" icon="database" href="/api-reference/create-dataset">
    Create new datasets from your data sources
  </Card>
</CardGroup>
