All operations require authentication using Bearer tokens. Make sure you have your API credentials ready.

Standard Dataset Creation

Create Dataset from Data Sources

Create a new dataset with the specified configuration from an existing data source, referenced by its datasourceId.
curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "data_enrichment",
    "name": "Product Enrichment Dataset",
    "description": "Enrich product data with additional attributes and classifications",
    "datasourceId": "'"$DATASOURCE_ID"'"
  }'
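
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name `build_create_dataset_request` is hypothetical, and the request is built but not sent because the response format is not described in this section.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://secure-api.getclaro.ai/api/v2"


def build_create_dataset_request(api_key, dataset_type, name, description, datasource_id):
    """Build (but do not send) the POST /datasets request (hypothetical helper)."""
    payload = {
        "type": dataset_type,
        "name": name,
        "description": description,
        "datasourceId": datasource_id,
    }
    return urllib.request.Request(
        f"{API_BASE}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_dataset_request(
    "YOUR_API_KEY",
    "data_enrichment",
    "Product Enrichment Dataset",
    "Enrich product data with additional attributes and classifications",
    "your-datasource-id",
)
# To send it, call urllib.request.urlopen(request) with real credentials.
```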

Create Dataset with File Upload

Create a dataset by uploading files directly instead of using existing data sources.
Files uploaded during dataset creation are automatically processed and saved as data sources, making them available for creating additional datasets in the future.
# For CSV files (single file)
curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/upload" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "type=data_enrichment" \
  -F "name=Product Enrichment Dataset" \
  -F "description=Enrich product data with additional attributes" \
  -F "file=@products.csv"

# For PDF files (multiple files)
curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/upload" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "type=data_extraction" \
  -F "name=Invoice Extraction Dataset" \
  -F "description=Extract key fields from invoice documents" \
  -F "files[]=@invoice1.pdf" \
  -F "files[]=@invoice2.pdf"
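
The two examples differ only in the form field that carries the files: `file` for a single CSV and repeated `files[]` entries for PDFs. A small sketch of that choice follows. The function name `upload_file_fields` is hypothetical, and rejecting every other combination is an assumption made here for illustration; the API may accept more formats.

```python
from pathlib import Path


def upload_file_fields(paths):
    """Return (form_field, path) pairs for the /datasets/upload multipart form.

    Mirrors the curl examples: one CSV goes in `file`, one or more PDFs go in
    repeated `files[]` fields. Other combinations are rejected (assumption).
    """
    suffixes = {Path(p).suffix.lower() for p in paths}
    if len(paths) == 1 and suffixes == {".csv"}:
        return [("file", paths[0])]
    if paths and suffixes == {".pdf"}:
        return [("files[]", p) for p in paths]
    raise ValueError(f"unsupported file combination: {paths}")


csv_fields = upload_file_fields(["products.csv"])
pdf_fields = upload_file_fields(["invoice1.pdf", "invoice2.pdf"])
```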

Dataset Types and Configuration

Data Enrichment

Enhance existing data with additional attributes and classifications. Requires a data source.
{
  "type": "data_enrichment",
  "name": "Product Classification Dataset",
  "description": "Classify products and add missing attributes for e-commerce catalog",
  "datasourceId": "your-datasource-id"
}

Data Extraction

Extract structured data from unstructured sources such as PDFs. Requires a data source.
{
  "type": "data_extraction",
  "name": "Invoice Data Extraction",
  "description": "Extract key fields from invoice documents for automated processing",
  "datasourceId": "your-datasource-id"
}

Map Extraction

Extract location-based data within specified geographic boundaries. No data sources required.
{
  "type": "map_extraction",
  "name": "Restaurant Location Data",
  "description": "Find restaurants in downtown area for market analysis",
  "mapDetails": {
    "latitude": 40.7128,
    "longitude": -74.006,
    "radiusMeters": 5000
  }
}
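
Before sending a map extraction request, the mapDetails object can be checked locally. The sketch below uses the 50000 meter radius limit listed under Request Parameters and the standard geographic ranges for latitude and longitude. The function name is hypothetical, and the requirement that the radius be positive is an assumption.

```python
def validate_map_details(map_details):
    """Return a list of problems with a mapDetails object (empty means OK)."""
    problems = []
    values = {}
    for key in ("latitude", "longitude", "radiusMeters"):
        value = map_details.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number")
        else:
            values[key] = value
    if "latitude" in values and not -90 <= values["latitude"] <= 90:
        problems.append("latitude must be between -90 and 90")
    if "longitude" in values and not -180 <= values["longitude"] <= 180:
        problems.append("longitude must be between -180 and 180")
    # Upper bound from the parameter table; lower bound is an assumption.
    if "radiusMeters" in values and not 0 < values["radiusMeters"] <= 50000:
        problems.append("radiusMeters must be greater than 0 and at most 50000")
    return problems


ok = validate_map_details({"latitude": 40.7128, "longitude": -74.006, "radiusMeters": 5000})
bad = validate_map_details({"latitude": 95, "longitude": -74.006, "radiusMeters": 60000})
```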

Custom Dataset (Blank Table)

Create a blank structured table with custom column definitions. No data sources required.
{
  "type": "custom_dataset",
  "name": "Customer Survey Responses",
  "description": "Collect and organize customer feedback for satisfaction analysis",
  "columnDefinitions": [
    {
      "name": "customer_id",
      "type": "string",
      "description": "Unique customer identifier"
    },
    {
      "name": "satisfaction_score",
      "type": "number",
      "description": "Rating from 1-10"
    },
    {
      "name": "feedback_text",
      "type": "text",
      "description": "Open-ended feedback"
    },
    {
      "name": "survey_date",
      "type": "date",
      "description": "Date survey was completed"
    }
  ]
}

AI-Powered Dataset Generation

Generate Sample Dataset with AI

Generate a sample dataset using AI with a prompt-based approach. This returns a preview for user confirmation before creating the actual dataset.
curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Create a dataset of tech startup companies with funding information",
    "sampleSize": 10
  }'

Refine AI Dataset Generation (Optional)

After reviewing the sample dataset, you can optionally request corrections or additions before confirming the final dataset.
curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/generate/$DATASET_REQUEST_ID/refine" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Add a column for company valuation and remove the location column"
  }'

Confirm AI Dataset Generation

After reviewing the sample dataset (and optionally refining it), confirm creation of the full AI-generated dataset using the dataset request ID.
curl -X POST "https://secure-api.getclaro.ai/api/v2/datasets/ai-generate-confirm" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetRequestId": "'"$DATASET_REQUEST_ID"'",
    "fullSize": 1000
  }'
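
The generate, refine, and confirm calls form a three-step flow that shares one dataset request ID. The sketch below only assembles the URL and JSON body for each step, as a way to see how they fit together. The function names and the ID `req_123` are made up for illustration; the real ID comes from the generate response, whose format is not described in this section.

```python
from urllib.parse import quote

# Base URL taken from the curl examples above.
API_BASE = "https://secure-api.getclaro.ai/api/v2"


def generate_step(prompt, sample_size=10):
    """Step 1: request a sample preview."""
    return f"{API_BASE}/datasets/generate", {"prompt": prompt, "sampleSize": sample_size}


def refine_step(dataset_request_id, prompt):
    """Step 2 (optional): ask for corrections to the preview."""
    url = f"{API_BASE}/datasets/generate/{quote(dataset_request_id, safe='')}/refine"
    return url, {"prompt": prompt}


def confirm_step(dataset_request_id, full_size=1000):
    """Step 3: confirm creation of the full dataset."""
    return (
        f"{API_BASE}/datasets/ai-generate-confirm",
        {"datasetRequestId": dataset_request_id, "fullSize": full_size},
    )


# "req_123" is a hypothetical request ID used only for this illustration.
steps = [
    generate_step("Create a dataset of tech startup companies with funding information"),
    refine_step("req_123", "Add a column for company valuation and remove the location column"),
    confirm_step("req_123"),
]
for url, body in steps:
    print("POST", url, body)
```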

Request Parameters

Create Dataset

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Dataset type: data_enrichment, data_extraction, map_extraction, custom_dataset |
| name | string | Yes | Dataset name (max 100 characters) |
| description | string | Yes | Purpose and use case description. Used as the prompt for enrichment and extraction, and as the search text for maps |
| datasourceId | string | Conditional | Datasource ID. Required for data_enrichment and data_extraction. Not used for map_extraction |
| mapDetails | object | Conditional | Required for map_extraction type |
| mapDetails.latitude | number | Conditional | Center latitude for map extraction |
| mapDetails.longitude | number | Conditional | Center longitude for map extraction |
| mapDetails.radiusMeters | number | Conditional | Extraction radius in meters (max 50000) |
| columnDefinitions | array | Conditional | Required for custom_dataset type |
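
The conditional requirements above can be checked on the client before a request is sent. This is a rough sketch that encodes only the rules stated in the table; the function name is hypothetical, and the server may enforce additional rules.

```python
DATASOURCE_TYPES = {"data_enrichment", "data_extraction"}
ALL_TYPES = DATASOURCE_TYPES | {"map_extraction", "custom_dataset"}


def validate_create_payload(payload):
    """Return a list of rule violations for a create-dataset payload."""
    errors = []
    dataset_type = payload.get("type")
    if dataset_type not in ALL_TYPES:
        errors.append("type must be one of: " + ", ".join(sorted(ALL_TYPES)))

    name = payload.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name is required")
    elif len(name) > 100:
        errors.append("name exceeds 100 characters")

    if not payload.get("description"):
        errors.append("description is required")

    # Conditional fields depend on the dataset type.
    if dataset_type in DATASOURCE_TYPES and not payload.get("datasourceId"):
        errors.append(f"datasourceId is required for {dataset_type}")
    if dataset_type == "map_extraction" and not payload.get("mapDetails"):
        errors.append("mapDetails is required for map_extraction")
    if dataset_type == "custom_dataset" and not payload.get("columnDefinitions"):
        errors.append("columnDefinitions is required for custom_dataset")
    return errors


valid = validate_create_payload({
    "type": "data_extraction",
    "name": "Invoice Data Extraction",
    "description": "Extract key fields from invoice documents for automated processing",
    "datasourceId": "your-datasource-id",
})
missing = validate_create_payload({
    "type": "map_extraction",
    "name": "Restaurant Location Data",
    "description": "Find restaurants in downtown area for market analysis",
})
```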

Create Dataset with File Upload

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Dataset type (same as above) |
| name | string | Yes | Dataset name |
| description | string | Yes | Purpose and use case description |
| file | file | Conditional | Single CSV file upload |
| files[] | files | Conditional | Multiple PDF files upload |
| mapDetails | object | Conditional | Required for map_extraction type |
| columnDefinitions | array | Conditional | Required for custom_dataset type |

AI Generate Sample Dataset

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Natural language description of desired dataset |
| sampleSize | number | No | Number of sample rows (default: 10, max: 50). Cannot be changed in refinement or confirmation steps |

Refine AI Dataset Generation

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Dataset request ID from generate response (in URL path) |
| prompt | string | Yes | Natural language description of corrections or additions needed |

Confirm AI Dataset Generation

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| datasetRequestId | string | Yes | Request ID from the generate or generate/{id}/refine response |
| fullSize | number | No | Desired full dataset size (default: 1000, max: 100000) |
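
The defaults and maximums for sampleSize and fullSize can be applied locally before calling the API. A minimal sketch follows, assuming a lower bound of 1 for both values (the documentation states only the maximums); the helper name `resolve_size` is hypothetical.

```python
# (default, maximum) pairs taken from the parameter tables above.
LIMITS = {"sampleSize": (10, 50), "fullSize": (1000, 100000)}


def resolve_size(parameter, value=None):
    """Return the value to send for a size parameter, or raise ValueError."""
    default, maximum = LIMITS[parameter]
    if value is None:
        return default
    # The lower bound of 1 is an assumption; only the maximum is documented.
    if not 1 <= value <= maximum:
        raise ValueError(f"{parameter} must be between 1 and {maximum}, got {value}")
    return value


sample_size = resolve_size("sampleSize")
full_size = resolve_size("fullSize", 5000)
```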

Column Definition Schema

For custom datasets, define columns with the following structure:
{
  "name": "column_name",
  "type": "string|number|date|boolean|text",
  "description": "Column purpose and content description",
  "required": true,
  "defaultValue": "optional_default"
}

Column Types

| Type | Description | Example Use Cases |
| --- | --- | --- |
| string | Short text values (typically < 255 characters) | Names, IDs, categories, status |
| text | Long text content (unlimited length) | Descriptions, comments, articles |
| number | Numeric values (integers and decimals) | Prices, quantities, scores |
| date | Date and timestamp values | Created dates, deadlines |
| boolean | True/false values | Active status, feature flags |
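
A list of column definitions can be checked against the five column types before a custom_dataset request is sent. The sketch below is illustrative only: the function name is hypothetical, and rejecting duplicate column names is an assumption rather than a documented rule.

```python
COLUMN_TYPES = {"string", "text", "number", "date", "boolean"}


def validate_column_definitions(columns):
    """Return a list of problems found in a columnDefinitions array."""
    errors = []
    seen = set()
    for index, column in enumerate(columns):
        name = column.get("name")
        if not name:
            errors.append(f"column {index}: name is required")
        elif name in seen:
            # Assumption: duplicate names are treated as an error here.
            errors.append(f"column {index}: duplicate name {name!r}")
        else:
            seen.add(name)
        if column.get("type") not in COLUMN_TYPES:
            errors.append(f"column {index}: unknown type {column.get('type')!r}")
    return errors


survey_columns = [
    {"name": "customer_id", "type": "string", "description": "Unique customer identifier"},
    {"name": "satisfaction_score", "type": "number", "description": "Rating from 1-10"},
    {"name": "feedback_text", "type": "text", "description": "Open-ended feedback"},
    {"name": "survey_date", "type": "date", "description": "Date survey was completed"},
]
survey_errors = validate_column_definitions(survey_columns)
```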

Error Codes

| Code | Description |
| --- | --- |
| VALIDATION_ERROR | Invalid request parameters |
| DATASOURCE_NOT_FOUND | Referenced datasource doesn’t exist |
| GENERATION_FAILED | AI dataset generation failed |
| REFINEMENT_FAILED | AI dataset refinement failed |
| REQUEST_EXPIRED | Dataset request ID expired |
| QUOTA_EXCEEDED | Dataset creation limit reached |
| INVALID_MAP_BOUNDS | Map extraction coordinates invalid |
| FILE_UPLOAD_ERROR | File upload failed or invalid format |
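
Client code can map these codes to readable messages with a simple lookup. The sketch below assumes the code has already been extracted from an error response; the shape of that response is not described in this section, so no parsing is shown. The function name is hypothetical.

```python
ERROR_DESCRIPTIONS = {
    "VALIDATION_ERROR": "Invalid request parameters",
    "DATASOURCE_NOT_FOUND": "Referenced datasource does not exist",
    "GENERATION_FAILED": "AI dataset generation failed",
    "REFINEMENT_FAILED": "AI dataset refinement failed",
    "REQUEST_EXPIRED": "Dataset request ID expired",
    "QUOTA_EXCEEDED": "Dataset creation limit reached",
    "INVALID_MAP_BOUNDS": "Map extraction coordinates invalid",
    "FILE_UPLOAD_ERROR": "File upload failed or invalid format",
}


def describe_error(code):
    """Return the documented description for an error code."""
    return ERROR_DESCRIPTIONS.get(code, f"Unrecognized error code: {code}")


message = describe_error("REQUEST_EXPIRED")
```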
