# AI Tasks Overview

> The AI capabilities behind Claro's operations — classification, generation, web enrichment, geolocation, and document extraction.

AI Tasks are the building blocks behind Claro's operations and Research Agents. They're rarely invoked directly anymore — Operations like **Bulk Enrichment** and **Generative Engine** wrap them in catalogue-aware workflows with confidence, provenance, and review built in. This page documents each capability so you can choose the right one when configuring an operation or a Research Agent run.

***

## Capabilities

* Classification
* Generation
* Web Enrichment
* Geolocation Enrichment
* File Extraction
* Advanced OCR

### Model selection

Most tasks support multiple models. Claro's proprietary model is the default: it returns confidence scores and citations alongside its outputs, which makes it the right choice when a task feeds back into a catalogue. Other models are available for specialized use cases, and their pricing varies by model.

### Credit consumption

Credits are consumed per record. Cost depends on the model and the task type. Failed tasks consume zero credits.
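
To make the billing rule concrete, here is a minimal sketch of how a batch estimate could be computed. The rate table, the `claro-default` label, and the function name are hypothetical placeholders rather than Claro's actual pricing or API, and the sketch assumes failures are counted per record.

```python
# Hypothetical rates in credits per record, keyed by (task type, model).
# Real rates depend on your plan; these numbers are for illustration only.
RATES = {
    ("classification", "claro-default"): 1,
    ("web_enrichment", "claro-default"): 3,
}


def estimate_credits(task_type: str, model: str, succeeded: int, failed: int) -> int:
    """Estimate credits for a batch: successes are billed, failures are free."""
    if succeeded < 0 or failed < 0:
        raise ValueError("record counts must be non-negative")
    rate = RATES[(task_type, model)]
    # Failed records contribute nothing, so only `succeeded` is multiplied.
    return rate * succeeded


# 950 successful records at 3 credits each; the 50 failures cost nothing.
print(estimate_credits("web_enrichment", "claro-default", succeeded=950, failed=50))
```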

***

### Classification

Categorize and tag records with consistent labels.

**How it works** — compares each record against your defined categories or asks the model to suggest classifications based on patterns in your data.

**Used by** — Bulk Enrichment with a target enum or reference attribute, Generate Taxonomy proposal review, Validate Data rule routing.

**Tips**

* Works best with ≤ 50 categories per task; for larger label sets, use Generate Taxonomy first to build a hierarchy.
* Provide 2–3 examples per category in the prompt.
* Adjust the confidence threshold per attribute, not globally.
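
The first two tips can be sketched as a small prompt builder that enforces the category limit and the examples-per-category range. The function name and prompt wording are assumptions for illustration; Claro's own prompt format is not documented on this page.

```python
MAX_CATEGORIES = 50  # upper bound suggested in the tips above


def build_classification_prompt(categories: dict[str, list[str]], record_text: str) -> str:
    """Build a few-shot classification prompt with 2-3 examples per category."""
    if len(categories) > MAX_CATEGORIES:
        raise ValueError("too many categories; consider building a taxonomy first")
    lines = ["Assign the record to exactly one of these categories:"]
    for name, examples in categories.items():
        if not 2 <= len(examples) <= 3:
            raise ValueError(f"category {name!r} needs 2-3 examples")
        lines.append(f"- {name} (e.g. {'; '.join(examples)})")
    lines.append(f"Record: {record_text}")
    return "\n".join(lines)


prompt = build_classification_prompt(
    {
        "Footwear": ["running shoes", "leather boots"],
        "Outerwear": ["rain jacket", "down parka", "windbreaker"],
    },
    "Waterproof hiking boots, size 42",
)
print(prompt)
```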

### Generation

Create or rewrite text content tailored to your specifications.

**How it works** — language models produce new content based on the prompt and other attributes on the record.

**Used by** — Generative Engine (descriptions, marketing copy, alt-text, translations), SEO Report (suggestions).

**Tips**

* Specify exact length and format constraints.
* Reference other attributes with `@attribute_name` to ground generation.
* Attach Knowledge Bases for brand voice and category consistency.
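
To illustrate what grounding with `@attribute_name` means in practice, the sketch below substitutes record values into a prompt. This is not Claro's implementation; the regular expression, the function name, and the error on unknown attributes are assumptions chosen for the example.

```python
import re

# Assumed syntax: "@" followed by an identifier-like attribute name.
ATTRIBUTE_REF = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def resolve_references(prompt: str, record: dict[str, str]) -> str:
    """Replace each @attribute_name in the prompt with the record's value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in record:
            raise KeyError(f"prompt references unknown attribute: {name}")
        return str(record[name])

    return ATTRIBUTE_REF.sub(substitute, prompt)


record = {"product_name": "Trail Jacket", "color": "navy"}
print(resolve_references("Write a 50-word description of @product_name in @color.", record))
```

Failing loudly on an unknown attribute is a design choice for the sketch: a silently empty substitution would leave the model with nothing to ground on.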

### Web Enrichment

Research live information from across the web to enhance records.

**How it works** — searches the web for relevant information, extracts key details, and provides citations.

**Used by** — Bulk Enrichment with web search as the source.

**Tips**

* Cannot access paywalled or private content.
* Quality scales with how well-known the entity is.
* Always keep citations on; review at least the bottom 10% by confidence before raising auto-apply.
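
The review tip can be sketched as a helper that picks the lowest-confidence slice of a result set for manual checking. It assumes each result is available as a dictionary with a numeric `confidence` key, for example after an export; that shape and the function name are hypothetical.

```python
import math


def lowest_confidence_sample(results: list[dict], percent: int = 10) -> list[dict]:
    """Return the bottom `percent` of results by confidence, at least one record."""
    if not results:
        return []
    count = max(1, math.ceil(len(results) * percent / 100))
    # Sort ascending so the least certain results come first.
    return sorted(results, key=lambda r: r["confidence"])[:count]


results = [{"id": i, "confidence": i / 20} for i in range(20)]
for row in lowest_confidence_sample(results):
    print(row["id"], row["confidence"])
```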

### Geolocation Enrichment

Transform addresses into rich geographic data and nearby points of interest.

**How it works** — converts addresses to coordinates and optionally finds surrounding businesses and amenities.

**Used by** — Bulk Enrichment for address-bearing catalogues, the *Find your perfect list* Research Agent.

**Tips**

* Less accurate for rural or remote locations.
* P.O. boxes cannot be geocoded.
* Use a tight radius (1–2 km) in dense urban areas.
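
For intuition about what a 1–2 km radius selects, the sketch below filters candidate places by great-circle distance using the haversine formula. How Claro computes distance internally is not stated on this page, so treat the formula choice, the data shape, and the function names as assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin: tuple[float, float], places: list[dict], radius_km: float = 1.5) -> list[dict]:
    """Keep only the places whose distance from the origin is within radius_km."""
    lat0, lon0 = origin
    return [p for p in places if haversine_km(lat0, lon0, p["lat"], p["lon"]) <= radius_km]


places = [
    {"name": "nearby cafe", "lat": 48.8606, "lon": 2.3376},
    {"name": "distant depot", "lat": 48.8049, "lon": 2.1204},
]
print([p["name"] for p in within_radius((48.8566, 2.3522), places)])
```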

### File Extraction

Extract structured data from PDFs, invoices, forms, and documents — without templates.

**How it works** — AI-powered document parser identifies and extracts specific fields against a target schema.

**Used by** — *Turn documents into structured data* Research Agent, Bulk Enrichment when the source is a Knowledge Base of supplier datasheets.

**Tips**

* Specify the exact fields you need.
* Group related documents into one Knowledge Base for re-use across many extractions.
* Use Advanced OCR for low-quality scans, handwriting, or complex layouts.

### Advanced OCR

Extract text from low-quality, handwritten, or complex visual documents.

**How it works** — vision-AI pipeline optimized for documents that standard OCR cannot handle.

**Used by** — File Extraction as a fallback, *Turn documents into structured data* with the *advanced OCR* option enabled.

**Tips**

* Try standard File Extraction first — it's faster and cheaper.
* Improve scan quality before extraction when you can.
* Break very long documents into sections.
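
One simple way to plan the split suggested in the last tip is to compute page ranges up front. The 20-page section size and the function name are arbitrary assumptions; pick a size that suits your documents.

```python
def page_sections(total_pages: int, pages_per_section: int = 20) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges covering the document."""
    if total_pages < 0 or pages_per_section < 1:
        raise ValueError("total_pages must be >= 0 and pages_per_section >= 1")
    return [
        (start, min(start + pages_per_section - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_section)
    ]


# A 45-page scan becomes two full sections and one short one.
print(page_sections(45))
```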

***

## Setting up an AI task inside an operation

When you configure **Bulk Enrichment** or **Generative Engine**, the same building blocks appear:

* **Prompt** — reference attributes with `@attribute_name`. Be specific about format and length.
* **Knowledge Bases** — attach domain documents for grounding and citation.
* **Model** — choose the Claro default for confidence scores and citations, or another model for specialized cases.
* **Confidence thresholds** — auto-apply, review, reject. Defaults are conservative.
* **Output format** — text, structured (per the target attribute's type), or multi-value.
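
The three confidence outcomes can be pictured as a simple routing rule per attribute. The threshold values, attribute names, and function below are hypothetical and only show the shape of the decision; they are not Claro's defaults.

```python
# Hypothetical per-attribute thresholds: (auto_apply_at_or_above, reject_below).
DEFAULT_THRESHOLDS = (0.90, 0.50)
ATTRIBUTE_THRESHOLDS = {
    "brand": (0.95, 0.60),  # stricter for an attribute where errors are costly
}


def route(attribute: str, confidence: float) -> str:
    """Return 'auto-apply', 'review', or 'reject' for one predicted value."""
    auto_apply, reject = ATTRIBUTE_THRESHOLDS.get(attribute, DEFAULT_THRESHOLDS)
    if confidence >= auto_apply:
        return "auto-apply"
    if confidence < reject:
        return "reject"
    return "review"  # everything in between goes to a human


for attribute, confidence in [("brand", 0.92), ("color", 0.92), ("color", 0.30)]:
    print(attribute, confidence, route(attribute, confidence))
```

Keeping thresholds in a per-attribute table mirrors the classification tip about adjusting thresholds per attribute rather than globally.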

***

## Where AI tasks fit in the platform

| Surface                  | How AI tasks are used                                                                                                                           |
| ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **Catalogue Operations** | Wrapped in operations (Bulk Enrichment, Generative Engine, Validate, Normalize, Find Duplicates, etc.) with confidence, provenance, and review. |
| **Research Agents**      | Wrapped in agent flows for one-off datasets — list-building, document parsing, scraping, spreadsheet enrichment.                                |
| **Knowledge Bases**      | Attached to operations and agents to ground outputs in domain documents.                                                                        |

For most users the right entry points are [Operations](/operations) and [Research Agents](/research_agents). This page is the underlying reference.
