Ingestion API
Iara Data provides two secure, high-performance mechanisms for programmatic ingestion directly into your warehouse:
- Webhook Inbound Push API: event-driven ingestion with auto-healing schema evolution and multi-table support.
- Standard Metrics API: programmatic ingestion of metrics and structured logs.
⚡ Webhook Inbound Push API
The Webhook Inbound Push API lets external applications and webhook sources (e.g., Stripe, Shopify, custom CRMs) push raw JSON payloads directly into the platform.
Endpoint
POST /v1/ingest/push/{ingestToken}
- Authentication: The ingestToken is a unique, secure token generated during connection setup.
- Tenant Isolation: Requests must include the x-tenant-id header so they are routed to the correct tenant sandbox.
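A minimal client sketch of a push request, using only the Python standard library. The host https://api.example.com and the helper name build_push_request are hypothetical; the path, the x-tenant-id header, the ?table= query parameter, and the Idempotency-Key header come from this page. The function builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical host: replace with your actual Iara Data API base URL.
BASE_URL = "https://api.example.com"


def build_push_request(ingest_token, tenant_id, payload, table=None, idempotency_key=None):
    """Build (but do not send) a POST to /v1/ingest/push/{ingestToken}."""
    # Percent-encode the token so it is always a single, safe path segment.
    url = BASE_URL + "/v1/ingest/push/" + urllib.parse.quote(ingest_token, safe="")
    if table is not None:
        # Single-table payloads name their target via the ?table= query parameter.
        url += "?" + urllib.parse.urlencode({"table": table})
    headers = {
        "Content-Type": "application/json",
        "x-tenant-id": tenant_id,  # routes the request to the correct tenant sandbox
    }
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To actually send it, pass the result to urllib.request.urlopen(req). Note that urllib normalizes header names (x-tenant-id is stored as X-tenant-id); HTTP header names are case-insensitive, so this does not change their meaning.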
Payload Formats
You can ingest data in three different structures, depending on your architecture:
1. Single Table Payload (Explicit Target)
Send a flat array of objects. The target table must be specified via the ?table=your_table_name query parameter or the x-target-table header.
[
{ "id": 1, "phone": "11999999999", "new_col_x": "value" },
{ "id": 2, "phone": "11988888888", "new_col_x": "other" }
]
2. Envelope Payload (Explicit Target)
Send a single JSON object with a table field naming the target table and a rows array containing the records.
{
"table": "users_logs",
"rows": [
{ "id": 1, "email": "test@example.com", "action": "login" }
]
}
3. Hybrid Multi-Table Payload (Dynamic Ingestion)
To reduce request overhead, you can push data for multiple tables in a single request: each top-level key names a target table, and its value is an array of rows. The platform automatically routes each array to its corresponding physical table:
{
"test_tbl_orders": [
{ "order_id": 101, "total": 150.50, "status": "completed" }
],
"test_tbl_users": [
{ "user_id": 1, "phone": "11999999999", "name": "John Doe" }
]
}
Auto-Bootstrap: If a table in a multi-table payload is not yet registered in the platform, a pipeline job is created for it on the fly with flexible ingestion settings.
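The three payload shapes can be told apart structurally. The sketch below is a hypothetical illustration of that routing (not the platform's actual code): it normalizes any of the three shapes into a {table_name: rows} mapping and lists the tables that would need auto-bootstrapping. The function names and the precedence rule (an object with both table and rows keys is treated as an envelope) are assumptions.

```python
def route_payload(payload, table=None):
    """Normalize the three documented payload shapes into {table_name: rows}.

    `table` stands for the ?table= query parameter or x-target-table header,
    which only single-table payloads use.
    """
    if isinstance(payload, list):
        # 1. Single-table payload: a flat array; the target comes from outside the body.
        if table is None:
            raise ValueError("single-table payloads need ?table= or x-target-table")
        return {table: payload}
    if isinstance(payload, dict):
        if "table" in payload and "rows" in payload:
            # 2. Envelope payload: {"table": ..., "rows": [...]}
            return {payload["table"]: payload["rows"]}
        if all(isinstance(rows, list) for rows in payload.values()):
            # 3. Hybrid multi-table payload: every top-level key maps to an array of rows.
            return dict(payload)
    raise ValueError("unrecognized payload shape")


def tables_to_bootstrap(routed, registered):
    """Tables present in the payload but not yet registered, sorted by name."""
    return sorted(set(routed) - set(registered))
```

For the hybrid example above, if only test_tbl_orders were registered, tables_to_bootstrap would return ["test_tbl_users"], the table that would receive an on-the-fly pipeline job.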
🧠 Self-Learning & Auto-Healing Schema
When new fields are pushed to a dataset, the Webhook Inbound Push API resolves schema mismatches automatically, without breaking ingestion or failing jobs.
1. Zero-Cost Offline Pre-filtering
To minimize AI token usage and cost, the platform runs deterministic checks before invoking the AI model:
- Full Coverage (Bypass AI): If every registered column is present in the payload, any additional fields are treated as brand-new physical columns and added to the schema.
- Deterministic Typos (Bypass AI): If there are 1 or 2 new columns, the engine computes the Levenshtein distance between each new column and the registered columns left empty (unfilled) by the payload. If the distance is $\le 2$ and the types are compatible (e.g., the destination is varchar/text), the engine maps the typo to the canonical column offline (e.g., phnoe $\rightarrow$ phone) and saves it as an alias.
- AI Schema Matcher fallback: If the offline checks fail, or there are more than 2 unresolved columns, the request is sent to the AI Matcher.
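The pre-filtering rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the engine's implementation: "unfilled" is taken to mean registered columns absent from the payload, "compatible" is narrowed to varchar/text destinations, a case with no new columns is added for completeness, and if either of two new columns fails to match, both are handed to the AI Matcher. The names prefilter and TEXT_TYPES are hypothetical.

```python
# Assumption: typos are only mapped offline onto string-typed destination columns.
TEXT_TYPES = {"varchar", "text"}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def prefilter(payload_cols, registered):
    """registered maps column name -> type. Returns (decision, detail)."""
    new_cols = [c for c in payload_cols if c not in registered]
    unfilled = [c for c in registered if c not in payload_cols]
    if not new_cols:
        return "no_change", {}
    if not unfilled:
        # Full coverage: every registered column is present, so extras are genuinely new.
        return "add_columns", {"columns": new_cols}
    if len(new_cols) <= 2:
        aliases, used = {}, set()
        for col in new_cols:
            candidates = [(levenshtein(col, m), m) for m in unfilled
                          if m not in used and registered[m] in TEXT_TYPES]
            if candidates:
                dist, target = min(candidates)  # closest unfilled column wins
                if dist <= 2:
                    aliases[col] = target
                    used.add(target)
        if len(aliases) == len(new_cols):
            return "aliases", aliases
    # Offline checks failed or too many unresolved columns: defer to the AI Schema Matcher.
    return "ai_matcher", {"columns": new_cols}
```

With registered columns {"id": "integer", "phone": "varchar"}, a payload with phnoe resolves offline to the alias phnoe -> phone (distance 2: two substitutions), while celular is too far from phone and falls through to the AI Matcher.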
2. AI Schema Matcher
The system invokes the configured model to map semantic synonyms:
- For example, numero_movel or celular are automatically identified as synonyms for phone.
- The resolved name is saved as an alias in the database, so future payloads are normalized automatically.
- Columns with no semantic match (e.g., birth_place, which the matcher maps to null) trigger a physical DDL statement (ALTER TABLE ADD COLUMN) in Trino.
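The two outcomes of the AI Schema Matcher, applying a stored alias or adding a new column, can be illustrated as follows. This is a hypothetical sketch: the helper names and the value-to-type mapping in infer_trino_type are assumptions, since the platform's type inference is not documented here. The generated statement follows Trino's ALTER TABLE name ADD COLUMN column_name data_type syntax, and identifiers are validated before being interpolated into SQL.

```python
import re

# Conservative identifier check so untrusted payload keys never reach the DDL unescaped.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def normalize_row(row, aliases):
    """Rename payload keys using stored aliases (e.g. celular -> phone)."""
    return {aliases.get(key, key): value for key, value in row.items()}


def infer_trino_type(value):
    """Hypothetical JSON-value-to-Trino-type mapping, for illustration only."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "varchar"


def add_column_ddl(table, column, sample_value):
    """Build an ALTER TABLE ... ADD COLUMN statement for a column with no semantic match."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"ALTER TABLE {table} ADD COLUMN {column} {infer_trino_type(sample_value)}"
```

For example, add_column_ddl("users", "birth_place", "somewhere") yields ALTER TABLE users ADD COLUMN birth_place varchar.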
3. Asynchronous Catalog Enrichment & PII
Whenever a physical schema evolves:
- The platform queues an asynchronous background task that uses the integrated AI model.
- The task evaluates column descriptions and tags, and automatically flags sensitive fields as PII (Personally Identifiable Information).
- Auto-Hashing: Any column flagged as PII is automatically hashed using SHA-256 upon ingestion to protect data privacy.
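A minimal sketch of SHA-256 auto-hashing applied to one row. The details are assumptions, since this page does not specify them: values are converted with str(), encoded as UTF-8, stored as a 64-character hex digest, and nulls are left untouched. The helper name hash_pii is hypothetical.

```python
import hashlib


def hash_pii(row, pii_columns):
    """Replace PII values with their SHA-256 hex digest; other columns pass through."""
    out = {}
    for key, value in row.items():
        if key in pii_columns and value is not None:
            out[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            out[key] = value
    return out
```

Plain hashing is deterministic, so equal inputs still join and deduplicate after hashing. Low-entropy values such as phone numbers can be recovered by brute force from an unsalted digest; a keyed hash (for example Python's hmac module with a secret key) resists that. This page does not say whether the platform salts or keys its hashes.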
📊 Standard Metrics API
Use this API for direct, structured inserts of system logs or time-series metrics.
Endpoint
POST /v1/ingest/metrics
- Authentication: Include your API key in the Authorization: Bearer YOUR_API_KEY header.
Request Body
{
"table": "my_metrics",
"rows": [
{
"date": "2026-06-12",
"metric_name": "cpu_usage",
"value": 45.2,
"region": "us-east"
}
]
}
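A standard-library sketch that builds the Metrics API request shown above. The host https://api.example.com and the helper name build_metrics_request are hypothetical; the path, Bearer authentication, and body shape come from this page. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical host: replace with your actual Iara Data API base URL.
BASE_URL = "https://api.example.com"


def build_metrics_request(api_key, table, rows):
    """Build (but do not send) a POST to /v1/ingest/metrics."""
    body = json.dumps({"table": table, "rows": rows}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/ingest/metrics",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # API key sent as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Usage: build_metrics_request("YOUR_API_KEY", "my_metrics", [{"date": "2026-06-12", "metric_name": "cpu_usage", "value": 45.2, "region": "us-east"}]), then send it with urllib.request.urlopen.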
⚙️ Limits, Idempotency & Security
Idempotency
Always include an Idempotency-Key header on POST requests. The system caches responses for 24 hours, so if a network failure forces a retry, resending the request with the same key does not create duplicate records.
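Idempotency only helps if every retry reuses the key from the first attempt. The sketch below illustrates that rule with a caller-supplied send(payload, headers) function (a hypothetical interface standing in for your HTTP client); the key is generated once with uuid4. Retrying only on ConnectionError is an assumption for illustration.

```python
import uuid


def post_with_retries(send, payload, attempts=3):
    """Call send(payload, headers) up to `attempts` times with ONE Idempotency-Key."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generated once, before the first attempt, and reused on every retry.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(attempts):
        try:
            return send(payload, headers)
        except ConnectionError as exc:  # assumption: only transport failures are retried
            last_error = exc
    raise last_error
```

Generating a fresh key inside the loop would defeat deduplication: the server would see each retry as a new request. A production client would also add backoff between attempts.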
Rate Limits
Rate limits are configured per connection (e.g., up to 300 requests/minute). Custom IP ranges (Allowed IPs) can be allowlisted to restrict which sources may call the webhook.
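To stay under a per-connection limit from the client side, a token bucket is a common choice. This sketch assumes the example limit of 300 requests/minute; your connection's actual limit may differ. The class name and the injectable clock (used for deterministic testing) are illustrative choices, not part of the platform API.

```python
import time


class TokenBucket:
    """Client-side throttle: `rate` requests per `per` seconds, bursting up to `rate`."""

    def __init__(self, rate=300, per=60.0, clock=time.monotonic):
        self.capacity = float(rate)
        self.tokens = float(rate)        # start full, allowing an initial burst
        self.refill_per_sec = rate / per
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Consume one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

At 300 requests/minute the bucket refills at 5 tokens per second, so after a full burst the client can send roughly one request every 0.2 s.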
DoS Protection
The platform writes every raw push to an isolated PostgreSQL-backed queue (boitata_inbound_queue). A background pipeline worker processes these messages in micro-batches every 3 seconds, shielding the analytical engine (Trino/Iceberg) from spikes of concurrent requests.
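The micro-batching pattern can be illustrated with an in-process queue standing in for boitata_inbound_queue. This is a hypothetical sketch of the idea, not the platform's worker: max_batch=500 is an assumed cap, write_batch represents one bulk write to the analytical engine, and the cycles/sleep parameters exist only so the loop can be tested. A PostgreSQL-backed worker would typically claim rows with SELECT ... FOR UPDATE SKIP LOCKED, but this page does not document how the platform does it.

```python
import queue
import time


def drain_batch(q, max_batch=500):
    """Pull up to max_batch messages without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def run_worker(q, write_batch, interval=3.0, max_batch=500, cycles=None, sleep=time.sleep):
    """Every `interval` seconds, drain one micro-batch and hand it to write_batch.

    cycles=None runs forever; a finite value is useful for tests.
    """
    n = 0
    while cycles is None or n < cycles:
        batch = drain_batch(q, max_batch)
        if batch:
            # One bulk write per tick instead of one write per incoming request.
            write_batch(batch)
        sleep(interval)
        n += 1
```

However bursty the inbound traffic is, the engine sees at most one write of up to max_batch rows per interval; excess messages simply wait in the queue for the next tick.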