Connectors

Connectors define how the platform connects to external data sources. Each connector type brings its own authentication method, data access patterns, and capabilities.

Available Connectors

PostgreSQL

The most mature connector with full streaming, incremental ingestion, and real-time preview.

Feature	Status
Authentication	Username/Password
Schema Browser	✅
Custom SQL Queries	✅
Preview (AI-enriched)	✅
Incremental Ingestion	✅ (change-tracking)
Scheduled Execution	✅

How it works: The platform connects to your PostgreSQL database, runs the configured query (or full table extract), streams the rows as CSV, and imports them into your data warehouse.
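
The streaming step can be pictured with a small sketch. It assumes rows arrive from a database cursor as tuples; the function name `stream_csv` and the batch size are illustrative and not part of the platform.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def stream_csv(columns: Sequence[str], rows: Iterable[Sequence[object]],
               batch_size: int = 1000) -> Iterator[str]:
    """Yield CSV text in chunks so the full result set never sits in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)  # header row goes out with the first chunk
    pending = 0
    for row in rows:
        # Assumption: SQL NULL is written as an empty field.
        writer.writerow(["" if value is None else value for value in row])
        pending += 1
        if pending >= batch_size:
            yield buffer.getvalue()
            buffer.seek(0)
            buffer.truncate(0)
            pending = 0
    if buffer.tell():  # flush whatever is left after the last full batch
        yield buffer.getvalue()
```

Any iterable of rows works, so a server-side cursor can be passed directly and each yielded chunk forwarded to the warehouse upload.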

Google Sheets / Drive (In progress)

Google Sheets/Drive is visible in the product as In progress and is temporarily unavailable for new connections, previews, tests, and scheduled jobs.

Feature	Status
Authentication	OAuth 2.0 (Google)
Preview	In progress
Scheduled Execution	In progress
Incremental Ingestion	❌ (full extract)

Configuration fields:

Spreadsheet ID — Found in the spreadsheet URL: docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit
Range — e.g. Sheet1!A:Z or just A:Z for the default sheet

Planned behavior: The platform will use the Google Sheets API v4 to read spreadsheet data through OAuth. The first row is treated as headers, and subsequent rows are converted to CSV before loading into the warehouse.
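
A minimal sketch of the header and CSV conversion described above, assuming the input is the `values` array (a list of rows) returned by a Sheets API v4 read. The function name is hypothetical.

```python
import csv
import io
from typing import List


def sheet_values_to_csv(values: List[List[str]]) -> str:
    """Treat the first row as headers and emit the rest as CSV rows."""
    if not values:
        return ""
    headers = values[0]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    for row in values[1:]:
        # The Sheets API omits trailing empty cells, so pad short rows
        # to the header width and trim rows that are wider than it.
        padded = row + [""] * (len(headers) - len(row))
        writer.writerow(padded[:len(headers)])
    return out.getvalue()
```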

Microsoft (Excel / SharePoint)

Connect your Microsoft 365 account via OAuth to access Excel Online files, SharePoint folders, and SharePoint lists.

Feature	Status
Authentication	OAuth 2.0 (Microsoft)
Preview	✅
Scheduled Execution	✅
Incremental Ingestion	❌ (full extract)

Three source types are supported:

Excel File

Read a specific Excel workbook from OneDrive or SharePoint.

SharePoint Folder

Ingest all CSV and Excel files from a SharePoint document library folder.

SharePoint List

Read data from a SharePoint list and convert it to tabular format.

MySQL

Fully supported with schema browsing, SQL queries, previews, and scheduled executions.

Feature	Status
Authentication	Username/Password
Schema Browser	✅
Preview	✅
Incremental Ingestion	✅ (via column watermark tracking)
Scheduled Execution	✅

SQL Server

Fully supported for enterprise workloads, including SQL authentication.

Feature	Status
Authentication	Username/Password
Schema Browser	✅
Preview	✅
Incremental Ingestion	✅ (via column watermark tracking)
Scheduled Execution	✅
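
Column watermark tracking, as listed for MySQL and SQL Server, can be sketched as follows. The helper names and the choice of watermark column are assumptions for illustration; the `%s` placeholder matches common MySQL drivers, while SQL Server drivers such as pyodbc use `?` instead.

```python
from typing import Optional, Tuple


def build_incremental_query(table: str, watermark_column: str,
                            last_value: Optional[str]) -> Tuple[str, tuple]:
    """Return (sql, params) selecting only rows past the stored watermark."""
    # Identifiers cannot be bound as parameters, so validate them strictly.
    for identifier in (table, watermark_column):
        if not identifier.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    if last_value is None:
        # First run: no watermark yet, so extract everything.
        return f"SELECT * FROM {table} ORDER BY {watermark_column}", ()
    sql = (f"SELECT * FROM {table} WHERE {watermark_column} > %s "
           f"ORDER BY {watermark_column}")
    return sql, (last_value,)


def next_watermark(rows: list, watermark_column: str, previous):
    """Advance the watermark to the largest value seen, or keep the old one."""
    values = [row[watermark_column] for row in rows]
    return max(values) if values else previous
```

After each run the value from `next_watermark` would be stored with the job, so the next scheduled execution only reads rows changed since then.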

Inbound API (Push)

External systems push data into the platform via a dedicated webhook endpoint. Each connection gets a unique, revocable ingest token.

Feature	Status
Authentication	Ingest Token (auto-generated, 256-bit)
Direction	Push (external → platform)
Preview	✅ (last received data)
Multi-Table Support	✅ (hybrid single-request routing)
Auto-Healing Schema	✅ (zero-cost pre-filtering & AI Schema Matcher)
Scheduled Execution	N/A — event-driven / queue worker

How it works: When you create an Inbound API connection, the platform generates a unique endpoint URL with an ingest token. External systems then POST JSON data (single-table arrays or multi-table structures) to that endpoint.
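
The sketch below illustrates the token and routing ideas. The payload shapes are assumptions (a JSON array for one table, or an object mapping table names to arrays for several), and every function name is hypothetical rather than the platform's actual API.

```python
import hmac
import secrets
from typing import Dict, List, Union

Payload = Union[List[dict], Dict[str, List[dict]]]


def new_ingest_token() -> str:
    """Generate a 256-bit token (32 random bytes, 64 hex characters)."""
    return secrets.token_hex(32)


def token_matches(presented: str, expected: str) -> bool:
    """Compare tokens in constant time to avoid timing leaks."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def route_payload(payload: Payload, default_table: str) -> Dict[str, List[dict]]:
    """Normalize single-table and multi-table bodies to {table: rows}."""
    if isinstance(payload, list):
        return {default_table: payload}
    if isinstance(payload, dict):
        tables = {}
        for name, rows in payload.items():
            if not isinstance(rows, list):
                raise ValueError(f"table {name!r} must map to a list of rows")
            tables[name] = rows
        return tables
    raise ValueError("payload must be a JSON array or object")
```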

AI API Client (Pull)

Connect to any REST API without writing code. Provide technical documentation (ReDoc, Swagger schema, JSON/TXT, or raw endpoint text) and describe in natural language which tables you want (e.g., "Pull won opportunities from ERP"). The AI Integrator then builds and tests the integration automatically in a secure background sandbox.

Feature	Status
Authentication	Bearer Token / API Key / Basic Auth / Custom Headers
Direction	Pull (Iara Data → External API)
Frictionless Mapping	✅ (Full analysis of manual documents & schema routes)
Active Domain Duplicate Detection	✅ (Auto-reuses valid credentials for existing host domains)
Adaptive Auth Guidance	✅ (Visual copy & paste steps based on API security checks)
Silent Background Compilation	✅ (Generated logic and trial runs happen without tech-noise)
Execution & Scheduling	✅ (Custom recurrence, schema review, type mapping & keys)

Unified No-Code Experience

Connections and pipelines are configured in a single continuous wizard:

Analyze Documentation: Enter your target API root URL and paste technical references (API manuals, ReDoc pages, or plain-text descriptions). The agent scans them to detect how the API is secured.
Helpful Credential Guides: Instead of leaving you to guess headers, the wizard provides tailored copy/paste steps explaining where to retrieve the API key or token.
Domain Duplicate Prevention: If you try to create a connection for an API host domain that is already configured, the platform flags the duplicate and offers a 1-Click Reuse action that reuses the existing valid credentials.
Click Recommended Pipelines: The model maps the endpoints and suggests a deck of typical data-context cards (e.g. Customers/Clientes, Invoices, Logs). Clicking a card triggers the bridge creation.
Silent Test Run: The system compiles the connector, runs a trial query, and returns a verified data preview within seconds, so business users never have to deal with scripts or code approvals.
Review and Schedule: Check the final preview tables, configure standard transformations, set PII privacy masking, select keys, and set up your cron schedule on the review screen.

Shopify

Ingest data from your Shopify store via the Admin API.

Feature	Status
Authentication	Admin API Access Token
Supported Objects	Orders, Products, Customers, Inventory Items, Collections
Preview	✅
Pagination	✅ (cursor-based via Link header)
Incremental Ingestion	✅ (via `updated_at` change tracking)
Scheduled Execution	✅
Plan	Starter+

Configuration fields:

Shop Domain — e.g. my-store.myshopify.com
Access Token — From a Shopify custom app (Admin API)
API Version — Defaults to 2024-01

How it works: The platform calls the Shopify Admin REST API, paginates through all records using cursor-based pagination (Link header), flattens nested objects, and uploads the data as CSV to the data warehouse.
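
Cursor-based pagination through the Link header comes down to following the URL marked `rel="next"` until none is returned. A minimal parsing sketch, with a hypothetical function name:

```python
import re
from typing import Optional

# Matches each `<url>; rel="name"` part of a Link header.
_LINK_PART = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next", or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_PART.findall(link_header):
        if rel == "next":
            return url
    return None
```

A caller would request the first page, read the `Link` response header, and keep requesting `next_page_url(...)` until it returns `None`.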

To get an access token:

Go to Shopify Admin → Settings → Apps and sales channels → Develop apps
Create a custom app and configure Admin API scopes (read_orders, read_products, etc.)
Install the app and copy the Admin API access token

Stripe

Ingest payment and billing data from Stripe.

Feature	Status
Authentication	Restricted API Key (read-only)
Supported Objects	Charges, Subscriptions, Customers, Invoices, Payouts, Disputes, Products, Prices
Preview	✅
Pagination	✅ (cursor-based via `starting_after`)
Incremental Ingestion	✅ (via `created` timestamp filter)
Scheduled Execution	✅
Plan	Starter+

Configuration fields:

API Key — Restricted key with read-only permissions

How it works: The platform calls the Stripe REST API with cursor-based pagination, flattens nested objects (metadata, address, etc.), and uploads to the data warehouse.
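
A sketch of the two steps, pagination and flattening. Stripe list responses carry a `data` array and a `has_more` flag, and the next page is requested with `starting_after` set to the last object's `id`. The `fetch_page` callable and the underscore naming for flattened columns are assumptions made for this example.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_stripe_objects(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Walk every page; fetch_page(starting_after) performs the HTTP call."""
    starting_after = None
    while True:
        page = fetch_page(starting_after)
        items = page.get("data", [])
        yield from items
        if not page.get("has_more") or not items:
            return
        starting_after = items[-1]["id"]  # cursor is the last object's id


def flatten(obj: dict, prefix: str = "") -> Dict[str, object]:
    """Turn nested objects into flat columns such as address_city."""
    flat: Dict[str, object] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value  # lists and scalars are kept as they are
    return flat
```

Note that in this sketch an empty nested object (for example an empty `metadata`) produces no columns at all.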

To get an API key:

Go to Stripe Dashboard → Developers → API keys
Create a restricted key with read-only permissions for the data you need

HubSpot

Ingest CRM data from HubSpot.

Feature	Status
Authentication	Private App Access Token
Supported Objects	Contacts, Companies, Deals, Tickets, Products, Line Items
Preview	✅
Pagination	✅ (cursor-based search API)
Incremental Ingestion	✅ (via `updatedAt`)
Scheduled Execution	✅
Plan	Growth+

Configuration fields:

Access Token — From a HubSpot private app

How it works: The platform uses the HubSpot CRM v3 Search API to paginate through records. Properties are automatically flattened from the nested properties object.
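
To illustrate the flattening, the sketch below lifts the nested `properties` object to top-level columns and reads the next cursor from `paging.next.after` in a search response. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def flatten_hubspot_record(record: dict) -> dict:
    """Merge the nested properties object into the top-level row."""
    row = {key: value for key, value in record.items() if key != "properties"}
    # On a name collision the property value wins in this sketch.
    row.update(record.get("properties") or {})
    return row


def parse_search_page(page: dict) -> Tuple[List[dict], Optional[str]]:
    """Return flattened rows plus the cursor for the next request, if any."""
    rows = [flatten_hubspot_record(r) for r in page.get("results", [])]
    after = page.get("paging", {}).get("next", {}).get("after")
    return rows, after
```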

To get an access token:

Go to HubSpot → Settings → Integrations → Private Apps
Create a private app with CRM object read scopes
Copy the access token

TOTVS Protheus

Ingest data from the TOTVS Protheus ERP system via its REST API.

Feature	Status
Authentication	Basic Auth / Bearer Token / API Key
Supported Entities	Customers (SA1), Products (SB1), Sales Orders (SC5), Invoices (SF2), Financials (SE1/SE2), custom
Preview	✅
Pagination	✅ (offset-based)
Field Mapping	✅ (map Protheus fields to standard names)
Scheduled Execution	✅
Plan	Growth+

Configuration fields:

Base URL — e.g. https://protheus.company.com:8888
Auth Type — Basic, Bearer, or API Key
Environment / Company / Branch — Protheus-specific context headers

How it works: The platform calls the TOTVS Protheus REST API using offset pagination. Pre-configured entity endpoints (SA1, SB1, SC5, SF2, SE1, SE2) map to standard business objects. Custom endpoints can be specified for non-standard entities.

S3 / GCS (Cloud Bucket)

Ingest files from Amazon S3 or Google Cloud Storage buckets.

Feature	Status
Authentication	Access Key (S3) / Service Account (GCS)
Supported Formats	CSV, JSON, JSONL, XLSX (Excel), Parquet
Preview	✅
Multi-File Ingestion	✅
Binary Ingestion	✅ (raw buffers download preventing file corruption)
Incremental Ingestion	✅ (by file path history tracking & filename watermarks)
S3-Compatible	✅ (MinIO, DigitalOcean Spaces, etc.)
Scheduled Execution	✅
Plan	Growth+

Configuration fields:

Provider — S3 or GCS
Bucket Name — Target bucket
Region — S3 region (e.g. us-east-1)
Custom Endpoint — For S3-compatible services like MinIO

How it works:

Binary-safe Download: Files are downloaded as raw buffers, fully supporting binary formats like Excel (.xlsx) and Apache Parquet (.parquet) without text-encoding corruption.
Incremental Ingestion (File Log): The platform maintains a history of ingested files (boitata_ingested_files). When scheduled, it performs an outer join check to skip files that were already successfully processed, preventing duplicate imports.
Filename Date Watermarks: If configured, the platform extracts timestamps from filename patterns to dynamically advance the job's watermark, skipping older files.
Ingestion: A _source_file column is added to track the origin of each row. Combined data is structured and loaded into Nessie/Iceberg tables.
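
The file log, filename watermark, and `_source_file` steps can be sketched together. This is an illustration under assumptions: the history is modeled as an in-memory set rather than the platform's stored log, and the `YYYY-MM-DD` filename pattern is only one example of a configurable pattern.

```python
import re
from datetime import date
from typing import Iterable, List, Optional, Set

_DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # assumed pattern


def filename_date(key: str) -> Optional[date]:
    """Extract a date from the object key, or None if absent or invalid."""
    match = _DATE_IN_NAME.search(key)
    if not match:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:
        return None


def files_to_ingest(listed: Iterable[str], already_ingested: Set[str],
                    watermark: Optional[date] = None) -> List[str]:
    """Skip files already in the history or at/before the watermark."""
    selected = []
    for key in listed:
        if key in already_ingested:
            continue
        stamp = filename_date(key)
        if watermark and stamp and stamp <= watermark:
            continue
        selected.append(key)
    return selected


def tag_rows(rows: Iterable[dict], key: str) -> List[dict]:
    """Add the _source_file column recording where each row came from."""
    return [{**row, "_source_file": key} for row in rows]
```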

Salesforce

Ingest CRM and business data from Salesforce using SOQL queries.

Feature	Status
Authentication	OAuth Access Token
Custom SOQL Queries	✅
Supported Objects	All standard and custom Salesforce objects
Preview	✅
Pagination	✅ (automatic via `nextRecordsUrl`)
Incremental Ingestion	✅ (via `LastModifiedDate`)
Scheduled Execution	✅
Plan	Growth+

Configuration fields:

Instance URL — e.g. https://mycompany.salesforce.com
Access Token — OAuth access token
API Version — Defaults to v59.0

How it works: The platform executes SOQL queries against the Salesforce REST API. Results are automatically paginated using nextRecordsUrl. You can specify individual objects with field selection or write custom SOQL queries.
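
A sketch of incremental SOQL plus `nextRecordsUrl` pagination. The `get_json` callable stands in for an authenticated GET against the instance URL, and both function names are hypothetical.

```python
from typing import Callable, Iterator, List, Optional


def incremental_soql(sobject: str, fields: List[str], since: Optional[str]) -> str:
    """Build a SOQL query limited to rows modified after `since`."""
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    if since:
        # SOQL datetime literals are unquoted, e.g. 2024-01-01T00:00:00Z.
        query += f" WHERE LastModifiedDate > {since}"
    return query + " ORDER BY LastModifiedDate"


def iter_records(get_json: Callable[[str], dict], first_path: str) -> Iterator[dict]:
    """Follow nextRecordsUrl until the response reports done=true."""
    path: Optional[str] = first_path
    while path:
        page = get_json(path)
        for record in page.get("records", []):
            # Drop the per-record "attributes" metadata object.
            yield {k: v for k, v in record.items() if k != "attributes"}
        path = None if page.get("done", True) else page.get("nextRecordsUrl")
```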

MongoDB

Ingest documents from MongoDB via the Atlas Data API.

Feature	Status
Authentication	Atlas Data API Key
Custom Queries	✅ (filter, projection, sort)
Preview	✅
Scheduled Execution	✅
Plan	Starter+

Configuration fields:

Data API URL — Atlas Data API endpoint
API Key — Data API key
Data Source — Cluster name (e.g. Cluster0)
Database — Target database

How it works: The platform uses the MongoDB Atlas Data API /action/find endpoint. Extended JSON types ($oid, $date, $numberDecimal) are automatically converted to primitive values. Arrays and nested objects are JSON-serialized into string columns.
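
The type conversion can be sketched as below. It handles the three Extended JSON wrappers named above and serializes remaining arrays and objects to JSON strings. Keeping `$numberDecimal` as a string is an assumption of this sketch (it avoids float rounding); the platform's exact output types are not specified here.

```python
import json
from datetime import datetime, timezone


def simplify(value):
    """Unwrap $oid, $date, and $numberDecimal into plain values."""
    if isinstance(value, dict) and len(value) == 1:
        (key, inner), = value.items()
        if key == "$oid":
            return inner
        if key == "$numberDecimal":
            return inner  # kept as a string to avoid float rounding
        if key == "$date":
            if isinstance(inner, dict):  # canonical form: {"$numberLong": "<ms>"}
                millis = int(inner["$numberLong"])
                moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
                return moment.isoformat()
            return inner  # relaxed form is already an ISO-8601 string
    return value


def document_to_row(doc: dict) -> dict:
    """Convert one document into a flat row of scalar or string columns."""
    row = {}
    for field, raw in doc.items():
        value = simplify(raw)
        if isinstance(value, (dict, list)):
            value = json.dumps(value, sort_keys=True)
        row[field] = value
    return row
```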

FTP / SFTP

Connect to FTP or SFTP servers to ingest files.

Feature	Status
Authentication	Username/Password or Private Key (SFTP)
Connection Test	✅ (TCP reachability)
Full Execution	⚠️ Requires `ssh2-sftp-client` (SFTP) and `basic-ftp` (FTP) dependencies
Plan	Starter+

Note: Full FTP/SFTP execution requires the ssh2-sftp-client and basic-ftp npm packages to be installed. The connection test checks TCP reachability only.

BigQuery

Run SQL queries against Google BigQuery and ingest the results.

Feature	Status
Authentication	Service Account JSON (JWT)
Custom SQL Queries	✅
Preview	✅
Schema Types	✅ (preserves BigQuery type metadata)
Scheduled Execution	✅
Plan	Business+

Configuration fields:

Project ID — GCP project ID
Service Account JSON — Service account key with BigQuery Reader role
Location — BigQuery dataset location (e.g. US, EU)

How it works: The platform signs a JWT using the service account key, exchanges it for an access token, then executes the SQL query via the BigQuery REST API. Schema metadata is preserved (field types from BigQuery schema).

Snowflake

Run SQL queries against Snowflake and ingest the results.

Feature	Status
Authentication	Username/Password or Key-Pair
Custom SQL Queries	✅
Preview	✅
Scheduled Execution	✅
Plan	Business+

Configuration fields:

Account — Snowflake account identifier (e.g. xy12345.us-east-1)
Username / Password — Snowflake credentials
Warehouse — Compute warehouse
Database / Schema / Role — Default context

How it works: The platform submits SQL queries via the Snowflake SQL REST API (/api/v2/statements). Results are returned synchronously for small queries. Column names and data types are extracted from resultSetMetaData.
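
As an illustration, the sketch below builds a request body for `/api/v2/statements` and reads column names and types from `resultSetMetaData.rowType` in a synchronous response. The function names and the default timeout are assumptions.

```python
from typing import Dict, List


def statement_body(sql: str, warehouse: str, database: str,
                   schema: str, role: str, timeout: int = 60) -> dict:
    """JSON body for POST /api/v2/statements with the default context."""
    return {"statement": sql, "timeout": timeout, "warehouse": warehouse,
            "database": database, "schema": schema, "role": role}


def column_types(response: dict) -> Dict[str, str]:
    """Map each column name to the type reported in rowType."""
    return {col["name"]: col["type"]
            for col in response["resultSetMetaData"]["rowType"]}


def rows_from_response(response: dict) -> List[Dict[str, object]]:
    """Pair each positional data row with the column names."""
    columns = [col["name"] for col in response["resultSetMetaData"]["rowType"]]
    return [dict(zip(columns, values)) for values in response.get("data", [])]
```

The SQL API returns cell values as strings, so `column_types` is what a loader would use to cast them.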

SAP (OData)

Ingest data from SAP systems via OData services.

Feature	Status
Authentication	Basic Auth / OAuth / API Key
OData Queries	✅ ($select, $filter, $expand)
Preview	✅
Pagination	✅ ($skip/$top with inline count)
Scheduled Execution	✅
Plan	Business+

Configuration fields:

OData Service URL — SAP Gateway service URL
Auth Type — Basic, OAuth, or API Key
SAP Client — (optional) Client number (e.g. 100)

How it works: The platform calls the SAP OData service with $skip/$top pagination, using $inlinecount=allpages to determine total record count. Metadata objects (__metadata, __deferred) are automatically stripped from results.
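
A sketch of one page request and the metadata clean-up, assuming an OData v2 JSON response of the form `{"d": {"results": [...], "__count": "..."}}`. The function names and the `$format=json` parameter are choices made for this example.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def page_url(service_url: str, entity_set: str, skip: int, top: int) -> str:
    """Build the URL for one $skip/$top page with an inline total count."""
    query = urlencode({"$format": "json", "$inlinecount": "allpages",
                       "$skip": skip, "$top": top}, safe="$")
    return f"{service_url.rstrip('/')}/{entity_set}?{query}"


def clean_entity(entity: dict) -> dict:
    """Drop __metadata and unexpanded (__deferred) navigation properties."""
    cleaned = {}
    for key, value in entity.items():
        if key == "__metadata":
            continue
        if isinstance(value, dict) and "__deferred" in value:
            continue
        cleaned[key] = value
    return cleaned


def parse_page(payload: dict) -> Tuple[List[dict], Optional[int]]:
    """Return cleaned rows and the total record count, if present."""
    body = payload["d"]
    rows = [clean_entity(e) for e in body.get("results", [])]
    total = int(body["__count"]) if "__count" in body else None
    return rows, total
```

Paging then continues by increasing `skip` by `top` until `skip` reaches the reported total.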

Kafka

Consume messages from Kafka topics via the Confluent REST Proxy.

Feature	Status
Authentication	SASL (PLAIN / SCRAM)
Protocol	Confluent REST Proxy (v2)
Preview	✅
Offset Control	✅ (earliest / latest)
Scheduled Execution	✅
Plan	Business+

Configuration fields:

REST Proxy URL — Confluent REST Proxy endpoint
Schema Registry URL — (optional) For Avro/Protobuf deserialization
SASL Credentials — (optional) Username/password

How it works: The platform creates a consumer instance via the REST Proxy, subscribes to the specified topic, consumes messages, and then cleans up the consumer. Message values are flattened from JSON; metadata (topic, partition, offset, key, timestamp) is preserved.
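
The consumer lifecycle can be laid out as the sequence of REST Proxy v2 calls it involves, followed by the flattening step. This sketch only describes the requests as data rather than sending them; the function names and the underscore prefix for metadata columns are assumptions.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def lifecycle_requests(base_url: str, group: str, instance: str, topic: str,
                       offset_reset: str = "earliest") -> List[Request]:
    """Create, subscribe, consume, and clean up, as (method, url, body)."""
    base = f"{base_url.rstrip('/')}/consumers/{group}"
    member = f"{base}/instances/{instance}"
    return [
        ("POST", base, {"name": instance, "format": "json",
                        "auto.offset.reset": offset_reset}),
        ("POST", f"{member}/subscription", {"topics": [topic]}),
        ("GET", f"{member}/records", None),
        ("DELETE", member, None),  # always remove the consumer instance
    ]


def flatten_message(message: dict) -> dict:
    """Keep message metadata and lift JSON value fields to columns."""
    row = {f"_{name}": message.get(name)
           for name in ("topic", "partition", "offset", "key", "timestamp")}
    value = message.get("value")
    if isinstance(value, dict):
        row.update(value)
    else:
        row["value"] = value  # non-object payloads stay in one column
    return row
```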

Note: Native Kafka connections (without the REST Proxy) are coming soon and will rely on the kafkajs package.

Notion

Ingest data from Notion databases and pages.

Feature	Status
Authentication	Integration Token
Supported Objects	Databases, Pages
Preview	✅
Pagination	✅ (cursor-based)
Incremental Ingestion	✅ (via `last_edited_time`)
Scheduled Execution	✅
Plan	Growth+

Configuration fields:

Integration Token — From a Notion integration (secret_...)

How it works: The platform queries Notion databases using the API's /databases/{id}/query endpoint. All 18+ Notion property types (title, rich_text, number, select, multi_select, date, checkbox, URL, email, phone, formula, relation, rollup, people, files, created_time, last_edited_time, status) are automatically flattened to scalar values.
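
A partial sketch of the flattening, covering a subset of property types. Each Notion property object carries a `type` field and a value under a key of the same name. The function names and the comma-joined representation of `multi_select` are assumptions; types not handled here fall back to a string.

```python
def flatten_property(prop: dict):
    """Reduce one Notion property object to a scalar value."""
    kind = prop["type"]
    data = prop.get(kind)
    if data is None:
        return None
    if kind in ("title", "rich_text"):
        return "".join(part.get("plain_text", "") for part in data)
    if kind in ("select", "status"):
        return data.get("name")
    if kind == "multi_select":
        return ", ".join(option["name"] for option in data)
    if kind == "date":
        return data.get("start")
    if kind in ("number", "checkbox", "url", "email", "phone_number",
                "created_time", "last_edited_time"):
        return data  # already scalar
    return str(data)  # fallback for types not covered in this sketch


def flatten_page(page: dict) -> dict:
    """Turn one page from a database query result into a flat row."""
    row = {"id": page["id"], "last_edited_time": page.get("last_edited_time")}
    for name, prop in page.get("properties", {}).items():
        row[name] = flatten_property(prop)
    return row
```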

To get an integration token:

Go to notion.so/my-integrations
Create a new integration
Share the target database with the integration

Slack

Export messages, users, and channels from Slack workspaces.

Feature	Status
Authentication	Bot User OAuth Token (`xoxb-...`)
Supported Data Types	Messages, Users, Channels
Preview	✅
Pagination	✅ (cursor-based)
Incremental Ingestion	✅ (via message `ts` timestamp)
Scheduled Execution	✅
Plan	Growth+

Configuration fields:

Bot Token — Slack Bot User OAuth Token (xoxb-...)

Data types:

Messages — Channel message history with reactions, threads, attachments
Users — Workspace members with profile info, email, status
Channels — Public and private channels with topic, purpose, member count

To get a bot token:

Go to api.slack.com/apps and create a new app
Add Bot Token Scopes: channels:history, channels:read, users:read, users:read.email
Install the app to your workspace and copy the Bot User OAuth Token

Connector Architecture

All connectors follow the same pipeline pattern:

External Source → Connector Engine → Data Processing → Iara Data Warehouse

A Connection stores encrypted credentials (OAuth tokens or passwords)
A Job defines what to extract (query, spreadsheet, folder) and the schedule
The execution engine routes each job to the appropriate connector
Data is always normalized before being imported into your data warehouse
The platform handles schema inference, table creation, and data loading
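
The pipeline pattern can be illustrated with a small registry that routes a job to its connector and infers a schema from the extracted rows. Every class, field, and function name here is hypothetical; the sketch shows the shape of the pattern, not the platform's internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Connection:
    connector_type: str           # e.g. "postgresql", "stripe"
    encrypted_credentials: bytes  # stored encrypted, never in plain text


@dataclass
class Job:
    connection: Connection
    source: str    # what to extract: a query, spreadsheet, or folder
    schedule: str  # e.g. a cron expression


CONNECTORS: Dict[str, Callable[[Job], List[dict]]] = {}


def connector(name: str):
    """Register an extract function under a connector type name."""
    def register(func):
        CONNECTORS[name] = func
        return func
    return register


def infer_schema(rows: List[dict]) -> Dict[str, str]:
    """Use the first non-null value of each column to pick a type name."""
    schema: Dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in schema:
                schema[column] = type(value).__name__
    return schema


def execute(job: Job) -> Dict[str, object]:
    """Route the job to its connector, then infer the schema for loading."""
    extract = CONNECTORS.get(job.connection.connector_type)
    if extract is None:
        raise KeyError(f"no connector for {job.connection.connector_type!r}")
    rows = extract(job)
    return {"schema": infer_schema(rows), "rows": rows}
```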

Pricing Tiers

Tier	Connectors Included
Starter	PostgreSQL, MySQL, SQL Server, Microsoft Excel, File Upload, Inbound API, AI API Client, Shopify, Stripe, MongoDB, FTP/SFTP
Growth	+ HubSpot, TOTVS Protheus, S3/GCS, Salesforce, Notion, Slack
Business	+ BigQuery, Snowflake, SAP, Kafka
