Connectors
Connectors define how the platform connects to external data sources. Each connector type brings its own authentication method, data access patterns, and capabilities.
Available Connectors
PostgreSQL
The most mature connector with full streaming, incremental ingestion, and real-time preview.
| Feature | Status |
|---|---|
| Authentication | Username/Password |
| Schema Browser | ✅ |
| Custom SQL Queries | ✅ |
| Preview (AI-enriched) | ✅ |
| Incremental Ingestion | ✅ (change-tracking) |
| Scheduled Execution | ✅ |
How it works: The platform connects to your PostgreSQL database, runs the configured query (or full table extract), streams the rows as CSV, and imports them into your data warehouse.
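The streaming step can be pictured with a minimal sketch. It assumes rows arrive from a database cursor as tuples; the function name `rows_to_csv_chunks` and the `chunk_size` parameter are illustrative, not the platform's actual code.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence


def rows_to_csv_chunks(
    columns: Sequence[str],
    rows: Iterable[Sequence[object]],
    chunk_size: int = 1000,
) -> Iterator[str]:
    """Yield CSV text in chunks so the full result set never sits in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)  # header row goes out with the first chunk
    pending = 0
    for row in rows:
        # Assumption: SQL NULL is written as an empty field.
        writer.writerow(["" if value is None else value for value in row])
        pending += 1
        if pending >= chunk_size:
            yield buffer.getvalue()
            buffer.seek(0)
            buffer.truncate(0)
            pending = 0
    if buffer.tell() > 0:
        yield buffer.getvalue()
```

Because the function is a generator, a caller can forward each chunk to the warehouse loader as soon as it is produced.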
Google Sheets / Drive (In progress)
Google Sheets/Drive is visible in the product as In progress and is temporarily unavailable for new connections, previews, tests, and scheduled jobs.
| Feature | Status |
|---|---|
| Authentication | OAuth 2.0 (Google) |
| Preview | In progress |
| Scheduled Execution | In progress |
| Incremental Ingestion | ❌ (full extract) |
Configuration fields:
- Spreadsheet ID — Found in the spreadsheet URL: docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit
- Range — e.g. Sheet1!A:Z or just A:Z for the default sheet
Planned behavior: The platform will use the Google Sheets API v4 to read spreadsheet data through OAuth. The first row is treated as headers, and subsequent rows are converted to CSV before loading into the warehouse.
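As a rough illustration of the planned header handling, the sketch below converts a Sheets-style grid (the list of rows found under `values` in a Sheets API v4 read response) into CSV. The function name is hypothetical, and truncating rows wider than the header is an assumption.

```python
import csv
import io
from typing import List


def sheet_values_to_csv(values: List[List[object]]) -> str:
    """Convert a Sheets-style `values` grid to CSV, using row 1 as headers."""
    if not values:
        return ""
    headers = [str(cell) for cell in values[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    for row in values[1:]:
        # The API omits trailing empty cells, so pad short rows to header width.
        padded = list(row) + [""] * (len(headers) - len(row))
        # Assumption: cells beyond the header width are dropped.
        writer.writerow(padded[: len(headers)])
    return out.getvalue()
```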
Microsoft (Excel / SharePoint)
Connect your Microsoft 365 account via OAuth to access Excel Online files, SharePoint folders, and SharePoint lists.
| Feature | Status |
|---|---|
| Authentication | OAuth 2.0 (Microsoft) |
| Preview | ✅ |
| Scheduled Execution | ✅ |
| Incremental Ingestion | ❌ (full extract) |
Three source types are supported:
Excel File
Read a specific Excel workbook from OneDrive or SharePoint.
SharePoint Folder
Ingest all CSV and Excel files from a SharePoint document library folder.
SharePoint List
Read data from a SharePoint list and convert it to tabular format.
MySQL
Fully supported with schema browsing, SQL queries, previews, and scheduled executions.
| Feature | Status |
|---|---|
| Authentication | Username/Password |
| Schema Browser | ✅ |
| Preview | ✅ |
| Incremental Ingestion | ✅ (via column watermark tracking) |
| Scheduled Execution | ✅ |
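Column watermark tracking can be sketched as follows. This is an illustration only: the function names are hypothetical, identifiers are assumed to come from trusted job configuration, and `%s` is the placeholder style used by common Python MySQL drivers.

```python
from typing import Optional, Sequence, Tuple


def build_incremental_query(
    table: str, watermark_column: str, last_value: Optional[object]
) -> Tuple[str, tuple]:
    """Return (sql, params) selecting only rows above the stored watermark."""
    # Only the watermark value is bound as a parameter; table and column
    # names are assumed to be validated elsewhere.
    if last_value is None:
        return f"SELECT * FROM {table} ORDER BY {watermark_column}", ()
    sql = (
        f"SELECT * FROM {table} WHERE {watermark_column} > %s "
        f"ORDER BY {watermark_column}"
    )
    return sql, (last_value,)


def advance_watermark(rows: Sequence[dict], watermark_column: str, current):
    """New watermark is the largest value seen, or the old one if no rows came back."""
    values = [row[watermark_column] for row in rows if row[watermark_column] is not None]
    return max(values, default=current)
```

The first run has no stored watermark and extracts everything; later runs fetch only rows whose watermark column exceeds the saved value.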
SQL Server
Fully supported for enterprise workloads, including SQL authentication.
| Feature | Status |
|---|---|
| Authentication | Username/Password |
| Schema Browser | ✅ |
| Preview | ✅ |
| Incremental Ingestion | ✅ (via column watermark tracking) |
| Scheduled Execution | ✅ |
Inbound API (Push)
External systems push data into the platform via a dedicated webhook endpoint. Each connection gets a unique, revocable ingest token.
| Feature | Status |
|---|---|
| Authentication | Ingest Token (auto-generated, 256-bit) |
| Direction | Push (external → platform) |
| Preview | ✅ (last received data) |
| Multi-Table Support | ✅ (hybrid single-request routing) |
| Auto-Healing Schema | ✅ (zero-cost pre-filtering & AI Schema Matcher) |
| Scheduled Execution | N/A — event-driven / queue worker |
How it works: When you create an Inbound API connection, the platform generates a unique endpoint URL with an ingest token. External systems then POST JSON data (single-table arrays or multi-table structures) to that endpoint.
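The exact wire format is not specified here, so the following is only a sketch of how single-table and multi-table bodies might be told apart. It assumes a bare JSON array means one table and an object whose values are arrays means one table per key; `route_payload` and `default_table` are hypothetical names.

```python
from typing import Dict, List


def route_payload(payload: object, default_table: str = "events") -> Dict[str, List[dict]]:
    """Split one request body into per-table row lists."""
    if isinstance(payload, list):
        # Single-table form: a bare JSON array of row objects.
        return {default_table: [row for row in payload if isinstance(row, dict)]}
    if isinstance(payload, dict):
        # Multi-table form (assumed shape): {"table_name": [row, ...], ...}
        tables = {
            name: [row for row in rows if isinstance(row, dict)]
            for name, rows in payload.items()
            if isinstance(rows, list)
        }
        if tables:
            return tables
        # A lone object with no array values is treated as a single row.
        return {default_table: [payload]}
    raise ValueError("payload must be a JSON array or object")
```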
AI API Client (Pull)
Connect to any REST API without writing code. Provide technical documentation (ReDoc, a Swagger schema, JSON/TXT files, or raw endpoint text) and describe in natural language which tables you want (e.g., "Pull won opportunities from ERP"). The AI Integrator then builds the integration automatically in a secure background sandbox.
| Feature | Status |
|---|---|
| Authentication | Bearer Token / API Key / Basic Auth / Custom Headers |
| Direction | Pull (Iara Data → External API) |
| Frictionless Mapping | ✅ (Full analysis of manual documents & schema routes) |
| Active Domain Duplicate Detection | ✅ (Auto-reuses valid credentials for existing host domains) |
| Adaptive Auth Guidance | ✅ (Visual copy & paste steps based on API security checks) |
| Silent Background Compilation | ✅ (Generated logic and trial runs happen without tech-noise) |
| Execution & Scheduling | ✅ (Custom recurrence, schema review, type mapping & keys) |
Unified No-Code Experience
Connections and pipelines are consolidated into a single continuous wizard:
- Analyze Documentation: Enter the target API root URL and paste technical references (API manuals, ReDoc pages, or plain-text descriptions). The agent scans them for the API's authentication and security scheme.
- Credential Guides: Instead of leaving you to guess headers, the wizard provides tailored copy-and-paste steps explaining where to retrieve the API key or token.
- Domain Duplicate Prevention: If you try to create a connection for an API host domain that is already configured, the platform flags the duplicate and offers a 1-Click Reuse action that reuses the existing valid credentials.
- Click Recommended Pipelines: The model maps the endpoints and suggests a set of typical data cards (e.g. Customers/Clientes, Invoices, Logs). Clicking a card starts creation of the integration.
- Silent Test Run: The system compiles the connector, runs a trial query, and returns a verified data preview within seconds, so business users never have to review scripts or approve code.
- Review and Schedule: Check the final preview tables, configure standard transformations, set PII masking, select keys, and configure the cron schedule on the review screen.
Shopify
Ingest data from your Shopify store via the Admin API.
| Feature | Status |
|---|---|
| Authentication | Admin API Access Token |
| Supported Objects | Orders, Products, Customers, Inventory Items, Collections |
| Preview | ✅ |
| Pagination | ✅ (cursor-based via Link header) |
| Incremental Ingestion | ✅ (via updated_at change tracking) |
| Scheduled Execution | ✅ |
| Plan | Starter+ |
Configuration fields:
- Shop Domain — e.g. my-store.myshopify.com
- Access Token — From a Shopify custom app (Admin API)
- API Version — Defaults to 2024-01
How it works: The platform calls the Shopify Admin REST API, paginates through all records using cursor-based pagination (Link header), flattens nested objects, and uploads the data as CSV to the data warehouse.
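Link-header pagination comes down to finding the URL tagged `rel="next"` in each response. A minimal sketch, with a hypothetical helper name:

```python
import re
from typing import Optional

# Matches one entry of a Link header: <url>; rel="name"
_LINK_PART = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_PART.findall(link_header):
        if rel == "next":
            return url
    return None
```

A caller would keep requesting the returned URL until the function yields `None`.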
To get an access token:
- Go to Shopify Admin → Settings → Apps and sales channels → Develop apps
- Create a custom app and configure Admin API scopes (read_orders, read_products, etc.)
- Install the app and copy the Admin API access token
Stripe
Ingest payment and billing data from Stripe.
| Feature | Status |
|---|---|
| Authentication | Restricted API Key (read-only) |
| Supported Objects | Charges, Subscriptions, Customers, Invoices, Payouts, Disputes, Products, Prices |
| Preview | ✅ |
| Pagination | ✅ (cursor-based via starting_after) |
| Incremental Ingestion | ✅ (via created timestamp filter) |
| Scheduled Execution | ✅ |
| Plan | Starter+ |
Configuration fields:
- API Key — Restricted key with read-only permissions
How it works: The platform calls the Stripe REST API with cursor-based pagination, flattens nested objects (metadata, address, etc.), and uploads to the data warehouse.
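The cursor loop can be sketched without any HTTP code by injecting the page fetcher. Stripe list responses carry a `data` array and a `has_more` flag, and the next page is requested with `starting_after` set to the last object's ID. The `fetch_page` callable and the function name are assumptions for illustration.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_stripe_objects(
    fetch_page: Callable[[Dict[str, object]], dict],
    created_after: Optional[int] = None,
    limit: int = 100,
) -> Iterator[dict]:
    """Walk a Stripe list endpoint with starting_after cursors."""
    params: Dict[str, object] = {"limit": limit}
    if created_after is not None:
        params["created[gt]"] = created_after  # Unix timestamp filter
    while True:
        page = fetch_page(dict(params))
        items = page.get("data", [])
        yield from items
        if not page.get("has_more") or not items:
            return
        params["starting_after"] = items[-1]["id"]
```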
To get an API key:
- Go to Stripe Dashboard → Developers → API keys
- Create a restricted key with read-only permissions for the data you need
HubSpot
Ingest CRM data from HubSpot.
| Feature | Status |
|---|---|
| Authentication | Private App Access Token |
| Supported Objects | Contacts, Companies, Deals, Tickets, Products, Line Items |
| Preview | ✅ |
| Pagination | ✅ (cursor-based search API) |
| Incremental Ingestion | ✅ (via updatedAt) |
| Scheduled Execution | ✅ |
| Plan | Growth+ |
Configuration fields:
- Access Token — From a HubSpot private app
How it works: The platform uses the HubSpot CRM v3 Search API to paginate through records. Properties are automatically flattened from the nested properties object.
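Flattening the nested `properties` object might look like the sketch below. The function name is hypothetical, and prefixing a property on a name clash is an assumption rather than documented behavior.

```python
def flatten_hubspot_record(record: dict) -> dict:
    """Lift the nested `properties` object to top-level columns."""
    row = {key: value for key, value in record.items() if key != "properties"}
    for name, value in (record.get("properties") or {}).items():
        # Assumption: on a name clash the property is prefixed instead of overwriting.
        column = name if name not in row else f"properties_{name}"
        row[column] = value
    return row
```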
To get an access token:
- Go to HubSpot → Settings → Integrations → Private Apps
- Create a private app with CRM object read scopes
- Copy the access token
TOTVS Protheus
Ingest data from the TOTVS Protheus ERP system via its REST API.
| Feature | Status |
|---|---|
| Authentication | Basic Auth / Bearer Token / API Key |
| Supported Entities | Customers (SA1), Products (SB1), Sales Orders (SC5), Invoices (SF2), Financials (SE1/SE2), custom |
| Preview | ✅ |
| Pagination | ✅ (offset-based) |
| Field Mapping | ✅ (map Protheus fields to standard names) |
| Scheduled Execution | ✅ |
| Plan | Growth+ |
Configuration fields:
- Base URL — e.g. https://protheus.company.com:8888
- Auth Type — Basic, Bearer, or API Key
- Environment / Company / Branch — Protheus-specific context headers
How it works: The platform calls the TOTVS Protheus REST API using offset pagination. Pre-configured entity endpoints (SA1, SB1, SC5, SF2, SE1, SE2) map to standard business objects. Custom endpoints can be specified for non-standard entities.
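Offset pagination and field mapping can be sketched together. The injected `fetch_page(offset, page_size)` callable and both function names are hypothetical; the target column names in the usage note are examples only.

```python
from typing import Callable, Dict, Iterator, List


def iter_offset_pages(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 100
) -> Iterator[dict]:
    """Request pages at offset 0, page_size, 2*page_size, ... until a short page."""
    offset = 0
    while True:
        batch = fetch_page(offset, page_size)
        yield from batch
        if len(batch) < page_size:
            return
        offset += page_size


def rename_fields(record: dict, mapping: Dict[str, str]) -> dict:
    """Rename mapped Protheus fields; unmapped fields keep their original name."""
    return {mapping.get(key, key): value for key, value in record.items()}
```

For example, a mapping of `{"A1_COD": "customer_code", "A1_NOME": "customer_name"}` would rename two SA1 fields and leave the rest untouched.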
S3 / GCS (Cloud Bucket)
Ingest files from Amazon S3 or Google Cloud Storage buckets.
| Feature | Status |
|---|---|
| Authentication | Access Key (S3) / Service Account (GCS) |
| Supported Formats | CSV, JSON, JSONL, XLSX (Excel), Parquet |
| Preview | ✅ |
| Multi-File Ingestion | ✅ |
| Binary Ingestion | ✅ (raw buffers download preventing file corruption) |
| Incremental Ingestion | ✅ (by file path history tracking & filename watermarks) |
| S3-Compatible | ✅ (MinIO, DigitalOcean Spaces, etc.) |
| Scheduled Execution | ✅ |
| Plan | Growth+ |
Configuration fields:
- Provider — S3 or GCS
- Bucket Name — Target bucket
- Region — S3 region (e.g. us-east-1)
- Custom Endpoint — For S3-compatible services like MinIO
How it works:
- Binary-safe Download: Files are downloaded as raw buffers, fully supporting binary formats like Excel (.xlsx) and Apache Parquet (.parquet) without text-encoding corruption.
- Incremental Ingestion (File Log): The platform maintains a history of ingested files (boitata_ingested_files). When scheduled, it performs an outer join check to skip files that were already successfully processed, preventing duplicate imports.
- Filename Date Watermarks: If configured, the platform extracts timestamps from filename patterns to dynamically advance the job's watermark, skipping older files.
- Ingestion: A _source_file column is added to track the origin of each row. Combined data is structured and loaded into Nessie/Iceberg tables.
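The file-log and watermark checks can be approximated in memory. This sketch assumes the ingested-file history is available as a set of keys and that filenames embed a YYYY-MM-DD or YYYYMMDD date; the regex and all function names are illustrative.

```python
import re
from datetime import date
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical filename pattern: a YYYY-MM-DD or YYYYMMDD date anywhere in the key.
_DATE_IN_NAME = re.compile(r"(\d{4})-?(\d{2})-?(\d{2})")


def filename_date(key: str) -> Optional[date]:
    match = _DATE_IN_NAME.search(key)
    if not match:
        return None
    try:
        return date(*map(int, match.groups()))
    except ValueError:  # digits that do not form a real calendar date
        return None


def select_new_files(
    listing: Iterable[str], already_ingested: Set[str], watermark: Optional[date]
) -> Tuple[List[str], Optional[date]]:
    """Return the keys still to ingest plus the advanced watermark."""
    selected: List[str] = []
    new_watermark = watermark
    for key in sorted(listing):
        if key in already_ingested:
            continue  # the file log says this one was already processed
        stamp = filename_date(key)
        if watermark and stamp and stamp < watermark:
            continue  # dated before the watermark
        selected.append(key)
        if stamp and (new_watermark is None or stamp > new_watermark):
            new_watermark = stamp
    return selected, new_watermark


def tag_rows(rows: Iterable[dict], key: str) -> List[dict]:
    """Attach the originating file to every row."""
    return [{**row, "_source_file": key} for row in rows]
```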
Salesforce
Ingest CRM and business data from Salesforce using SOQL queries.
| Feature | Status |
|---|---|
| Authentication | OAuth Access Token |
| Custom SOQL Queries | ✅ |
| Supported Objects | All standard and custom Salesforce objects |
| Preview | ✅ |
| Pagination | ✅ (automatic via nextRecordsUrl) |
| Incremental Ingestion | ✅ (via LastModifiedDate) |
| Scheduled Execution | ✅ |
| Plan | Growth+ |
Configuration fields:
- Instance URL — e.g. https://mycompany.salesforce.com
- Access Token — OAuth access token
- API Version — Defaults to v59.0
How it works: The platform executes SOQL queries against the Salesforce REST API. Results are automatically paginated using nextRecordsUrl. You can specify individual objects with field selection or write custom SOQL queries.
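A sketch of the two moving parts: building an incremental SOQL query and following `nextRecordsUrl`. Salesforce query responses carry `records`, `done`, and, when more rows remain, `nextRecordsUrl`. The helper names and the injected `fetch_url` callable are hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional, Sequence


def build_soql(
    sobject: str, fields: Sequence[str], modified_after: Optional[datetime] = None
) -> str:
    """Build a SOQL query, optionally limited to rows changed after a timestamp."""
    soql = f"SELECT {', '.join(fields)} FROM {sobject}"
    if modified_after is not None:
        # SOQL datetime literals are unquoted; pass a timezone-aware datetime.
        stamp = modified_after.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        soql += f" WHERE LastModifiedDate > {stamp}"
    return soql + " ORDER BY LastModifiedDate"


def iter_records(first_page: dict, fetch_url: Callable[[str], dict]) -> Iterator[dict]:
    """Yield records from the first page and every page linked by nextRecordsUrl."""
    page = first_page
    while True:
        yield from page.get("records", [])
        if page.get("done", True) or "nextRecordsUrl" not in page:
            return
        page = fetch_url(page["nextRecordsUrl"])
```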
MongoDB
Ingest documents from MongoDB via the Atlas Data API.
| Feature | Status |
|---|---|
| Authentication | Atlas Data API Key |
| Custom Queries | ✅ (filter, projection, sort) |
| Preview | ✅ |
| Scheduled Execution | ✅ |
| Plan | Starter+ |
Configuration fields:
- Data API URL — Atlas Data API endpoint
- API Key — Data API key
- Data Source — Cluster name (e.g. Cluster0)
- Database — Target database
How it works: The platform uses the MongoDB Atlas Data API /action/find endpoint. Extended JSON types ($oid, $date, $numberDecimal) are automatically converted to primitive values. Arrays and nested objects are JSON-serialized into string columns.
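The type conversion can be illustrated as follows. The sketch handles only the three wrappers named above; converting `$numberDecimal` to a float and sorting keys in serialized JSON are assumptions, and the function names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any


def simplify(value: Any) -> Any:
    """Unwrap $oid, $date and $numberDecimal; leave every other value alone."""
    if isinstance(value, dict) and len(value) == 1:
        (key, inner), = value.items()
        if key == "$oid":
            return str(inner)
        if key == "$numberDecimal":
            return float(inner)  # assumption: float precision is acceptable
        if key == "$date":
            if isinstance(inner, dict) and "$numberLong" in inner:
                # Canonical form: milliseconds since the Unix epoch.
                millis = int(inner["$numberLong"])
                return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()
            return inner  # relaxed form is already an ISO-8601 string
    return value


def document_to_row(doc: dict) -> dict:
    """Produce one flat row; arrays and nested objects become JSON strings."""
    row = {}
    for field, value in doc.items():
        value = simplify(value)
        if isinstance(value, (dict, list)):
            value = json.dumps(value, sort_keys=True)
        row[field] = value
    return row
```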
FTP / SFTP
Connect to FTP or SFTP servers to ingest files.
| Feature | Status |
|---|---|
| Authentication | Username/Password or Private Key (SFTP) |
| Connection Test | ✅ (TCP reachability) |
| Full Execution | ⚠️ Requires ssh2-sftp-client dependency |
| Plan | Starter+ |
Note: Full FTP/SFTP execution requires the ssh2-sftp-client and basic-ftp npm packages to be installed. The connection test checks TCP reachability only.
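A TCP reachability check of the kind described can be written with the standard library alone. This is a sketch with a hypothetical function name; it confirms only that the port accepts connections and does not validate credentials.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection can be opened to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```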
BigQuery
Run SQL queries against Google BigQuery and ingest the results.
| Feature | Status |
|---|---|
| Authentication | Service Account JSON (JWT) |
| Custom SQL Queries | ✅ |
| Preview | ✅ |
| Schema Types | ✅ (preserves BigQuery type metadata) |
| Scheduled Execution | ✅ |
| Plan | Business+ |
Configuration fields:
- Project ID — GCP project ID
- Service Account JSON — Service account key with BigQuery Reader role
- Location — BigQuery dataset location (e.g. US, EU)
How it works: The platform signs a JWT using the service account key, exchanges it for an access token, then executes the SQL query via the BigQuery REST API. Schema metadata is preserved (field types from BigQuery schema).
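The first half of that flow, assembling the JWT, can be sketched with the standard library. The RS256 signature itself needs an RSA library and is deliberately left out, so the function below returns only the signing input. `client_email` and `token_uri` are fields of a Google service account key file; the function name and the choice of scope are assumptions.

```python
import base64
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_signing_input(service_account: dict, now: Optional[int] = None) -> str:
    """Return 'header.payload'; an RS256 signature must still be appended."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": service_account["client_email"],
        "scope": "https://www.googleapis.com/auth/bigquery",  # assumed scope
        "aud": service_account["token_uri"],
        "iat": issued_at,
        "exp": issued_at + 3600,  # Google caps the lifetime at one hour
    }
    encode = lambda obj: _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
    return f"{encode(header)}.{encode(claims)}"
```

The signed token is then posted to the token endpoint with the grant type `urn:ietf:params:oauth:grant-type:jwt-bearer` to obtain the access token.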
Snowflake
Run SQL queries against Snowflake and ingest the results.
| Feature | Status |
|---|---|
| Authentication | Username/Password or Key-Pair |
| Custom SQL Queries | ✅ |
| Preview | ✅ |
| Scheduled Execution | ✅ |
| Plan | Business+ |
Configuration fields:
- Account — Snowflake account identifier (e.g. xy12345.us-east-1)
- Username / Password — Snowflake credentials
- Warehouse — Compute warehouse
- Database / Schema / Role — Default context
How it works: The platform submits SQL queries via the Snowflake SQL REST API (/api/v2/statements). Results are returned synchronously for small queries. Column names and data types are extracted from resultSetMetaData.
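Reading column names and types out of `resultSetMetaData` might look like this. The sketch assumes the response shape of the SQL REST API, where `rowType` lists the columns and `data` holds rows as arrays of string values; both function names are hypothetical.

```python
from typing import Dict, List


def column_types(response: dict) -> Dict[str, str]:
    """Map each column name to the type reported in resultSetMetaData."""
    return {col["name"]: col["type"] for col in response["resultSetMetaData"]["rowType"]}


def rows_from_response(response: dict) -> List[dict]:
    """Pair every positional row in `data` with the column names."""
    names = [col["name"] for col in response["resultSetMetaData"]["rowType"]]
    return [dict(zip(names, row)) for row in response.get("data", [])]
```

Values arrive as strings, so a later step would cast them using the reported types.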
SAP (OData)
Ingest data from SAP systems via OData services.
| Feature | Status |
|---|---|
| Authentication | Basic Auth / OAuth / API Key |
| OData Queries | ✅ ($select, $filter, $expand) |
| Preview | ✅ |
| Pagination | ✅ ($skip/$top with inline count) |
| Scheduled Execution | ✅ |
| Plan | Business+ |
Configuration fields:
- OData Service URL — SAP Gateway service URL
- Auth Type — Basic, OAuth, or API Key
- SAP Client — (optional) Client number (e.g. 100)
How it works: The platform calls the SAP OData service with $skip/$top pagination, using $inlinecount=allpages to determine total record count. Metadata objects (__metadata, __deferred) are automatically stripped from results.
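A sketch of the query parameters and the metadata stripping, assuming an OData v2 JSON response of the form `{"d": {"results": [...], "__count": "N"}}`. The function names are hypothetical, and requesting `$format=json` is an assumption.

```python
from typing import List, Optional, Tuple


def page_params(skip: int, top: int) -> dict:
    """Query parameters for one page, asking the service for a total count."""
    return {"$skip": skip, "$top": top, "$inlinecount": "allpages", "$format": "json"}


def strip_odata_metadata(entity: dict) -> dict:
    """Drop __metadata and unexpanded (__deferred) navigation properties."""
    clean = {}
    for key, value in entity.items():
        if key == "__metadata":
            continue
        if isinstance(value, dict):
            if "__deferred" in value:
                continue  # navigation property that was not expanded
            value = strip_odata_metadata(value)
        elif isinstance(value, list):
            value = [strip_odata_metadata(v) if isinstance(v, dict) else v for v in value]
        clean[key] = value
    return clean


def parse_page(response: dict) -> Tuple[List[dict], Optional[int]]:
    """Return the cleaned entities and the total count, if the service sent one."""
    body = response["d"]
    total = int(body["__count"]) if "__count" in body else None
    return [strip_odata_metadata(e) for e in body.get("results", [])], total
```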
Kafka
Consume messages from Kafka topics via the Confluent REST Proxy.
| Feature | Status |
|---|---|
| Authentication | SASL (PLAIN / SCRAM) |
| Protocol | Confluent REST Proxy (v2) |
| Preview | ✅ |
| Offset Control | ✅ (earliest / latest) |
| Scheduled Execution | ✅ |
| Plan | Business+ |
Configuration fields:
- REST Proxy URL — Confluent REST Proxy endpoint
- Schema Registry URL — (optional) For Avro/Protobuf deserialization
- SASL Credentials — (optional) Username/password
How it works: The platform creates a consumer instance via the REST Proxy, subscribes to the specified topic, consumes messages, and then cleans up the consumer. Message values are flattened from JSON; metadata (topic, partition, offset, key, timestamp) is preserved.
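The flattening of consumed records can be sketched as below. Prefixing metadata columns with an underscore and joining nested keys with `_` are assumptions, as are the function names; `timestamp` is read with `.get` because not every record carries it.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level, joining keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def flatten_kafka_record(record: dict) -> dict:
    """Keep message metadata and spread a JSON object value into columns."""
    row = {
        f"_{name}": record.get(name)
        for name in ("topic", "partition", "offset", "key", "timestamp")
    }
    value = record.get("value")
    if isinstance(value, dict):
        row.update(flatten(value))
    else:
        row["value"] = value  # non-object payloads stay in a single column
    return row
```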
Note: Native Kafka connections (without the REST Proxy), based on kafkajs, are coming soon.
Notion
Ingest data from Notion databases and pages.
| Feature | Status |
|---|---|
| Authentication | Integration Token |
| Supported Objects | Databases, Pages |
| Preview | ✅ |
| Pagination | ✅ (cursor-based) |
| Incremental Ingestion | ✅ (via last_edited_time) |
| Scheduled Execution | ✅ |
| Plan | Growth+ |
Configuration fields:
- Integration Token — From a Notion integration (secret_...)
How it works: The platform queries Notion databases using the API's /databases/{id}/query endpoint. All 18+ Notion property types (title, rich_text, number, select, multi_select, date, checkbox, URL, email, phone, formula, relation, rollup, people, files, created_time, last_edited_time, status) are automatically flattened to scalar values.
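Flattening relies on the fact that each Notion property object names its own `type` and stores the value under a key of the same name. The sketch below covers a subset of types and falls back to a JSON string for the rest; the function names and the comma-joined format for multi_select are assumptions.

```python
import json
from typing import Any

_PASSTHROUGH = {
    "number", "checkbox", "url", "email", "phone_number",
    "created_time", "last_edited_time",
}


def flatten_notion_property(prop: dict) -> Any:
    """Reduce one Notion property object to a scalar value."""
    kind = prop.get("type")
    value = prop.get(kind)
    if value is None:
        return None
    if kind in ("title", "rich_text"):
        return "".join(part.get("plain_text", "") for part in value)
    if kind in ("select", "status"):
        return value.get("name")
    if kind == "multi_select":
        return ", ".join(option["name"] for option in value)
    if kind == "date":
        return value.get("start")
    if kind in _PASSTHROUGH:
        return value
    # Types not handled in this sketch are kept as a JSON string.
    return json.dumps(value, sort_keys=True)


def flatten_page(page: dict) -> dict:
    """Flatten every property of a page returned by a database query."""
    return {name: flatten_notion_property(p) for name, p in page.get("properties", {}).items()}
```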
To get an integration token:
- Go to notion.so/my-integrations
- Create a new integration
- Share the target database with the integration
Slack
Export messages, users, and channels from Slack workspaces.
| Feature | Status |
|---|---|
| Authentication | Bot User OAuth Token (xoxb-...) |
| Supported Data Types | Messages, Users, Channels |
| Preview | ✅ |
| Pagination | ✅ (cursor-based) |
| Incremental Ingestion | ✅ (via message ts timestamp) |
| Scheduled Execution | ✅ |
| Plan | Growth+ |
Configuration fields:
- Bot Token — Slack Bot User OAuth Token (xoxb-...)
Data types:
- Messages — Channel message history with reactions, threads, attachments
- Users — Workspace members with profile info, email, status
- Channels — Public and private channels with topic, purpose, member count
To get a bot token:
- Go to api.slack.com/apps and create a new app
- Add Bot Token Scopes: channels:history, channels:read, users:read, users:read.email
- Install the app to your workspace and copy the Bot User OAuth Token
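For incremental message ingestion, the `ts` watermark could be advanced as in this sketch. Slack message timestamps are strings such as `1712345678.000200`, so they are compared as decimals and never as floats, which would lose digits. The function name is hypothetical; the result would be passed as the `oldest` parameter of the next history request.

```python
from decimal import Decimal
from typing import Iterable, Optional


def next_oldest(messages: Iterable[dict], current: Optional[str] = None) -> Optional[str]:
    """Return the largest ts seen so far, kept as the original string form."""
    stamps = [Decimal(message["ts"]) for message in messages if "ts" in message]
    if current is not None:
        stamps.append(Decimal(current))
    if not stamps:
        return None
    return str(max(stamps))
```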
Connector Architecture
All connectors follow the same pipeline pattern:
External Source → Connector Engine → Data Processing → Iara Data Warehouse
- Connection stores encrypted credentials (OAuth tokens or passwords)
- Job defines what to extract (query, spreadsheet, folder) and the schedule
- Execution engine routes to the appropriate connector
- Data is always normalized before being imported into your data warehouse
- The platform handles schema inference, table creation, and data loading
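The routing and normalization steps can be pictured with a small registry. This is a sketch of the pattern only: the decorator, the `type` key on a connection, and the column-name normalization rule are all assumptions for illustration.

```python
import re
from typing import Callable, Dict, Iterable, List

Extractor = Callable[[dict], Iterable[dict]]
_REGISTRY: Dict[str, Extractor] = {}


def register(connector_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that makes an extractor available under a connector type."""
    def decorator(fn: Extractor) -> Extractor:
        _REGISTRY[connector_type] = fn
        return fn
    return decorator


def normalize(row: dict) -> dict:
    """Assumed rule: lower-case column names, non-word characters become underscores."""
    return {re.sub(r"\W+", "_", str(key)).strip("_").lower(): value for key, value in row.items()}


def run_job(connection: dict, job: dict) -> List[dict]:
    """Route a job to the extractor for its connection type and normalize the rows."""
    extractor = _REGISTRY.get(connection["type"])
    if extractor is None:
        raise ValueError(f"no connector registered for {connection['type']!r}")
    return [normalize(row) for row in extractor({**connection, **job})]
```

Adding a connector then means registering one more extractor, while normalization and loading stay shared.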
Pricing Tiers
| Tier | Connectors Included |
|---|---|
| Starter | PostgreSQL, MySQL, SQL Server, Microsoft Excel, File Upload, Inbound API, AI API Client, Shopify, Stripe, MongoDB, FTP/SFTP |
| Growth | + HubSpot, TOTVS Protheus, S3/GCS, Salesforce, Notion, Slack |
| Business | + BigQuery, Snowflake, SAP, Kafka |