
ETL Pipelines

Pipelines automate recurring data imports from external sources into your data warehouse. Set up a connection once, configure a schedule, and your data stays fresh automatically.

Creating a Pipeline

  1. Navigate to the Data Pipelines tab
  2. Click + New Pipeline
  3. Follow the 3-step wizard

Screenshot showing the pipeline creation wizard with the 3 steps indicated in a progress bar

Step 1: Choose a Connection

Select an existing connection or create a new one:

Screenshot showing the connection selector with database type icons

Supported Connection Types

| Source | What You Need | Features |
| --- | --- | --- |
| PostgreSQL | Host, port, database, username, password | Full SQL queries, schema browser |
| MySQL | Host, port, database, username, password | Full SQL queries, schema browser |
| SQL Server | Host, port, database, username, password | Full SQL queries, schema browser |
| Google Sheets | Spreadsheet ID, sheet name, range | Auto-refresh from shared sheets |
| Microsoft Excel | Drive ID, Item ID, sheet name | Excel Online via Graph API |
| SharePoint Folder | Site URL, folder path | Ingest all CSV/Excel files in a folder |
| SharePoint List | Site URL, list name | List data in tabular format |
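As a rough sketch, the required fields in the table above can be modeled as a small validation helper. The connection-type and field names here are illustrative assumptions, not the platform's actual API:

```python
# Illustrative sketch: required fields per connection type, mirroring the
# table above. All names are assumptions, not the platform's actual API.
REQUIRED_FIELDS = {
    "postgresql": ["host", "port", "database", "username", "password"],
    "mysql": ["host", "port", "database", "username", "password"],
    "sqlserver": ["host", "port", "database", "username", "password"],
    "google_sheets": ["spreadsheet_id", "sheet_name", "range"],
    "excel_online": ["drive_id", "item_id", "sheet_name"],
    "sharepoint_folder": ["site_url", "folder_path"],
    "sharepoint_list": ["site_url", "list_name"],
}

def missing_fields(conn_type: str, config: dict) -> list:
    """Return the required fields absent from a connection config."""
    return [f for f in REQUIRED_FIELDS[conn_type] if f not in config]
```

For example, a PostgreSQL config supplying only `host` and `port` would report `database`, `username`, and `password` as missing.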

Schema Browser (Databases)

For database connections, the platform provides a visual schema browser:

Screenshot showing the schema browser with expandable tables and column lists

  • Browse tables and columns visually
  • Click columns to add them to your query
  • Preview data before configuring the pipeline

AI Configuration Assistant

Not sure how to write the SQL query? Use the AI Assistant:

Describe what data you want in plain language, and the AI generates the SQL query for you.

Example: "Get all orders from the last 30 days with customer name, product, and total amount"
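A query like the one the assistant might generate for that request is sketched below. The schema (`orders`, `customers`, and their columns) is a toy assumption used to show the query running against an in-memory SQLite database; the generated SQL will match your own schema:

```python
import sqlite3

# The kind of SQL the assistant might produce for the request above.
# Table and column names (orders, customers, ...) are illustrative.
sql = """
SELECT c.name AS customer_name, o.product, o.total_amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.order_date >= date('now', '-30 days')
"""

# Toy in-memory database, just to show the query executes.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     product TEXT, total_amount REAL, order_date TEXT);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (1, 1, 'Widget', 99.5, date('now'));
""")
rows = db.execute(sql).fetchall()
print(rows)  # [('Acme', 'Widget', 99.5)]
```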

Step 2: Configure Schedule

Screenshot showing the schedule configuration with frequency selector and time picker

| Frequency | Options |
| --- | --- |
| Hourly | Every N hours |
| Daily | Run at specific time(s); multiple run times supported |
| Weekly | Choose day(s) of the week plus a time |
| Custom Cron | Enter a cron expression for full flexibility |

Timezone: All schedules run in your configured timezone.
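Assuming the Custom Cron option accepts standard five-field cron syntax (minute, hour, day-of-month, month, day-of-week), a few expressions covering the frequencies above might look like this:

```python
# Example cron expressions for the schedule options above, assuming
# standard 5-field syntax: minute hour day-of-month month day-of-week.
EXAMPLES = {
    "0 */4 * * *": "every 4 hours, on the hour",
    "30 6 * * *": "daily at 06:30",
    "0 9 * * 1,4": "Mondays and Thursdays at 09:00",
}

def looks_like_cron(expr: str) -> bool:
    """Cheap sanity check: a standard cron expression has five fields."""
    return len(expr.split()) == 5
```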

Incremental Import

For large tables, enable incremental import to only fetch new/updated rows:

  1. Select a change tracking column (e.g., updated_at, id)
  2. Set the initial value (e.g., 2024-01-01 or 0)
  3. Each run fetches only rows where the tracking column > last synced value

This dramatically reduces load times and database impact.
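The steps above can be sketched as a fetch loop that only reads rows past the last synced value, then advances the cursor. The table, column, and function names are illustrative, not the platform's internals:

```python
import sqlite3

# Sketch of the incremental-import loop described above: fetch only rows
# whose tracking column exceeds the last synced value, then advance the
# cursor. Table/column names are illustrative.
def incremental_fetch(db, last_synced_id):
    rows = db.execute(
        "SELECT id, payload FROM source_table WHERE id > ? ORDER BY id",
        (last_synced_id,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_synced_id
    return rows, new_last

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE source_table (id INTEGER PRIMARY KEY, payload TEXT)")
db.executemany("INSERT INTO source_table VALUES (?, ?)",
               [(1, "a"), (2, "b"), (3, "c")])

rows, cursor = incremental_fetch(db, last_synced_id=1)
# Only rows 2 and 3 are fetched; the next run starts from cursor = 3.
```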

Step 3: Review Columns

Screenshot showing the column review screen similar to file upload review

The column configuration is the same as for file upload:

  • Rename columns, change types, mark PII, set keys
  • Apply transformations
  • Choose write mode (Append / Replace / Merge)
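As a toy model of how the three write modes behave (assuming Merge means upsert on the configured key column):

```python
def write(dest, rows, mode, key="id"):
    """Toy model of the three write modes over a list of row dicts."""
    if mode == "replace":
        dest[:] = rows                       # drop existing data entirely
    elif mode == "append":
        dest.extend(rows)                    # keep everything; may duplicate
    elif mode == "merge":                    # upsert on the key column
        by_key = {r[key]: i for i, r in enumerate(dest)}
        for r in rows:
            if r[key] in by_key:
                dest[by_key[r[key]]].update(r)   # update matching row
            else:
                dest.append(r)                   # insert new row
    return dest

table = [{"id": 1, "qty": 2}]
write(table, [{"id": 1, "qty": 5}, {"id": 2, "qty": 1}], mode="merge")
# table is now [{"id": 1, "qty": 5}, {"id": 2, "qty": 1}]
```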

Managing Pipelines

Screenshot of the pipeline list showing status indicators and action buttons

Pipeline Actions

| Action | Description |
| --- | --- |
| Run Now | Trigger an immediate execution |
| Pause | Temporarily stop scheduled runs |
| Resume | Re-enable scheduled runs |
| Stop | Cancel a currently running execution |
| Upload New Version | Upload a file to replace the pipeline's data |
| Delete | Remove the pipeline permanently |

Execution History

Each pipeline shows its execution history with:

  • Status: Success ✅, Warning ⚠️, Error ❌
  • Start time and duration
  • Rows ingested
  • Error details (if any)
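One execution-history entry carries the fields listed above; a minimal record sketch (field names are assumptions, not the platform's schema) might look like:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative record for one execution-history entry; field names are
# assumptions, not the platform's actual schema.
@dataclass
class ExecutionRecord:
    status: str                  # "success" | "warning" | "error"
    started_at: datetime
    duration_seconds: float
    rows_ingested: int
    error_details: Optional[str] = None  # populated only on failure

run = ExecutionRecord("success", datetime(2024, 1, 1, 6, 30), 12.5, 1000)
```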

Tips


Pipeline Best Practices

  1. Use incremental import for large tables to minimize load times
  2. Set meaningful names for connections and pipelines
  3. Test your SQL query with the Schema Browser before scheduling
  4. Monitor the Quality tab for ingestion failures
  5. Use merge mode with key columns for upsert behavior