ETL Pipelines
Pipelines automate recurring data imports from external sources into your data warehouse. Set up a connection once, configure a schedule, and your data stays fresh automatically.
Creating a Pipeline
- Navigate to Data → Pipelines tab
- Click + New Pipeline
- Follow the 3-step wizard

Step 1: Choose a Connection
Select an existing connection or create a new one:

Supported Connection Types
| Source | What You Need | Features |
|---|---|---|
| PostgreSQL | Host, port, database, username, password | Full SQL queries, schema browser |
| MySQL | Host, port, database, username, password | Full SQL queries, schema browser |
| SQL Server | Host, port, database, username, password | Full SQL queries, schema browser |
| Google Sheets | Spreadsheet ID, sheet name, range | Auto-refresh from shared sheets |
| Microsoft Excel | Drive ID, Item ID, sheet name | Excel Online via Graph API |
| SharePoint Folder | Site URL, folder path | Ingest all CSV/Excel files in a folder |
| SharePoint List | Site URL, list name | List data in tabular format |
Schema Browser (Databases)
For database connections, the platform provides a visual schema browser:

- Browse tables and columns visually
- Click columns to add them to your query
- Preview data before configuring the pipeline
AI Configuration Assistant
Not sure how to write the SQL query? Use the AI Assistant:
Describe what data you want in plain language, and the AI generates the SQL query for you.
Example: "Get all orders from the last 30 days with customer name, product, and total amount"
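For the example prompt above, the assistant might produce a query like the one below. A runnable sketch using SQLite with an illustrative schema — the real table and column names (`orders`, `customers`, `created_at`, etc.) depend on your source database:

```python
import sqlite3

# Illustrative schema; your actual tables and columns will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product TEXT,
    total REAL,
    created_at TEXT
);
INSERT INTO customers VALUES (1, 'Acme Corp');
INSERT INTO orders VALUES (1, 1, 'Widget', 99.50, date('now', '-5 days'));
INSERT INTO orders VALUES (2, 1, 'Gadget', 10.00, date('now', '-90 days'));
""")

# One possible query for: "Get all orders from the last 30 days
# with customer name, product, and total amount"
query = """
SELECT c.name AS customer_name, o.product, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= date('now', '-30 days')
"""
rows = conn.execute(query).fetchall()
print(rows)  # only the order placed within the last 30 days
```

Always review and test the generated SQL (for example in the Schema Browser) before scheduling the pipeline.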
Step 2: Configure Schedule

| Frequency | Options |
|---|---|
| Hourly | Every N hours |
| Daily | Run at specific time(s) — supports multiple run times |
| Weekly | Choose day(s) of the week + time |
| Custom Cron | Enter a cron expression for full flexibility |
Timezone: All schedules run in your configured timezone.
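To illustrate what a Custom Cron expression expresses, here is a minimal sketch of how a 5-field expression (minute, hour, day-of-month, month, day-of-week) is matched against a point in time. This is a toy matcher supporting only `*`, single values, and comma lists — not a full cron parser:

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a number, or a comma list like '1,3,5'."""
    if field == "*":
        return True
    return value in {int(v) for v in field.split(",")}

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check a 5-field cron expression against a datetime.
    Day-of-week follows cron convention: 0 = Sunday, 1 = Monday, ..."""
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day)
            and field_matches(month, dt.month)
            and field_matches(dow, dt.isoweekday() % 7))

# "0 6 * * 1" = every Monday at 06:00 (the same schedule you could
# configure with the Weekly frequency)
expr = "0 6 * * 1"
monday = cron_matches(expr, datetime(2024, 1, 1, 6, 0))   # 2024-01-01 is a Monday
tuesday = cron_matches(expr, datetime(2024, 1, 2, 6, 0))
print(monday, tuesday)  # True False
```

Prefer the Hourly/Daily/Weekly presets when they fit; reserve cron expressions for schedules the presets cannot express.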
Incremental Import
For large tables, enable incremental import to only fetch new/updated rows:
- Select a change tracking column (e.g., `updated_at`, `id`)
- Set the initial value (e.g., `2024-01-01` or `0`)
- Each run fetches only rows where the tracking column > last synced value
This dramatically reduces load times and database impact.
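The watermark logic behind incremental import can be sketched as follows. A minimal example using SQLite with an illustrative `events` table and `id` as the tracking column:

```python
import sqlite3

# Illustrative source table; real table and tracking column names vary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT);
INSERT INTO events VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

def fetch_incremental(conn, last_synced):
    """Fetch only rows where the tracking column exceeds the last
    synced value, and advance the watermark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_synced,),
    ).fetchall()
    # New watermark: highest value seen, or the old one if nothing changed
    new_synced = rows[-1][0] if rows else last_synced
    return rows, new_synced

first, synced = fetch_incremental(conn, 0)       # initial value 0 -> all 3 rows
again, synced = fetch_incremental(conn, synced)  # no new rows -> empty result
```

Note that the tracking column must be monotonically increasing (an auto-increment ID or a reliably set `updated_at`) for this to capture every change.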
Step 3: Review Columns

Pipelines use the same column configuration as file uploads:
- Rename columns, change types, mark PII, set keys
- Apply transformations
- Choose write mode (Append / Replace / Merge)
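The difference between the write modes is easiest to see with Merge, which upserts on the key column(s). A sketch in SQLite (table and column names are illustrative; Append would insert unconditionally, Replace would drop and reload the table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (key TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO target VALUES ('a', 1), ('b', 2)")

# Incoming batch: 'b' already exists, 'c' is new
incoming = [("b", 20), ("c", 3)]

# Merge mode: update rows whose key already exists, insert the rest
conn.executemany(
    "INSERT INTO target VALUES (?, ?) "
    "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
    incoming,
)

result = conn.execute("SELECT * FROM target ORDER BY key").fetchall()
print(result)  # [('a', 1), ('b', 20), ('c', 3)]
```

With Append, the duplicate key `'b'` would either error or create a duplicate row; with Replace, row `'a'` would be lost. Merge with well-chosen key columns gives idempotent re-runs.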
Managing Pipelines

Pipeline Actions
| Action | Description |
|---|---|
| Run Now | Trigger an immediate execution |
| Pause | Temporarily stop scheduled runs |
| Resume | Re-enable scheduled runs |
| Stop | Cancel a currently running execution |
| Upload New Version | Upload a file to replace the pipeline's data |
| Delete | Remove the pipeline permanently |
Execution History
Each pipeline shows its execution history with:
- Status: Success ✅, Warning ⚠️, Error ❌
- Start time and duration
- Rows ingested
- Error details (if any)
Tips
Pipeline Best Practices
- Use incremental import for large tables to minimize load times
- Set meaningful names for connections and pipelines
- Test your SQL query with the Schema Browser before scheduling
- Monitor the Quality tab for ingestion failures
- Use merge mode with key columns for upsert behavior