
ETL Pipelines

Pipelines automate recurring data imports from external sources into your data warehouse. Set up a connection once, configure a schedule, and your data stays fresh automatically.

Creating a Pipeline

  1. Navigate to the Data Pipelines tab
  2. Click + New Pipeline
  3. Follow the 3-step wizard

Screenshot showing the pipeline creation wizard with the 3 steps indicated in a progress bar

Step 1: Choose a Connection

Select an existing connection or create a new one:

Screenshot showing the connection selector with database type icons

Supported Connection Types

| Source | What You Need | Features |
| --- | --- | --- |
| PostgreSQL | Host, port, database, username, password | Full SQL queries, schema browser |
| MySQL | Host, port, database, username, password | Full SQL queries, schema browser |
| SQL Server | Host, port, database, username, password | Full SQL queries, schema browser |
| Google Sheets | Spreadsheet ID, sheet name, range | Auto-refresh from shared sheets |
| Microsoft Excel | Drive ID, Item ID, sheet name | Excel Online via Graph API |
| SharePoint Folder | Site URL, folder path | Ingest all CSV/Excel files in a folder |
| SharePoint List | Site URL, list name | List data in tabular format |
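As a rough sketch, the required fields in the table above can be modeled as a small validation helper. The connection-type and field names here are illustrative assumptions, not the platform's actual API:

```python
# Illustrative sketch: required fields per connection type, mirroring the
# table above. All names are assumptions, not the platform's actual API.
REQUIRED_FIELDS = {
    "postgresql": ["host", "port", "database", "username", "password"],
    "mysql": ["host", "port", "database", "username", "password"],
    "sqlserver": ["host", "port", "database", "username", "password"],
    "google_sheets": ["spreadsheet_id", "sheet_name", "range"],
    "excel_online": ["drive_id", "item_id", "sheet_name"],
    "sharepoint_folder": ["site_url", "folder_path"],
    "sharepoint_list": ["site_url", "list_name"],
}

def missing_fields(conn_type: str, config: dict) -> list:
    """Return the required fields absent from a connection config."""
    return [f for f in REQUIRED_FIELDS[conn_type] if f not in config]
```

For example, a PostgreSQL config supplying only `host` and `port` would report `database`, `username`, and `password` as missing.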

Schema Browser (Databases)

For database connections, the platform provides a visual schema browser:

Screenshot showing the schema browser with expandable tables and column lists

  • Browse tables and columns visually
  • Click columns to add them to your query
  • Preview data before configuring the pipeline

AI Configuration Assistant

Not sure how to write the SQL query? Use the AI Assistant:

Describe what data you want in plain language, and the AI generates the SQL query for you.

Example: "Get all orders from the last 30 days with customer name, product, and total amount"
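A query like the one the assistant might generate for that request is sketched below. The schema (`orders`, `customers`, and their columns) is a toy assumption used to show the query running against an in-memory SQLite database; the generated SQL will match your own schema:

```python
import sqlite3

# The kind of SQL the assistant might produce for the request above.
# Table and column names (orders, customers, ...) are illustrative.
sql = """
SELECT c.name AS customer_name, o.product, o.total_amount
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.order_date >= date('now', '-30 days')
"""

# Toy in-memory database, just to show the query executes.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     product TEXT, total_amount REAL, order_date TEXT);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (1, 1, 'Widget', 99.5, date('now'));
""")
rows = db.execute(sql).fetchall()
print(rows)  # [('Acme', 'Widget', 99.5)]
```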

Step 2: Configure Schedule

Screenshot showing the schedule configuration with frequency selector and time picker

| Frequency | Options |
| --- | --- |
| Hourly | Every N hours |
| Daily | Run at specific time(s); multiple run times supported |
| Weekly | Choose day(s) of the week plus a time |
| Custom Cron | Enter a cron expression for full flexibility |

Timezone: All schedules run in your configured timezone.
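Assuming the Custom Cron option accepts standard five-field cron syntax (minute, hour, day-of-month, month, day-of-week), a few expressions covering the frequencies above might look like this:

```python
# Example cron expressions for the schedule options above, assuming
# standard 5-field syntax: minute hour day-of-month month day-of-week.
EXAMPLES = {
    "0 */4 * * *": "every 4 hours, on the hour",
    "30 6 * * *": "daily at 06:30",
    "0 9 * * 1,4": "Mondays and Thursdays at 09:00",
}

def looks_like_cron(expr: str) -> bool:
    """Cheap sanity check: a standard cron expression has five fields."""
    return len(expr.split()) == 5
```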

Incremental Import

For large tables, enable incremental import to only fetch new/updated rows:

  1. Select a change tracking column (e.g., updated_at, id)
  2. Set the initial value (e.g., 2024-01-01 or 0)
  3. Each run fetches only rows where the tracking column > last synced value

This dramatically reduces load times and database impact.
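The steps above can be sketched as a fetch loop that only reads rows past the last synced value, then advances the cursor. The table, column, and function names are illustrative, not the platform's internals:

```python
import sqlite3

# Sketch of the incremental-import loop described above: fetch only rows
# whose tracking column exceeds the last synced value, then advance the
# cursor. Table/column names are illustrative.
def incremental_fetch(db, last_synced_id):
    rows = db.execute(
        "SELECT id, payload FROM source_table WHERE id > ? ORDER BY id",
        (last_synced_id,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_synced_id
    return rows, new_last

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE source_table (id INTEGER PRIMARY KEY, payload TEXT)")
db.executemany("INSERT INTO source_table VALUES (?, ?)",
               [(1, "a"), (2, "b"), (3, "c")])

rows, cursor = incremental_fetch(db, last_synced_id=1)
# Only rows 2 and 3 are fetched; the next run starts from cursor = 3.
```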

Step 3: Review Columns

Screenshot showing the column review screen similar to file upload review

The column configuration is the same as for file upload:

  • Rename columns, change types, mark PII, set keys
  • Apply transformations
  • Choose write mode (Append / Replace / Merge)
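As a toy model of how the three write modes behave (assuming Merge means upsert on the configured key column):

```python
def write(dest, rows, mode, key="id"):
    """Toy model of the three write modes over a list of row dicts."""
    if mode == "replace":
        dest[:] = rows                       # drop existing data entirely
    elif mode == "append":
        dest.extend(rows)                    # keep everything; may duplicate
    elif mode == "merge":                    # upsert on the key column
        by_key = {r[key]: i for i, r in enumerate(dest)}
        for r in rows:
            if r[key] in by_key:
                dest[by_key[r[key]]].update(r)   # update matching row
            else:
                dest.append(r)                   # insert new row
    return dest

table = [{"id": 1, "qty": 2}]
write(table, [{"id": 1, "qty": 5}, {"id": 2, "qty": 1}], mode="merge")
# table is now [{"id": 1, "qty": 5}, {"id": 2, "qty": 1}]
```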

Managing Pipelines

Screenshot of the pipeline list showing status indicators and action buttons

Pipeline Actions

| Action | Description |
| --- | --- |
| Run Now | Trigger an immediate execution |
| Pause | Temporarily stop scheduled runs |
| Resume | Re-enable scheduled runs |
| Stop | Cancel a currently running execution |
| Upload New Version | Upload a file to replace the pipeline's data |
| Delete | Remove the pipeline permanently |

Execution History

Each pipeline shows its execution history with:

  • Status: Success ✅, Warning ⚠️, Error ❌
  • Start time and duration
  • Rows ingested
  • Error details (if any)
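One execution-history entry carries the fields listed above; a minimal record sketch (field names are assumptions, not the platform's schema) might look like:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative record for one execution-history entry; field names are
# assumptions, not the platform's actual schema.
@dataclass
class ExecutionRecord:
    status: str                  # "success" | "warning" | "error"
    started_at: datetime
    duration_seconds: float
    rows_ingested: int
    error_details: Optional[str] = None  # populated only on failure

run = ExecutionRecord("success", datetime(2024, 1, 1, 6, 30), 12.5, 1000)
```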

Tips


Pipeline Best Practices

  1. Use incremental import for large tables to minimize load times
  2. Set meaningful names for connections and pipelines
  3. Test your SQL query with the Schema Browser before scheduling
  4. Monitor the Quality tab for ingestion failures
  5. Use merge mode with key columns for upsert behavior