Data Engineering · Delivered

Automated Distributor ETL for Unilever

Hands-off SFTP-to-analytics for daily sales & stock data

A fully automated ELT pipeline that detects distributor files landing on SFTP, validates and homologates them, and lands analytics-ready data in S3 for real-time querying via StarRocks.

Dagster · dbt · PostgreSQL · AWS S3 · StarRocks · SFTP · Python
Problem Statement

Distributors serving corporate clients such as Unilever upload sales and stock data to SFTP every day. The data had to be detected, validated, homologated, and made query-ready in real time, with zero manual intervention and with corrections reflected immediately.

  • Files arrive continuously on SFTP and must be auto-detected and processed.
  • Product codes and units of measure are inconsistent across distributors.
  • Customers need a complete, up-to-date view of sales and stock in real time.
Headline Outcomes
  • Manual intervention: zero
  • Data freshness: real-time
  • Cross-distributor consistency: homologated

The Solution

A Dagster-orchestrated ELT flow: extract on file arrival, validate format and fields, load RAW to PostgreSQL, transform and homologate against history in dbt, stage, then publish to S3 for StarRocks analytics.

Dagster detects SFTP arrivals and triggers extraction automatically.
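
The detection step can be sketched as a cursor comparison: each poll lists the SFTP drop directory and keeps only the files newer than the last one handled. This is a minimal sketch, not the project's code. `RemoteFile`, `find_new_files`, the paths, and the tuple-shaped cursor are all hypothetical, and the listing is passed in as plain data so the logic runs without an SFTP server. In a Dagster deployment, a sensor would typically run logic like this on each tick and request one run per new file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    """One entry from an SFTP directory listing (hypothetical shape)."""
    path: str
    mtime: int  # modification time, seconds since epoch


def find_new_files(listing, cursor):
    """Return files newer than the cursor (oldest first) and the updated cursor.

    The cursor is the (mtime, path) of the last file handled, or None on the
    first poll.
    """
    ordered = sorted(listing, key=lambda f: (f.mtime, f.path))
    if cursor is not None:
        ordered = [f for f in ordered if (f.mtime, f.path) > cursor]
    new_cursor = (ordered[-1].mtime, ordered[-1].path) if ordered else cursor
    return ordered, new_cursor


# Assumed example listing: two files, returned out of order by the server.
listing = [
    RemoteFile("/inbound/dist_a/sales_0102.csv", 200),
    RemoteFile("/inbound/dist_a/sales_0101.csv", 100),
]
first, cursor = find_new_files(listing, None)           # both files are new
second, cursor_after = find_new_files(listing, cursor)  # nothing new this time
```

Because the cursor includes the modification time, a corrected file re-uploaded under the same name would be picked up again on the next poll, which is one plausible way to get corrections reprocessed without manual steps.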

Two-tier validation: initial format/field checks, then consistency checks against historical data.
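
The two tiers could look roughly like the sketch below: a first pass that checks the header and field types, and a second pass that compares the surviving rows against historical data. The column names, the `historical_max` lookup, and the `tolerance` threshold are assumptions made for illustration; the source does not specify which consistency rules the pipeline applies.

```python
import csv
import io

# Hypothetical required columns for a distributor file.
REQUIRED_COLUMNS = ("distributor_id", "product_code", "unit", "quantity")


def check_format(csv_text):
    """Tier 1: header and field checks. Returns (valid_rows, errors)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]
    rows, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            row["quantity"] = float(row["quantity"])
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: quantity is not numeric")
            continue
        rows.append(row)
    return rows, errors


def check_history(rows, historical_max, tolerance=3.0):
    """Tier 2: flag rows that look implausible next to historical data."""
    errors = []
    for row in rows:
        key = (row["distributor_id"], row["product_code"])
        if key not in historical_max:
            errors.append(f"no history for product code {row['product_code']}")
        elif row["quantity"] > tolerance * historical_max[key]:
            errors.append(
                f"quantity {row['quantity']} for {row['product_code']} "
                f"exceeds {tolerance}x the historical maximum"
            )
    return errors


sample = (
    "distributor_id,product_code,unit,quantity\n"
    "D1,SKU-9,box,12\n"
    "D1,SKU-9,box,abc\n"
    "D1,SKU-9,box,5000\n"
)
rows, format_errors = check_format(sample)
history_errors = check_history(rows, {("D1", "SKU-9"): 40.0})
```

Splitting the checks this way keeps the cheap structural test in front of the lookup against history, so a malformed file is rejected before any historical data is read.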

dbt transforms and homologates product codes and units of measure for cross-distributor comparability.
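
In the pipeline this logic lives in dbt models, so it is expressed in SQL there. The Python sketch below only illustrates the idea of homologation: each distributor's own product code is mapped to one canonical code, and quantities are converted to a common base unit. `CODE_MAP`, `UNITS_PER_PACK`, and the example codes are hypothetical reference data, and a single conversion factor per unit is a simplification (real pack sizes usually depend on the product).

```python
# Hypothetical reference data; in practice this would be maintained as tables.
CODE_MAP = {
    ("D1", "A-100"): "SKU-001",
    ("D2", "0100"): "SKU-001",
}
UNITS_PER_PACK = {"unit": 1, "box": 12, "case": 48}


def homologate(row):
    """Map a distributor row to the canonical product code and base units."""
    canonical = CODE_MAP.get((row["distributor_id"], row["product_code"]))
    factor = UNITS_PER_PACK.get(row["unit"].strip().lower())
    if canonical is None or factor is None:
        return None  # unmapped rows are held back rather than guessed
    return {"sku": canonical, "quantity_units": row["quantity"] * factor}


# Two distributors reporting the same product in different codes and units.
a = homologate({"distributor_id": "D1", "product_code": "A-100", "unit": "Box", "quantity": 2})
b = homologate({"distributor_id": "D2", "product_code": "0100", "unit": "unit", "quantity": 24})
```

After homologation both rows describe 24 base units of the same canonical product, which is what makes cross-distributor comparison possible.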

Approved data lands in S3 and is queryable in real time through StarRocks.
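
The source does not describe the S3 layout, so the following is purely an assumed convention. One common way to keep corrections visible is a deterministic, partition-style object key: a corrected upload for the same distributor and business date overwrites the earlier object instead of adding a second copy. The `object_key` helper, the key layout, and the Parquet file name are hypothetical.

```python
from datetime import date


def object_key(dataset, distributor_id, business_date):
    """Build a deterministic, partition-style S3 key (hypothetical layout)."""
    return (
        f"{dataset}/distributor_id={distributor_id}/"
        f"date={business_date.isoformat()}/data.parquet"
    )


key = object_key("sales", "D1", date(2026, 1, 15))
# key == "sales/distributor_id=D1/date=2026-01-15/data.parquet"
```

With keys like this, publishing a correction is the same operation as publishing the original file, so no separate cleanup step is needed before the query engine reads the updated object.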

System Architecture

How the data flows

01  SFTP Arrival: CSV from distributors

02  Validate: Format + field checks

03  RAW Load: PostgreSQL

04  Transform: dbt homologation

05  S3 + StarRocks: Real-time analytics

Result 01

Eliminated manual handling of daily distributor uploads end-to-end.

Result 02

Unified product codes and units so all distributor data is comparable.

Result 03

Gave clients an always-current view of sales and stock operations.
