
TransitLake

Dagster-orchestrated medallion lakehouse over Chicago transit

Role

Sole engineer

Timeline

2026

Year

2026

Category

Data Engineering · Geospatial


Survey notes

TransitLake is a Dagster-orchestrated medallion lakehouse over Chicago multi-modal transit data. It ingests daily GTFS static feeds, real-time vehicle positions (CTA bus via the BusTime JSON API, rail normalized to GTFS-Realtime protobuf), road congestion, and weather. It conforms these sources across modes, models them into dbt dims, facts, and marts on DuckDB, and enforces 100+ data-quality checks across bronze→silver→gold. A Streamlit dashboard and an in-browser Next.js + DuckDB-WASM app surface on-time performance, congestion hotspots, and delay↔weather/congestion analyses.
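As a rough illustration of the "conform across modes" step, the sketch below maps one bus vehicle record onto a mode-neutral shape. It is not the project's code: the `VehiclePosition` record and `conform_bus` function are hypothetical names, and the input keys (`vid`, `rt`, `lat`, `lon`, `tmstmp`) and the minute-resolution timestamp format are assumptions about the BusTime payload.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VehiclePosition:
    """Hypothetical mode-neutral silver record shared by bus and rail."""
    mode: str
    vehicle_id: str
    route_id: str
    lat: float
    lon: float
    observed_at: datetime  # naive, agency-local time (UTC conversion omitted)


def conform_bus(record: dict) -> VehiclePosition:
    """Map one bus vehicle dict (assumed BusTime-style keys) to the shared shape."""
    return VehiclePosition(
        mode="bus",
        vehicle_id=str(record["vid"]),
        route_id=str(record["rt"]),
        # Coordinates are assumed to arrive as strings, so cast explicitly.
        lat=float(record["lat"]),
        lon=float(record["lon"]),
        # Assumed timestamp format: "YYYYMMDD HH:MM".
        observed_at=datetime.strptime(record["tmstmp"], "%Y%m%d %H:%M"),
    )


sample = {"vid": "1234", "rt": "22", "lat": "41.8781",
          "lon": "-87.6298", "tmstmp": "20260115 08:30"}
position = conform_bus(sample)
```

A rail record decoded from GTFS-Realtime would get its own small adapter producing the same `VehiclePosition`, so downstream models never branch on source format.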

Notable terrain

  • 01 Medallion lakehouse: bronze (raw, dated partitions) → silver (conformed) → gold (dbt marts), all in one Dagster asset graph
  • 02 Multi-source ingestion: GTFS static, real-time bus (BusTime JSON) and rail (GTFS-Realtime protobuf), Socrata congestion, and Open-Meteo weather
  • 03 36 dbt models (7 staging · 5 intermediate · 6 dim · 4 fact · 14 mart) on dbt-duckdb
  • 04 100+ data-quality checks: Great Expectations suites, 90+ dbt tests, and 12 Dagster asset checks (some blocking)
  • 05 Gold marts answer transit questions: on-time performance, congestion hotspots, delay↔weather/congestion
  • 06 In-browser SQL over the gold marts via Next.js + DuckDB-WASM, deployed free on Vercel
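The idea of blocking versus non-blocking checks in front of a gold mart can be sketched in plain Python. This is a simplified stand-in, not the project's Dagster, dbt, or Great Expectations code: every name here (`CheckResult`, `build_on_time_mart`, the check functions) is hypothetical, and the on-time window of 60 seconds early to 300 seconds late is an assumed threshold chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    blocking: bool  # a failed blocking check stops the mart from building


def check_route_present(rows: list[dict]) -> CheckResult:
    """Blocking: every silver row must carry a route_id."""
    ok = all(row.get("route_id") for row in rows)
    return CheckResult("route_present", ok, blocking=True)


def check_delay_plausible(rows: list[dict]) -> CheckResult:
    """Non-blocking: flag delays beyond one hour in either direction."""
    ok = all(abs(row["delay_s"]) <= 3600 for row in rows)
    return CheckResult("delay_plausible", ok, blocking=False)


def on_time_share(rows: list[dict], early_s: int = -60, late_s: int = 300) -> dict[str, float]:
    """Share of observations per route inside the assumed on-time window."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for row in rows:
        route = row["route_id"]
        totals[route] = totals.get(route, 0) + 1
        if early_s <= row["delay_s"] <= late_s:
            hits[route] = hits.get(route, 0) + 1
    return {route: hits.get(route, 0) / n for route, n in totals.items()}


def build_on_time_mart(rows: list[dict]) -> dict[str, float]:
    """Run the checks first; build the mart only if no blocking check failed."""
    results = [check_route_present(rows), check_delay_plausible(rows)]
    failed = [r.name for r in results if r.blocking and not r.passed]
    if failed:
        raise ValueError(f"blocking checks failed: {failed}")
    return on_time_share(rows)


silver = [
    {"route_id": "22", "delay_s": 0},
    {"route_id": "22", "delay_s": 400},
    {"route_id": "Red", "delay_s": 120},
]
mart = build_on_time_mart(silver)
```

With this sample, route 22 has one of two observations inside the window and the Red line has one of one. A failed non-blocking check would only be reported, while a missing `route_id` raises before any mart rows are produced.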
