Nami Kim
Data Engineer 🧡 AI/ML

Retail Card Transaction Data Mart

A fully self-built batch data pipeline simulating real-world retail transaction analytics.

👉 Source Code (GitHub)

This project showcases my ability to design, build, and operate a full data mart system using modern data engineering tools.

💡 Project Overview

When working with retail or fintech transaction data, businesses need robust data pipelines that can clean, transform, validate, and deliver data for reporting and analytics.

In this project, I built a complete OLAP-style data mart pipeline using open-source tools, fully containerized for production-like deployment.

🔧 Tech Stack

  • Orchestration: Apache Airflow (Dockerized)
  • Data Transformation: dbt (Data Build Tool)
  • Data Storage: PostgreSQL (OLAP-style data mart)
  • Dashboard & BI: Metabase
  • Containerization: Docker Compose
  • Cloud Readiness: S3-ready ingestion logic for future extensibility
  • Data Quality: dbt tests (accepted range, uniqueness, referential integrity)

🏗️ Architecture Summary

  • Source: UCI Online Retail Dataset (transaction log format)
  • ETL Flow:
    • Raw ingestion → PostgreSQL
    • Staging models → dbt transformations (see the staging sketch after this list)
    • Fact and dimension models → star schema design
    • Monthly aggregations → fct_monthly_sales table
    • Data quality checks → dbt tests for production readiness
  • Orchestration with Airflow:
    • Modular DAGs: ingestion_dag, dbt_pipeline_dag, full_etl_dag
    • Easy to extend and schedule for recurring batch jobs
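
To make the flow concrete, here is a minimal sketch of what the staging step could look like as a dbt model. The source name and column names (raw.card_transactions, invoice_no, stock_code, unit_price, and so on) are illustrative assumptions based on the UCI Online Retail schema, not the project's exact definitions.

```sql
-- models/staging/stg_card_transactions.sql (illustrative sketch, not the actual model)
-- Cleans the raw rows loaded by the ingestion DAG: trims codes, casts types,
-- derives a line amount, and drops cancellations and rows without a customer.

with source as (

    select * from {{ source('raw', 'card_transactions') }}  -- assumed source name

),

cleaned as (

    select
        trim(invoice_no)                         as invoice_no,
        trim(stock_code)                         as stock_code,
        nullif(trim(description), '')            as product_description,
        cast(quantity as integer)                as quantity,
        cast(unit_price as numeric(12, 2))       as unit_price,
        cast(quantity as integer)
            * cast(unit_price as numeric(12, 2)) as amount,
        cast(invoice_date as timestamp)          as invoice_ts,
        cast(customer_id as integer)             as customer_id,
        trim(country)                            as country
    from source
    where customer_id is not null
      and invoice_no not like 'C%'  -- a 'C' prefix marks cancelled invoices in the UCI dataset

)

select * from cleaned
```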

📊 Data Mart Design

  • stg_card_transactions: staging layer with data cleansing
  • dim_customers, dim_products: dimension tables
  • fct_transactions: full transaction-level fact table
  • fct_monthly_sales: monthly aggregated fact table for BI
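
As an illustration of the aggregated layer, the monthly fact could be derived from the transaction-level fact roughly like this. Column names follow the staging sketch above and are assumptions, not the project's actual schema.

```sql
-- models/marts/fct_monthly_sales.sql (illustrative sketch)
-- Rolls the transaction-level fact up to one row per calendar month for BI.

with transactions as (

    select * from {{ ref('fct_transactions') }}

),

monthly as (

    select
        date_trunc('month', invoice_ts)::date               as sales_month,
        count(distinct invoice_no)                           as order_count,
        count(distinct customer_id)                          as customer_count,
        sum(quantity)                                        as total_quantity,
        sum(amount)                                          as total_revenue,
        sum(amount) / nullif(count(distinct invoice_no), 0)  as avg_order_value
    from transactions
    group by 1

)

select * from monthly
```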

✅ Data Quality Controls

Ensured production-grade integrity with dbt tests:

  • Not Null Checks
  • Accepted Range Tests for amount & quantity (example test sketched below)
  • Unique Keys on surrogate primary keys
  • Referential Integrity between fact & dimension models
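
The accepted-range check, for example, can also be written as a dbt singular test: a SQL file under tests/ that fails if it returns any rows. The thresholds and the transaction_id column below are made-up examples, not the project's real configuration.

```sql
-- tests/assert_amount_and_quantity_in_range.sql (illustrative singular test)
-- dbt marks this test as failed if the query returns any rows.

select
    transaction_id,   -- assumed surrogate key on the fact table
    quantity,
    amount
from {{ ref('fct_transactions') }}
where quantity <= 0
   or quantity > 10000    -- example upper bound only
   or amount < 0
   or amount > 100000     -- example upper bound only
```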

(Screenshot: dbt test results)

📊 Example Dashboards

Built fully automated dashboards with Metabase:

  • Revenue Trends (6-month & current month; example query below)
  • Average Order Value
  • Top Selling Products
  • Customer Spending Trends
  • Anomaly Detection (suspicious transactions)
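
Behind a card like the revenue trend, Metabase ultimately runs a plain SQL query against the mart. Here is a sketch of the 6-month trend, assuming the fct_monthly_sales columns from the earlier sketch:

```sql
-- Illustrative query for a "Revenue Trends (last 6 months)" Metabase card
select
    sales_month,
    total_revenue,
    order_count,
    avg_order_value
from fct_monthly_sales
-- keep the six most recent months present in the mart
where sales_month >= (select max(sales_month) from fct_monthly_sales) - interval '5 months'
order by sales_month;
```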

(Screenshot: Metabase dashboard)

⚙️ Pipeline Orchestration

  • Dockerized deployment using Docker Compose
  • Modular Airflow DAGs for ingestion and transformation
  • Fault-tolerant design for batch processing pipelines
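
One way the fault-tolerance goal can show up at the model layer is dbt's incremental materialization, which lets a failed or repeated Airflow run pick up only new rows instead of rebuilding or duplicating the table. A minimal sketch, assuming the column names from the staging example; the surrogate-key logic is illustrative only, not the project's actual implementation.

```sql
-- models/marts/fct_transactions.sql (illustrative incremental variant)
{{ config(
    materialized='incremental',
    unique_key='transaction_id'
) }}

select
    -- illustrative surrogate key; the real uniqueness rule may differ
    md5(invoice_no || '-' || stock_code || '-' || customer_id::text) as transaction_id,
    invoice_no,
    stock_code,
    customer_id,
    quantity,
    unit_price,
    amount,
    invoice_ts
from {{ ref('stg_card_transactions') }}

{% if is_incremental() %}
  -- on a re-run, only process rows newer than what is already in the table
  where invoice_ts > (select max(invoice_ts) from {{ this }})
{% endif %}
```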

(Screenshots: Airflow DAGs)

🔬 dbt Documentation & Lineage

  • dbt docs generated with full model documentation
  • Column-level metadata and lineage graphs

(Screenshots: dbt documentation pages and lineage graph)

🧰 Key Skills Demonstrated

  • Full-stack batch data pipeline architecture
  • Data mart design using dbt
  • Docker-based orchestration of Airflow, dbt, Metabase, PostgreSQL
  • Data quality monitoring using dbt tests
  • Automated BI dashboards (Metabase)
  • Production-grade engineering mindset: modular, scalable, fault-tolerant

🎯 Takeaway

This project simulates real-world batch processing pipelines you'd expect in production data platforms. It demonstrates:

  • My ability to own the full pipeline from ingestion to reporting
  • My understanding of data validation and observability
  • My hands-on experience with modern data stack tools: Airflow, dbt, Docker, Metabase