Research · Data Mining

FairLend Miners

Racial & gender disparity investigation in U.S. mortgage lending · HMDA 2023

about

A data mining investigation into racial and gender disparities in U.S. mortgage lending. I built a PySpark pipeline that loads the 4 GB national HMDA 2023 file, filters it to Chicago, and yields 103,481 cleaned mortgage applications. On that data I implemented both standard equal-frequency binning and the epsilon-biased fair binning algorithm of Asudeh et al. The project finds that routine preprocessing steps, such as binning income, can silently amplify existing racial bias by up to 9.63 percentage points.
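To make the binning step concrete, here is a minimal sketch in plain Python rather than PySpark. It assigns equal-frequency bins by rank and then measures how far a group's share inside any bin drifts from its overall share. The function names, the toy data, and this particular definition of "deviation" are assumptions for illustration, not the project's actual code, and the fair-binning variant of Asudeh et al. is not reproduced here.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # Rank-based split; tied values may land in different bins.
        bins[i] = rank * n_bins // len(values)
    return bins


def max_group_deviation(bins, groups):
    """Largest gap between a group's share in any bin and its overall share."""
    overall = Counter(groups)
    total = len(groups)
    per_bin = {}
    for b, g in zip(bins, groups):
        per_bin.setdefault(b, Counter())[g] += 1
    worst = 0.0
    for counts in per_bin.values():
        size = sum(counts.values())
        for g, n in overall.items():
            worst = max(worst, abs(counts[g] / size - n / total))
    return worst


# Toy data (hypothetical, not HMDA): incomes in $1000s and a group label.
incomes = [20, 25, 30, 35, 60, 65, 70, 75]
groups = ["A", "A", "A", "B", "A", "B", "B", "B"]

bins = equal_frequency_bins(incomes, n_bins=2)
deviation = max_group_deviation(bins, groups)
```

In this toy case each group makes up half of the data overall but three quarters of one bin, so the deviation is 0.25. A fair binning method would move the bin boundaries to shrink that gap.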

what it does
  • 01 PySpark pipeline that processes the 4 GB HMDA 2023 national dataset and filters it to 103,481 Chicago applications
  • 02 Epsilon-biased fair binning (Asudeh et al.) implemented alongside standard equal-frequency binning
  • 03 Maximum demographic deviation measured across all bins: 9.63% with standard binning vs 8.00% with fair binning
  • 04 FP-Growth association rule mining: high debt-to-income (DTI) ratio → denial at 67.2% confidence and 2.81 lift
  • 05 K-Means clustering audit that flags 10 cases of disparate impact under the four-fifths legal rule
  • 06 Three Parquet files (300K+ records) exported for downstream team analyses
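Two of the measures in this list reduce to simple arithmetic, sketched below in plain Python. The first function computes confidence and lift for a rule from raw counts, using the standard definitions. The second applies the four-fifths rule by comparing each group's approval rate with the highest rate. All names and numbers are hypothetical, and choosing the highest-rate group as the reference is an assumption about how the audit was set up.

```python
def confidence_and_lift(n_total, n_antecedent, n_consequent, n_both):
    """Confidence and lift of the rule antecedent -> consequent, from counts."""
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return confidence, lift


def four_fifths_flags(approval_rates, threshold=0.8):
    """Return groups whose rate is below threshold x the highest group's rate."""
    reference = max(approval_rates.values())
    return {
        group: rate / reference
        for group, rate in approval_rates.items()
        if rate / reference < threshold
    }


# Hypothetical counts and rates, not HMDA results.
conf, lift = confidence_and_lift(
    n_total=1000, n_antecedent=200, n_consequent=250, n_both=120
)
flags = four_fifths_flags({"group_a": 0.80, "group_b": 0.60, "group_c": 0.72})
```

Here `flags` contains only `group_b`, whose ratio of 0.75 falls below the 0.8 threshold. Under the same definition of lift, the reported 67.2% confidence and 2.81 lift imply an overall denial rate of roughly 0.672 / 2.81 ≈ 23.9%.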