FairLend Miners
Racial & gender disparity investigation in U.S. mortgage lending · HMDA 2023
Role
Lead data engineer
Timeline
Jan 2026 — May 2026
Year
2026
Category
Research · Data Mining
Survey notes
A data mining investigation into racial and gender disparities in U.S. mortgage lending. I built a PySpark pipeline that loads the 4 GB national HMDA 2023 file, filters it to Chicago (103,481 applications after cleaning), and implements both standard equal-frequency binning and the epsilon-biased fair binning algorithm of Asudeh et al. The project shows that routine preprocessing steps, such as binning income, can silently amplify existing racial bias by up to 9.63 percentage points.
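To make the binning comparison concrete, here is a minimal pure-Python sketch of the baseline side only: rank-based equal-frequency binning plus one plausible way to measure demographic deviation. It is not the project's PySpark code and it does not reproduce the epsilon-biased algorithm. The function names and the deviation definition (largest absolute gap between a group's share inside a bin and its share overall) are assumptions made for illustration, and the sample values are toy data, not HMDA records.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts.

    Rank-based: ties may be split across neighbouring bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def max_demographic_deviation(bins, groups):
    """Largest gap between a group's share in any bin and its overall share.

    Assumed definition for illustration; the project may measure it differently.
    """
    overall = Counter(groups)
    n = len(groups)
    per_bin = {}
    for b, g in zip(bins, groups):
        per_bin.setdefault(b, Counter())[g] += 1
    worst = 0.0
    for counts in per_bin.values():
        size = sum(counts.values())
        for g, total in overall.items():
            worst = max(worst, abs(counts[g] / size - total / n))
    return worst


# Toy data: eight applicants, two hypothetical groups "A" and "B".
incomes = [10, 20, 30, 40, 50, 60, 70, 80]
groups = ["A", "A", "A", "B", "A", "B", "B", "B"]
bins = equal_frequency_bins(incomes, n_bins=2)
print(max_demographic_deviation(bins, groups))  # 0.25
```

In this toy case each group makes up half of the applicants but 75% or 25% of each bin, so equal-sized bins alone do not guarantee that bins mirror the overall population.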
Notable terrain
- 01 PySpark pipeline processing 4 GB HMDA 2023 national dataset, filtered to 103,481 Chicago applications
- 02 Implemented epsilon-biased fair binning (Asudeh et al.) alongside standard equal-frequency binning
- 03 Measured maximum demographic deviation across all bins: standard 9.63% vs fair binning 8.00%
- 04 FP-Growth association rule mining: high DTI ratio → denial at 67.2% confidence, 2.81 lift
- 05 K-Means clustering audit flagging 10 cases of disparate impact under the four-fifths legal rule
- 06 Exported 3 Parquet files (300K+ records) enabling downstream team analyses
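Two of the metrics in the list above follow standard definitions, sketched below in plain Python. Confidence and lift use the usual association-rule formulas, and the four-fifths check flags any group whose approval rate falls below 80% of the highest group's rate. The function names, item labels, and sample numbers are hypothetical; the project's own FP-Growth and K-Means code is not shown here.

```python
def confidence_and_lift(transactions, antecedent, consequent):
    """Confidence and lift of the rule antecedent -> consequent.

    transactions: list of sets of items; antecedent, consequent: sets of items.
    """
    n = len(transactions)
    has_a = sum(1 for t in transactions if antecedent <= t)
    has_b = sum(1 for t in transactions if consequent <= t)
    has_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = has_both / has_a      # P(consequent | antecedent)
    lift = confidence / (has_b / n)    # confidence relative to the base rate
    return confidence, lift


def four_fifths_flags(approval_rates, threshold=0.8):
    """Return groups whose approval rate is below threshold x the best rate."""
    best = max(approval_rates.values())
    return {g: r / best for g, r in approval_rates.items() if r / best < threshold}


# Toy transactions with made-up item labels, not HMDA data.
toy = [
    {"high_dti", "denied"},
    {"high_dti", "denied"},
    {"high_dti", "approved"},
    {"low_dti", "approved"},
    {"low_dti", "approved"},
    {"low_dti", "denied"},
]
print(confidence_and_lift(toy, {"high_dti"}, {"denied"}))

# Hypothetical approval rates for three groups.
print(four_fifths_flags({"A": 0.80, "B": 0.60, "C": 0.70}))
```

If the reported lift was computed with this formula, a confidence of 67.2% at a lift of 2.81 implies an overall denial rate of roughly 0.672 / 2.81 ≈ 23.9% among the mined applications.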