Racial & gender disparity investigation in U.S. mortgage lending · HMDA 2023
This project is a data mining investigation into racial and gender disparities in U.S. mortgage lending. Using 103,481 cleaned mortgage applications from the HMDA 2023 dataset, I built a PySpark pipeline that loads a 4 GB national file, filters to Chicago, and implements both standard equal-frequency binning and the epsilon-biased fair binning algorithm from research by Asudeh et al. The project finds that routine preprocessing steps, such as binning income, can silently amplify existing racial bias by up to 9.63 percentage points.
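
To make the binning idea concrete, here is a minimal plain-Python sketch of equal-frequency binning and of one plausible way to measure how far a bin's group composition drifts from the overall composition. It is not the project's PySpark code and not the algorithm from Asudeh et al.; the function names (`equal_frequency_bins`, `max_group_share_gap`), the gap metric, and the toy incomes and group labels are all assumptions made for illustration.

```python
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold (nearly) equal counts.

    Values are ranked by size and the ranks are cut into n_bins equal slices.
    Tied values may land in different bins under this simple rank-based rule.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def max_group_share_gap(bins, groups, group):
    """Largest absolute gap between a bin's share of `group` and its overall share.

    This gap is an illustrative metric (an assumption), not the source's definition.
    """
    overall = sum(g == group for g in groups) / len(groups)
    totals = Counter(bins)
    hits = Counter(b for b, g in zip(bins, groups) if g == group)
    return max(abs(hits[b] / totals[b] - overall) for b in totals)


# Hypothetical toy data: incomes in thousands and a two-valued group label.
incomes = [30, 35, 40, 45, 60, 70, 90, 120]
groups = ["A", "A", "A", "B", "A", "B", "B", "B"]

bins = equal_frequency_bins(incomes, 2)
gap = max_group_share_gap(bins, groups, "A")
print(bins, gap)
```

In this toy case each bin holds four applicants, yet group A makes up three quarters of the lower bin and one quarter of the upper bin against an overall share of one half. That kind of imbalance is what a fairness-aware binning method would try to bound.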