Acknowledgements Adasyn ADASYN anonymized baf BAF colorspace conf CTGAN datasheet DuckDB EDA env FN FP FPR frac ggplot Gu Guo Hexbin Kaggle lakehouse Lakehouse lgbm LightGBM LightGBM's MinIO NeurIPS optimise Optimises pos pre qmd rds relabelled Renviron revealjs RevealJS Scalability serialised Shang Sig tabset tbl tibble Tibble tidymodels Tomek TP Undersampling XGBoost