Initial commit: BAF Lakehouse fraud detection pipeline

End-to-end LightGBM fraud detection pipeline built as an R package,
orchestrated by targets, with data stored in MinIO via Apache Arrow.
Includes a 6-layer Lakehouse architecture, a class-imbalance tournament,
formally tuned hyperparameters (PR-AUC 0.198), and Quarto RevealJS slides.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:19:09 -05:00
commit 33d0fc31c7
56 changed files with 15596 additions and 0 deletions

DESCRIPTION

@@ -0,0 +1,34 @@
Package: baflakehouse
Title: Lakehouse Workflow for the Bank Account Fraud Dataset
Version: 0.0.0.9000
Authors@R:
    person("Rob", "Wiederstein", role = c("aut", "cre"),
           email = "REPLACE_ME@example.com")
Description: Tools to ingest the Bank Account Fraud (BAF) Base dataset into a
    MinIO/S3-backed lakehouse, clean encoded missing values, and produce
    reproducible reporting artifacts (tables, figures, slides) orchestrated with
    targets.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3
Imports:
    arrow,
    colorspace,
    cowplot,
    dplyr,
    tidyr,
    stringr,
    readr,
    gt,
    quarto,
    ggplot2,
    bonsai
Suggests:
    duckdb,
    targets,
    tarchetypes,
    knitr,
    scales
URL: https://docs.robwiederstein.org/baflakehouse
BugReports: https://git.robwiederstein.org/rkw/bank-fraud-baf-lakehouse/issues