Rename package from baflakehouse to bankfraud

- DESCRIPTION: Package name and URL updated to /bank-fraud
- R/baflakehouse-package.R → R/bankfraud-package.R
- _pkgdown.yml: url and reference alias updated
- deploy.yaml: TARGET_DIR updated to /var/www/docs/bank-fraud/
- deploy/baflakehouse.caddy: deleted (stale, superseded by rsync workflow)
- tests and README updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:38:54 -05:00
parent fdd75f80da
commit 85bc257e7b
8 changed files with 20 additions and 30 deletions

@@ -2,18 +2,18 @@
 output: github_document
 ---
-- [baflakehouse](#baflakehouse)
+- [bankfraud](#bankfraud)
 - [About](#about)
 - [Results](#results)
 - [Clone](#clone)
 - [Acknowledgements](#acknowledgements)
 - [Citation](#citation)
-# baflakehouse
+# bankfraud
 ## About
-The baflakehouse package is an end-to-end machine learning pipeline built to detect credit card fraud. Rather than relying on static local files, it implements a modern Lakehouse architecture. It ingests a massive 1-million-row dataset, partitions it into Parquet files via Apache Arrow, stores it on a MinIO object server, and trains a production-ready LightGBM model orchestrated entirely by the targets package.
+The bankfraud package is an end-to-end machine learning pipeline built to detect credit card fraud. Rather than relying on static local files, it implements a modern Lakehouse architecture. It ingests a massive 1-million-row dataset, partitions it into Parquet files via Apache Arrow, stores it on a MinIO object server, and trains a production-ready LightGBM model orchestrated entirely by the targets package.
 Significance
 Financial fraud datasets suffer from extreme class imbalance, making traditional accuracy metrics highly misleading. This pipeline is engineered specifically to handle that imbalance without aggressive synthetic oversampling.