Rename package from baflakehouse to bankfraud

- DESCRIPTION: Package name and URL updated to /bank-fraud
- R/baflakehouse-package.R → R/bankfraud-package.R
- _pkgdown.yml: url and reference alias updated
- deploy.yaml: TARGET_DIR updated to /var/www/docs/bank-fraud/
- deploy/baflakehouse.caddy: deleted (stale, superseded by rsync workflow)
- tests and README updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:38:54 -05:00
parent fdd75f80da
commit 85bc257e7b
8 changed files with 20 additions and 30 deletions

deploy.yaml

@@ -57,7 +57,7 @@ jobs:
 SSH_PRIVATE_KEY: ${{ secrets.DEPLOY_SSH_KEY }}
 SERVER_IP: ${{ secrets.DEPLOY_SERVER_IP }}
 SERVER_USER: ${{ secrets.DEPLOY_SERVER_USER }}
-TARGET_DIR: /var/www/docs/baflakehouse/
+TARGET_DIR: /var/www/docs/bank-fraud/
 run: |
 # Setup SSH key
 mkdir -p ~/.ssh

DESCRIPTION

@@ -1,4 +1,4 @@
-Package: baflakehouse
+Package: bankfraud
 Title: Lakehouse Workflow for the Bank Account Fraud Dataset
 Version: 0.0.0.9000
 Authors@R:
@@ -52,5 +52,5 @@ Suggests:
 testthat (>= 3.0.0),
 withr
 Config/testthat/edition: 3
-URL: https://docs.robwiederstein.org/baflakehouse
+URL: https://docs.robwiederstein.org/bank-fraud
 BugReports: https://git.robwiederstein.org/rkw/bank-fraud-baf-lakehouse/issues

Dockerfile

@@ -1,4 +1,4 @@
-FROM rocker/verse:4.4
+FROM rocker/verse:4.5.2
 # System dependencies for arrow, lightgbm, and ggplot2 (ragg/textshaping)
 # Quarto is pre-installed in rocker/verse
@@ -25,13 +25,16 @@ WORKDIR /app
 COPY renv.lock .Rprofile ./
 COPY renv/activate.R renv/settings.json renv/
-RUN Rscript -e "renv::restore()"
+RUN Rscript -e "renv::restore(prompt = FALSE)"
 # Copy the full package source
 COPY . .
-# Install the local package into the renv library
-RUN Rscript -e "renv::install('.')"
+# Install the local package into the renv library, then re-run restore so
+# any package that renv skipped by finding it in the rocker system library
+# (e.g. styler) ends up in the project library where renv can actually see it.
+RUN Rscript -e "renv::install('.')" && \
+Rscript -e "renv::restore(prompt = FALSE)"
 # Non-secret default — override with --env at runtime if needed
 ENV BAF_BUCKET=lake
@@ -41,5 +44,5 @@ ENV BAF_BUCKET=lake
 # --env BAF_ENDPOINT=172.19.0.1:9100 \
 # --env BAF_KEY=... \
 # --env BAF_SECRET=... \
-# baflakehouse
+# bankfraud
 CMD ["Rscript", "deploy.R"]

R/baflakehouse-package.R → R/bankfraud-package.R

@@ -1,9 +1,9 @@
-#' baflakehouse: Lakehouse Workflow for the Bank Account Fraud Dataset
+#' bankfraud: Lakehouse Workflow for the Bank Account Fraud Dataset
 #'
 #' Tools to ingest the Bank Account Fraud (BAF) Base dataset into a MinIO/S3-backed
 #' lakehouse, clean encoded missing values, and produce reproducible reporting
 #' artifacts orchestrated with targets.
 #'
 #' @docType _PACKAGE
-#' @name baflakehouse-package
+#' @name bankfraud-package
 NULL

README

@@ -2,18 +2,18 @@
 output: github_document
 ---
-- [baflakehouse](#baflakehouse)
+- [bankfraud](#bankfraud)
 - [About](#about)
 - [Results](#results)
 - [Clone](#clone)
 - [Acknowledgements](#acknowledgements)
 - [Citation](#citation)
-# baflakehouse
+# bankfraud
 ## About
-The baflakehouse package is an end-to-end machine learning pipeline built to detect credit card fraud. Rather than relying on static local files, it implements a modern Lakehouse architecture. It ingests a massive 1-million-row dataset, partitions it into Parquet files via Apache Arrow, stores it on a MinIO object server, and trains a production-ready LightGBM model orchestrated entirely by the targets package.
+The bankfraud package is an end-to-end machine learning pipeline built to detect credit card fraud. Rather than relying on static local files, it implements a modern Lakehouse architecture. It ingests a massive 1-million-row dataset, partitions it into Parquet files via Apache Arrow, stores it on a MinIO object server, and trains a production-ready LightGBM model orchestrated entirely by the targets package.
 Significance
 Financial fraud datasets suffer from extreme class imbalance, making traditional accuracy metrics highly misleading. This pipeline is engineered specifically to handle that imbalance without aggressive synthetic oversampling.

_pkgdown.yml

@@ -1,4 +1,4 @@
-url: https://docs.robwiederstein.org/baflakehouse
+url: https://docs.robwiederstein.org/bank-fraud
 template:
 bootstrap: 5
@@ -17,7 +17,7 @@ reference:
 - title: "Data Ingestion & Lakehouse Setup"
 desc: "Functions for moving raw CSV data into the MinIO Lakehouse as partitioned Parquet."
 contents:
-- baflakehouse-package
+- bankfraud-package
 - convert_to_parquet
 - connect_baf
 - clean_baf_base

deploy/baflakehouse.caddy (deleted)

@@ -1,13 +0,0 @@
-# BAF Lakehouse pkgdown site
-# Served at: https://docs.robwiederstein.org/baflakehouse
-#
-# handle_path strips the /baflakehouse prefix before handing off to the
-# file server, so requests map correctly to the flat docs/ directory.
-#
-# NOTE: The path below must match the mount point inside the Caddy Docker
-# container (i.e., wherever /data/projects/ is mounted in docker-compose.yml).
-handle_path /baflakehouse* {
-root * /data/projects/bank-fraud-baf-lakehouse/docs
-file_server
-}

tests/testthat.R

@@ -1,4 +1,4 @@
 library(testthat)
-library(baflakehouse)
-test_check("baflakehouse")
+library(bankfraud)
+test_check("bankfraud")
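
A rename that touches this many files is easy to leave half-finished, so a quick search for the old name can help confirm nothing was missed. The following is only a minimal sketch, not part of the commit: the helper name `find_stale_refs` and the excluded directories (`.git`, `renv`) are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): list files under a directory
# that still mention the old package name.
find_stale_refs() {
    dir=$1
    old_name=$2
    # -r: recurse, -l: print only file names, -F: match a fixed string.
    # Excluding .git and renv is an assumption about what counts as noise.
    # "|| true" keeps the function from failing when nothing matches.
    grep -rlF --exclude-dir=.git --exclude-dir=renv -- "$old_name" "$dir" || true
}
```

Run from the repository root as `find_stale_refs . baflakehouse`; an empty result suggests the rename is complete. Because the match is a fixed string, the repository slug `bank-fraud-baf-lakehouse` in the BugReports URL is not reported, since it spells the name with a hyphen.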