Rename package from baflakehouse to bankfraud
All checks were successful
- DESCRIPTION: Package name and URL updated to /bank-fraud
- R/baflakehouse-package.R → R/bankfraud-package.R
- _pkgdown.yml: url and reference alias updated
- deploy.yaml: TARGET_DIR updated to /var/www/docs/bank-fraud/
- deploy/baflakehouse.caddy: deleted (stale, superseded by rsync workflow)
- tests and README updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -57,7 +57,7 @@ jobs:
 SSH_PRIVATE_KEY: ${{ secrets.DEPLOY_SSH_KEY }}
 SERVER_IP: ${{ secrets.DEPLOY_SERVER_IP }}
 SERVER_USER: ${{ secrets.DEPLOY_SERVER_USER }}
-TARGET_DIR: /var/www/docs/baflakehouse/
+TARGET_DIR: /var/www/docs/bank-fraud/
 run: |
 # Setup SSH key
 mkdir -p ~/.ssh
@@ -1,4 +1,4 @@
-Package: baflakehouse
+Package: bankfraud
 Title: Lakehouse Workflow for the Bank Account Fraud Dataset
 Version: 0.0.0.9000
 Authors@R:
@@ -52,5 +52,5 @@ Suggests:
 testthat (>= 3.0.0),
 withr
 Config/testthat/edition: 3
-URL: https://docs.robwiederstein.org/baflakehouse
+URL: https://docs.robwiederstein.org/bank-fraud
 BugReports: https://git.robwiederstein.org/rkw/bank-fraud-baf-lakehouse/issues
Dockerfile (13 lines changed)
@@ -1,4 +1,4 @@
-FROM rocker/verse:4.4
+FROM rocker/verse:4.5.2
 
 # System dependencies for arrow, lightgbm, and ggplot2 (ragg/textshaping)
 # Quarto is pre-installed in rocker/verse
@@ -25,13 +25,16 @@ WORKDIR /app
 COPY renv.lock .Rprofile ./
 COPY renv/activate.R renv/settings.json renv/
 
-RUN Rscript -e "renv::restore()"
+RUN Rscript -e "renv::restore(prompt = FALSE)"
 
 # Copy the full package source
 COPY . .
 
-# Install the local package into the renv library
-RUN Rscript -e "renv::install('.')"
+# Install the local package into the renv library, then re-run restore so
+# any package that renv skipped by finding it in the rocker system library
+# (e.g. styler) ends up in the project library where renv can actually see it.
+RUN Rscript -e "renv::install('.')" && \
+Rscript -e "renv::restore(prompt = FALSE)"
 
 # Non-secret default — override with --env at runtime if needed
 ENV BAF_BUCKET=lake
@@ -41,5 +44,5 @@ ENV BAF_BUCKET=lake
 # --env BAF_ENDPOINT=172.19.0.1:9100 \
 # --env BAF_KEY=... \
 # --env BAF_SECRET=... \
-# baflakehouse
+# bankfraud
 CMD ["Rscript", "deploy.R"]
@@ -1,9 +1,9 @@
-#' baflakehouse: Lakehouse Workflow for the Bank Account Fraud Dataset
+#' bankfraud: Lakehouse Workflow for the Bank Account Fraud Dataset
 #'
 #' Tools to ingest the Bank Account Fraud (BAF) Base dataset into a MinIO/S3-backed
 #' lakehouse, clean encoded missing values, and produce reproducible reporting
 #' artifacts orchestrated with targets.
 #'
 #' @docType _PACKAGE
-#' @name baflakehouse-package
+#' @name bankfraud-package
 NULL
@@ -2,18 +2,18 @@
 output: github_document
 ---
 
-- [baflakehouse](#baflakehouse)
+- [bankfraud](#bankfraud)
 - [About](#about)
 - [Results](#results)
 - [Clone](#clone)
 - [Acknowledgements](#acknowledgements)
 - [Citation](#citation)
 
-# baflakehouse
+# bankfraud
 
 ## About
 
-The baflakehouse package is an end-to-end machine learning pipeline built to detect credit card fraud. Rather than relying on static local files, it implements a modern Lakehouse architecture. It ingests a massive 1-million-row dataset, partitions it into Parquet files via Apache Arrow, stores it on a MinIO object server, and trains a production-ready LightGBM model orchestrated entirely by the targets package.
+The bankfraud package is an end-to-end machine learning pipeline built to detect credit card fraud. Rather than relying on static local files, it implements a modern Lakehouse architecture. It ingests a massive 1-million-row dataset, partitions it into Parquet files via Apache Arrow, stores it on a MinIO object server, and trains a production-ready LightGBM model orchestrated entirely by the targets package.
 Significance
 
 Financial fraud datasets suffer from extreme class imbalance, making traditional accuracy metrics highly misleading. This pipeline is engineered specifically to handle that imbalance without aggressive synthetic oversampling.
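Review note on the README text above: the claim that accuracy is misleading under class imbalance can be checked with simple arithmetic. The sketch below is illustrative only. The 1,000,000 row count comes from the README, but the 1% fraud rate and the `baseline_metrics` helper are assumptions made for this example, not figures or code from the package.

```shell
#!/bin/sh
# Accuracy and recall (whole percent) of a baseline that labels every
# row "not fraud". Arguments: total rows, fraud rows.
# baseline_metrics is a hypothetical helper, not part of the package.
baseline_metrics() {
    total="$1"
    fraud="$2"
    true_negatives=$((total - fraud))   # every legitimate row counts as correct
    true_positives=0                    # no fraud row is ever flagged
    accuracy=$((100 * true_negatives / total))
    recall=$((100 * true_positives / fraud))
    echo "accuracy=${accuracy}% recall=${recall}%"
}

# 1,000,000 rows as stated in the README; 10,000 fraud rows (1%) is assumed.
baseline_metrics 1000000 10000
```

Under these assumed numbers the do-nothing baseline reports 99% accuracy while catching no fraud at all, which is why a metric such as recall is more informative for this kind of data.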
@@ -1,4 +1,4 @@
-url: https://docs.robwiederstein.org/baflakehouse
+url: https://docs.robwiederstein.org/bank-fraud
 
 template:
 bootstrap: 5
@@ -17,7 +17,7 @@ reference:
 - title: "Data Ingestion & Lakehouse Setup"
 desc: "Functions for moving raw CSV data into the MinIO Lakehouse as partitioned Parquet."
 contents:
-- baflakehouse-package
+- bankfraud-package
 - convert_to_parquet
 - connect_baf
 - clean_baf_base
@@ -1,13 +0,0 @@
-# BAF Lakehouse pkgdown site
-# Served at: https://docs.robwiederstein.org/baflakehouse
-#
-# handle_path strips the /baflakehouse prefix before handing off to the
-# file server, so requests map correctly to the flat docs/ directory.
-#
-# NOTE: The path below must match the mount point inside the Caddy Docker
-# container (i.e., wherever /data/projects/ is mounted in docker-compose.yml).
-
-handle_path /baflakehouse* {
-root * /data/projects/bank-fraud-baf-lakehouse/docs
-file_server
-}
@@ -1,4 +1,4 @@
 library(testthat)
-library(baflakehouse)
+library(bankfraud)
 
-test_check("baflakehouse")
+test_check("bankfraud")
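Review note: after a rename like this one, it can help to confirm that no file still mentions the old package name. The sketch below is one way to do that. The `find_stale_refs` function is a hypothetical helper written for this note, and it assumes a grep that supports `-r` and `--exclude-dir` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Print the files under a directory that still contain the old package name.
# find_stale_refs is a hypothetical helper; the default name is taken from
# this commit's title.
find_stale_refs() {
    dir="$1"
    old_name="${2:-baflakehouse}"
    # -r: recurse, -l: file names only, -F: fixed string (no regex).
    # "|| true" keeps the exit status zero when nothing matches.
    grep -rlF --exclude-dir=.git -- "$old_name" "$dir" || true
}
```

Running `find_stale_refs .` from the repository root would print nothing if the rename is complete. The BugReports URL does not match, since the repository slug is `bank-fraud-baf-lakehouse` with a hyphen.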