bank-fraud-baf-lakehouse

Author	SHA1	Message	Date
Rob Wiederstein	1d0202e3aa	Install Node.js in container before checkout	2026-02-22 19:43:43 -05:00
Rob Wiederstein	e781eb3703	Downgrade checkout action to v3 for container compatibility	2026-02-22 16:57:21 -05:00
Rob Wiederstein	e6c20bd221	Add Gitea CI deployment workflow and update dependencies	2026-02-22 16:18:15 -05:00
Rob Wiederstein	df978d042f	Refactor bucket structure: baf-fraud/ prefix under lake bucket All functions now default to bucket_name = "lake" with "baf-fraud/" prepended to all layer prefixes, matching the contemporary lakehouse naming convention (one bucket per environment, project as prefix). Migration: copy baf-fraud/ data to lake/baf-fraud/ on analyticsvm, update BAF_BUCKET env var from "baf-fraud" to "lake". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 05:36:25 -05:00
Rob Wiederstein	dac01da6cb	Update renv.lock with spelling, styler, and test dependencies Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 05:15:11 -05:00
Rob Wiederstein	5218deab74	Add Phase 5 Caddy deployment config and sync script - deploy/baflakehouse.caddy: handle_path snippet routes /baflakehouse* to docs/ with prefix stripping so pkgdown flat structure maps correctly - bin/sync-caddy.sh: one-time script to install snippet and zero-downtime reload Caddy; deploy.R handles everything after that automatically Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 04:56:10 -05:00
Rob Wiederstein	7a1a8e0053	Add Phase 4: code quality, CI/CD, and formatting - testthat infrastructure with 15 tests covering env-var guards, return types for all format/save functions, and spelling - inst/WORDLIST with 52 domain terms (LightGBM, MinIO, Parquet, etc.) - Spelling test wired into devtools::test() via test-spelling.R - styler::style_file() added as step 0 in deploy.R (auto-fixes before ship) - .gitea/workflows/test.yaml: runs testthat suite on push - .gitea/workflows/lint.yaml: lychee link check + styler dry-run on push - Removed internal IP address from comment in train_production_model() - Language: en-US added to DESCRIPTION Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 04:41:37 -05:00
Rob Wiederstein	705b2a13d0	Re-track resources/ as static presentation assets resources/images/confusion-matrix.png is a static Wikipedia screenshot used in index.qmd slides -- not a generated artifact, so it belongs in version control. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 03:57:36 -05:00
Rob Wiederstein	e8d2c69f2d	Remove generated report artifacts from version control Add reports/figures/, reports/slides/, reports/tables/ to .gitignore and untrack previously committed PNGs. These are build artifacts regenerated by tar_make() and deploy.R. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 03:54:51 -05:00
Rob Wiederstein	b38892f49e	Refactor: consistent naming across functions, targets, and pkgdown Functions: prepare_eda_recipe -> build_eda_recipe, create_efficiency_plot -> plot_efficiency, format_class_imbalance_tourney_gt -> format_tournament_gt Targets: model_inputs_prefix -> baf_model_input_prefix, tbl_fraud_by_month_data -> fraud_by_month_summary, model_diag -> diag_fit, winning_params -> best_params, production_recipe_blueprint -> prod_recipe, final_eval_data -> test_predictions pkgdown: restructured reference index into 6 logical sections, removed stale names and development comments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 03:52:34 -05:00
Rob Wiederstein	f47b2e1be2	Add tune_lgbm() and wire hyperparameter tuning into DAG Converts scratch/tune_model.R into a pure tune_lgbm() function, replacing hardcoded winning_params with a fully automated tar_target. Best params (trees=844, depth=3, lr=0.0204, min_n=389) now flow reproducibly into evaluate_final_model() and train_production_model(). PR-AUC improved from 0.165 to 0.198. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 03:25:35 -05:00
Rob Wiederstein	33d0fc31c7	Initial commit: BAF Lakehouse fraud detection pipeline End-to-end LightGBM fraud detection pipeline built as an R package, orchestrated by targets with data stored in MinIO via Apache Arrow. Includes 6-layer Lakehouse architecture, class imbalance tournament, formally tuned hyperparameters (PR-AUC 0.198), and Quarto RevealJS slides. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 21:19:09 -05:00

12 Commits