End-to-end LightGBM fraud detection pipeline built as an R package, orchestrated by targets with data stored in MinIO via Apache Arrow. Includes 6-layer Lakehouse architecture, class imbalance tournament, formally tuned hyperparameters (PR-AUC 0.198), and Quarto RevealJS slides. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
57 lines
1.8 KiB
YAML
url: https://docs.robwiederstein.org/baflakehouse

template:
  bootstrap: 5
  bootswatch: flatly # Clean, professional look

navbar:
  structure:
    left: [intro, reference, articles, presentation]
  components:
    presentation:
      text: "Slides"
      icon: fa-person-chalkboard
      href: slides/index.html

reference:
  - title: "Data Ingestion & Lakehouse Setup"
    desc: "Functions for moving data from CSV to partitioned Parquet in MinIO."
    contents:
      - baflakehouse-package
      - convert_to_parquet
      - connect_baf
      - clean_baf_base

  - title: "Feature Engineering & Preprocessing"
    desc: "The 'Recipes' layer of the pipeline."
    contents:
      - engineer_features
      - prepare_eda_recipe
      - build_baf_recipe # NEW: Untrained blueprint for production
      - generate_model_inputs

  - title: "The Tournament (Model Selection)"
    desc: "Cross-validation and imbalance strategy testing."
    contents:
      - run_imbalance_tournament
      - train_diag_model
      - create_efficiency_plot # Moved here: Belongs with the tournament

  - title: "Final Evaluation & Production Deployment"
    desc: "Results on unseen data (Months 6-7) and MinIO artifact serialization."
    contents:
      - evaluate_final_model
      - train_production_model # NEW: The final deployment function

  - title: "Reporting: Tables & Visualizations"
    desc: "Generating ggplot2 figures and gt tables for Quarto."
    contents:
      - starts_with("plot_")
      - starts_with("compute_")
      - starts_with("format_") # Neatly catches all your gt table formatters

  - title: "Pipeline Utilities"
    desc: "Internal helpers for the targets workflow and slide generation."
    contents:
      - starts_with("save_report_")
      - render_slides # Consolidated here