bank-fraud-baf-lakehouse/_pkgdown.yml
Rob Wiederstein f47b2e1be2 Add tune_lgbm() and wire hyperparameter tuning into DAG
Converts scratch/tune_model.R into a pure tune_lgbm() function,
replacing hardcoded winning_params with a fully automated tar_target.
Best params (trees=844, depth=3, lr=0.0204, min_n=389) now flow
reproducibly into evaluate_final_model() and train_production_model().
PR-AUC improved from 0.165 to 0.198.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 03:25:35 -05:00
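
The wiring described in the commit message could look roughly like the sketch below. It is an illustration only, not the repository's actual `_targets.R`: the target names (`best_params`, `final_eval`, `production_model`), the input objects (`train_data`, `test_data`, `all_data`), and every argument list are assumptions. Only the function names `tune_lgbm()`, `evaluate_final_model()`, and `train_production_model()` come from the commit.

```r
library(targets)

# Hypothetical pipeline fragment: target names and arguments are assumed,
# not taken from the repository.
pipeline <- list(
  # Tuning is itself a target, so the best parameters are computed by the DAG
  # instead of being hardcoded as winning_params.
  tar_target(best_params, tune_lgbm(train_data)),
  # Downstream targets depend on best_params and are invalidated when it changes.
  tar_target(final_eval, evaluate_final_model(train_data, test_data, best_params)),
  tar_target(production_model, train_production_model(all_data, best_params))
)
```

Because `tar_target()` captures its command without evaluating it, the fragment only declares the dependency graph. Nothing is tuned or trained until the pipeline is run with `tar_make()`.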

url: https://docs.robwiederstein.org/baflakehouse

template:
  bootstrap: 5
  bootswatch: flatly # Clean, professional look

navbar:
  structure:
    left: [intro, reference, articles, presentation]
  components:
    presentation:
      text: "Slides"
      icon: fa-person-chalkboard
      href: slides/index.html

reference:
  - title: "Data Ingestion & Lakehouse Setup"
    desc: "Functions for moving data from CSV to partitioned Parquet in MinIO."
    contents:
      - baflakehouse-package
      - convert_to_parquet
      - connect_baf
      - clean_baf_base

  - title: "Feature Engineering & Preprocessing"
    desc: "The 'Recipes' layer of the pipeline."
    contents:
      - engineer_features
      - prepare_eda_recipe
      - build_baf_recipe # NEW: Untrained blueprint for production
      - generate_model_inputs

  - title: "The Tournament (Model Selection)"
    desc: "Cross-validation and imbalance strategy testing."
    contents:
      - run_imbalance_tournament
      - tune_lgbm
      - train_diag_model
      - create_efficiency_plot # Moved here: Belongs with the tournament

  - title: "Final Evaluation & Production Deployment"
    desc: "Results on unseen data (Months 6-7) and MinIO artifact serialization."
    contents:
      - evaluate_final_model
      - train_production_model # NEW: The final deployment function

  - title: "Reporting: Tables & Visualizations"
    desc: "Generating ggplot2 figures and gt tables for Quarto."
    contents:
      - starts_with("plot_")
      - starts_with("compute_")
      - starts_with("format_") # Neatly catches all your gt table formatters

  - title: "Pipeline Utilities"
    desc: "Internal helpers for the targets workflow and slide generation."
    contents:
      - starts_with("save_report_")
      - render_slides # Consolidated here
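
The `starts_with()` entries in the reference index select topics by name prefix. As a rough illustration of that matching, the base-R sketch below imitates it on a vector of topic names. It does not call pkgdown, `select_prefix()` is a helper invented for this example, and the topic names are hypothetical apart from `render_slides` and `tune_lgbm`. Only the prefixes are taken from the config.

```r
# Hypothetical topic names; only the prefixes come from the config above.
topics <- c("plot_pr_curve", "compute_lift", "format_metrics_table",
            "save_report_figures", "render_slides", "tune_lgbm")

# Base-R stand-in for a starts_with() selector:
# keep the topic names that begin with the given prefix.
select_prefix <- function(prefix, topics) {
  topics[startsWith(topics, prefix)]
}

# Topics that would land in the "Reporting" section under this assumption.
reporting <- unlist(lapply(c("plot_", "compute_", "format_"),
                           select_prefix, topics = topics))
print(reporting)
```

A prefix selector picks up any new `plot_*`, `compute_*`, or `format_*` function without a config change, whereas functions listed by name, such as `tune_lgbm`, must be added to the index by hand.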