Functions: prepare_eda_recipe -> build_eda_recipe,
  create_efficiency_plot -> plot_efficiency,
  format_class_imbalance_tourney_gt -> format_tournament_gt
Targets: model_inputs_prefix -> baf_model_input_prefix,
  tbl_fraud_by_month_data -> fraud_by_month_summary,
  model_diag -> diag_fit, winning_params -> best_params,
  production_recipe_blueprint -> prod_recipe,
  final_eval_data -> test_predictions
pkgdown: restructured reference index into 6 logical sections,
  removed stale names and development comments.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
66 lines
1.8 KiB
YAML
url: https://docs.robwiederstein.org/baflakehouse

template:
  bootstrap: 5
  bootswatch: flatly

navbar:
  structure:
    left: [intro, reference, articles, presentation]
  components:
    presentation:
      text: "Slides"
      icon: fa-person-chalkboard
      href: slides/index.html

reference:
  - title: "Data Ingestion & Lakehouse Setup"
    desc: "Functions for moving raw CSV data into the MinIO Lakehouse as partitioned Parquet."
    contents:
      - baflakehouse-package
      - convert_to_parquet
      - connect_baf
      - clean_baf_base

  - title: "Feature Engineering & Preprocessing"
    desc: "Recipes and transformations applied across the pipeline layers."
    contents:
      - engineer_features
      - generate_model_inputs
      - build_eda_recipe
      - build_baf_recipe

  - title: "Exploratory Data Analysis"
    desc: "Diagnostic model and visualizations for understanding the fraud signal."
    contents:
      - train_diag_model
      - plot_var_imp
      - plot_hexbin_interaction
      - plot_missingness
      - plot_num_cor

  - title: "Model Selection & Tuning"
    desc: "Imbalance strategy tournament, hyperparameter tuning, and results formatting."
    contents:
      - run_imbalance_tournament
      - tune_lgbm
      - format_tournament_gt
      - plot_efficiency

  - title: "Final Evaluation & Production Deployment"
    desc: "Holdout evaluation on months 6-7 and MinIO model artifact serialization."
    contents:
      - evaluate_final_model
      - train_production_model

  - title: "Reporting"
    desc: "Figures, tables, and slide rendering for the Quarto presentation."
    contents:
      - plot_fraud_by_month
      - plot_conf_mat_heatmap
      - compute_fraud_by_month
      - format_fraud_by_month_gt
      - save_report_figure
      - save_report_table
      - render_slides