Initial commit: illustrative R data pipeline

2026-03-09 14:20:10 -04:00
commit 83e50d2c36
12 changed files with 277 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,70 @@
+# powershell_example
+
+This example uses an R pipeline, driven by a bash script, to demonstrate core
+programming principles that apply regardless of tool or language (Excel, PowerShell, or R):
+
+- **One job per script** — each script does exactly one thing
+- **Configuration over hardcoding** — constants like exchange rates live in `.env`, not buried in code
+- **Immutable inputs** — raw data is never modified; the pipeline can always be rerun from scratch
+- **Fail fast** — validation runs early and stops the pipeline with a clear message before bad data spreads
+- **Separation of concerns** — scripts don't know or care what runs before or after them
+- **Orchestration** — a single caller (`main.sh`) owns the sequence and can be scheduled via cron
+
+## Project structure
+
+```
+powershell_example/
+├── .env                        ← exchange rate and future config
+├── main.sh                     ← pipeline caller, runs all steps in order
+├── data/
+│   ├── raw/                    ← original source, never modified
+│   ├── interim/                ← transformed working files (steps 03–06)
+│   ├── processed/              ← calculated output (step 07)
+│   └── formatted/              ← presentation-ready, rounded (step 08)
+└── scripts/
+    ├── 00_paths.R              ← paths + config, sourced by all scripts
+    ├── 01_create_data.R        ← creates wide CSVs → raw/
+    ├── 02_validate.R           ← checks column counts, stops on failure
+    ├── 03_convert_currency.R   ← EUR to USD, stays wide → interim/
+    ├── 04_pivot_income.R       ← wide to long → interim/
+    ├── 05_convert_units.R      ← thousands to persons, pivot population to long → interim/
+    ├── 06_merge.R              ← join income + population → interim/
+    ├── 07_calc.R               ← income per person → processed/
+    └── 08_format.R             ← round to 2 decimals → formatted/
+```
+
+## A note on what to commit
+
+This repo commits everything for illustration purposes. In a real project you
+would typically exclude:
+
+- **`.env`** — may contain API keys, credentials, or proprietary constants
+- **`data/`** — raw and processed data files are often too large for git and
+  may contain proprietary or personally identifiable information
+
+Both would normally be listed in `.gitignore`.
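+
+A minimal `.gitignore` sketch for that case, assuming the layout shown under
+Project structure (adjust the patterns if some data files do need to be
+tracked):
+
+```
+# local configuration; may hold secrets
+.env
+# raw, interim, processed, and formatted data
+data/
+```
+
+Files that are already tracked stay tracked after being added to `.gitignore`;
+`git rm --cached` (with `-r` for a directory) removes them from the index
+without deleting them from disk.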
+
+## Usage
+
+```bash
+bash /data/projects/r/powershell_example/main.sh
+```
+
+Replace the path with the location of your own checkout.
+
+## Scheduling with cron
+
+Cron is the Linux/macOS counterpart of **Windows Task Scheduler**: it runs a
+program automatically on a schedule, with no human intervention.
+
+To run the pipeline every Monday at 8:00 am (machine local time), add a line like this to your crontab (`crontab -e`):
+
+```
+0 8 * * 1 bash /data/projects/r/powershell_example/main.sh >> /tmp/pipeline.log 2>&1
+```
+
+**A note on corporate environments:** IT departments are often protective of
+who can schedule automated jobs on shared servers, and for good reason. Silent
+background processes can consume resources, touch shared databases, or trigger
+emails without anyone knowing they exist. On your own machine, cron or Task
+Scheduler is fair game. On a company server, the right move is to document what
+the job does, show it to IT, and ask them to schedule it officially. That
+conversation also creates a paper trail, which matters in regulated industries.