# powershell_example
This example demonstrates core programming principles that apply regardless of
language — Excel, PowerShell, or R:
- **One job per script** — each script does exactly one thing
- **Configuration over hardcoding** — constants like exchange rates live in `.env`, not buried in code
- **Immutable inputs** — raw data is never modified; the pipeline can always be rerun from scratch
- **Fail fast** — validation runs early and stops the pipeline with a clear message before bad data spreads
- **Separation of concerns** — scripts don't know or care what runs before or after them
- **Orchestration** — a single caller (`main.sh`) owns the sequence and can be scheduled via cron
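
To make the fail-fast and orchestration principles concrete, here is a minimal sketch of what a caller in the spirit of `main.sh` might look like. It is not the repo's actual `main.sh`: the function name `run_pipeline`, the `RUNNER` variable, and the assumption that every numbered `.R` file other than `00_paths.R` is a step are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a pipeline caller; not the repo's actual main.sh.

# Run every numbered step in a scripts directory, in filename order.
# Stops at the first failing step and returns its exit code (fail fast).
# RUNNER is an assumed knob: the interpreter used to run each step.
run_pipeline() {
  local scripts_dir="$1"
  local runner="${RUNNER:-Rscript}"
  local step status

  for step in "$scripts_dir"/[0-9][0-9]_*.R; do
    [ -e "$step" ] || continue   # the glob matched nothing
    # 00_paths.R is sourced by the other scripts, so it is not a step.
    [ "$(basename "$step")" = "00_paths.R" ] && continue
    echo "running $(basename "$step")"
    "$runner" "$step" || {
      status=$?
      echo "FAILED: $(basename "$step") (exit $status)" >&2
      return "$status"
    }
  done
}
```

A caller would then end with something like `run_pipeline "$(dirname "$0")/scripts"`. The steps themselves stay unaware of each other; only the caller knows the order and decides to stop when one of them fails.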
## Project structure
```
powershell_example/
├── .env                       ← exchange rate and future config
├── main.sh                    ← pipeline caller, runs all steps in order
├── data/
│   ├── raw/                   ← original source, never modified
│   ├── interim/               ← transformed working files (steps 03-06)
│   ├── processed/             ← calculated output (step 07)
│   └── formatted/             ← presentation-ready, rounded (step 08)
└── scripts/
    ├── 00_paths.R             ← paths + config, sourced by all scripts
    ├── 01_create_data.R       ← creates wide CSVs → raw/
    ├── 02_validate.R          ← checks column counts, stops on failure
    ├── 03_convert_currency.R  ← EUR to USD, stays wide → interim/
    ├── 04_pivot_income.R      ← wide to long → interim/
    ├── 05_convert_units.R     ← thousands to persons, pivot pop to long → interim/
    ├── 06_merge.R             ← join income + population → interim/
    ├── 07_calc.R              ← income per person → processed/
    └── 08_format.R            ← round to 2 decimals → formatted/
```
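
How the scripts actually read `.env` is not shown in this README (in this layout `00_paths.R` handles paths and config). As one possible approach on the shell side, the caller could export the `KEY=value` pairs before running any step, so that every child process sees them. The sketch below assumes simple lines without quotes or spaces around `=`; the function name `load_env` and the variable `EUR_TO_USD` in the usage note are hypothetical, not taken from the repo.

```bash
# Hypothetical sketch: export every KEY=value pair from a .env-style file.
# Assumes simple lines (no quotes, no spaces around "="); blank lines and
# lines starting with "#" are skipped.
load_env() {
  local env_file="$1"
  local line

  if [ ! -f "$env_file" ]; then
    echo "config file not found: $env_file" >&2
    return 1
  fi

  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;      # skip blanks and comments
      *=*) export "$line" ;;    # KEY=value becomes an environment variable
      *) echo "ignoring malformed line: $line" >&2 ;;
    esac
  done < "$env_file"
}
```

After `load_env "$(dirname "$0")/.env"`, an R step could read a value with `Sys.getenv("EUR_TO_USD")`. Returning non-zero when the file is missing lets the caller stop before any step runs with an undefined rate.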
## A note on what to commit
This repo commits everything for illustration purposes. In a real project you
would typically exclude:
- **`.env`** — may contain API keys, credentials, or proprietary constants
- **`data/`** — raw and processed data files are often too large for git and
  may contain proprietary or personally identifiable information

Both would normally be listed in `.gitignore`.
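
For a project that does keep these out of version control, the entries can be typed by hand or added with a small helper. The sketch below is one way to do it so that rerunning the setup never duplicates a line; the function name `ignore_path` is hypothetical, and by default it assumes it is run from the repository root.

```bash
# Hypothetical helper: add a pattern to .gitignore only if it is not
# already listed, so rerunning the setup does not create duplicates.
# The optional second argument overrides the file to edit.
ignore_path() {
  local pattern="$1"
  local ignore_file="${2:-.gitignore}"

  touch "$ignore_file"
  # -x: match the whole line, -F: treat the pattern as a literal string
  if ! grep -qxF -- "$pattern" "$ignore_file"; then
    printf '%s\n' "$pattern" >> "$ignore_file"
  fi
}
```

Calling `ignore_path ".env"` and then `ignore_path "data/"` leaves one line per pattern, however often the calls are repeated. Note that `.gitignore` only affects untracked files; anything already committed stays tracked until it is removed from the index with `git rm --cached`.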
## Usage
```bash
bash /data/projects/r/powershell_example/main.sh
```
## Scheduling with cron
Cron is the Linux/macOS equivalent of **Windows Task Scheduler**: it runs a
program automatically on a schedule, with no human intervention.

To run the pipeline every Monday at 8:00 am, add this line to your crontab
(`crontab -e`):
```
0 8 * * 1 bash /data/projects/r/powershell_example/main.sh >> /tmp/pipeline.log 2>&1
```
**A note on corporate environments:** IT departments are often protective of
who can schedule automated jobs on shared servers, and for good reason. Silent
background processes can consume resources, touch shared databases, or trigger
emails without anyone knowing they exist. On your own machine, cron or Task
Scheduler is fair game. On a company server, the right move is to document what
the job does, show it to IT, and ask them to schedule it officially. That
conversation also creates a paper trail, which matters in regulated industries.