# powershell_example

This example demonstrates core programming principles that apply regardless of
language, whether Excel, PowerShell, or R:

- **One job per script**: each script does exactly one thing
- **Configuration over hardcoding**: constants like exchange rates live in `.env`, not buried in code
- **Immutable inputs**: raw data is never modified; the pipeline can always be rerun from scratch
- **Fail fast**: validation runs early and stops the pipeline with a clear message before bad data spreads
- **Separation of concerns**: scripts don't know or care what runs before or after them
- **Orchestration**: a single caller (`main.sh`) owns the sequence and can be scheduled via cron

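To make the fail-fast idea concrete, here is a minimal shell sketch of a column-count check in the spirit of `02_validate.R`. The project's real check is written in R; `check_columns` is a hypothetical name, and the naive comma split assumes the CSVs contain no quoted fields with embedded commas.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_columns FILE EXPECTED
# Hypothetical fail-fast check: every row of FILE must have EXPECTED
# comma-separated fields. Stops at the first bad row with a clear message.
check_columns() {
  local file="$1" expected="$2"
  # Naive split on commas: assumes no quoted fields containing commas.
  awk -F',' -v n="$expected" -v f="$file" '
    NF != n {
      printf "validation failed: %s line %d has %d columns, expected %d\n", f, NR, NF, n > "/dev/stderr"
      exit 1
    }
  ' "$file"
}
```

Called as, say, `check_columns data/raw/income.csv 4` (a placeholder file name and column count), it reports the first offending line and returns a non-zero status, which under `set -e` halts the calling script before any later step sees the bad data.
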
## Project structure

```
powershell_example/
├── .env                       ← exchange rate and future config
├── main.sh                    ← pipeline caller, runs all steps in order
├── data/
│   ├── raw/                   ← original source, never modified
│   ├── interim/               ← transformed working files (steps 03–06)
│   ├── processed/             ← calculated output (step 07)
│   └── formatted/             ← presentation-ready, rounded (step 08)
└── scripts/
    ├── 00_paths.R             ← paths + config, sourced by all scripts
    ├── 01_create_data.R       ← creates wide CSVs → raw/
    ├── 02_validate.R          ← checks column counts, stops on failure
    ├── 03_convert_currency.R  ← EUR to USD, stays wide → interim/
    ├── 04_pivot_income.R      ← wide to long → interim/
    ├── 05_convert_units.R     ← thousands to persons, pivot pop to long → interim/
    ├── 06_merge.R             ← join income + population → interim/
    ├── 07_calc.R              ← income per person → processed/
    └── 08_format.R            ← round to 2 decimals → formatted/
```

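The two-digit prefixes are what let a caller run the steps in order without a hand-maintained list: a shell glob expands to an alphabetically sorted list, so `01_` runs before `02_`. Below is a minimal sketch of such a caller, assuming `Rscript` is on the `PATH`. It is not the project's actual `main.sh`; `run_pipeline` and the `RUNNER` override are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_pipeline ROOT
# Hypothetical pipeline caller: runs ROOT/scripts/NN_*.R in prefix order and
# stops at the first failing step. 00_* is skipped because, per the layout
# above, 00_paths.R is sourced by the other scripts rather than run directly.
# RUNNER defaults to Rscript; RUNNER=echo gives a dry run.
run_pipeline() (
  # Subshell body: cd and shopt changes do not leak into the caller.
  root="$1"
  runner="${RUNNER:-Rscript}"
  cd "$root" || exit 1
  shopt -s nullglob                        # no matching scripts: zero iterations
  for step in scripts/[0-9][0-9]_*.R; do   # glob expands in sorted order
    case "${step##*/}" in
      00_*) continue ;;
    esac
    echo "==> $step" >&2
    # Explicit check instead of relying on set -e, which bash ignores when
    # the function is called from an `if` or `&&` context.
    if ! "$runner" "$step"; then
      echo "pipeline stopped: $step failed" >&2
      exit 1
    fi
  done
)
```

A `main.sh` built this way could end with `run_pipeline "$(cd "$(dirname "$0")" && pwd)"`, which resolves the project root from the script's own location so the caller's working directory does not matter. Adding a step is then just a matter of dropping in a new numbered file.
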
## A note on what to commit

This repo commits everything for illustration purposes. In a real project you
would typically exclude:

- **`.env`**: may contain API keys, credentials, or proprietary constants
- **`data/`**: raw and processed data files are often too large for git and
  may contain proprietary or personally identifiable information

Both would normally be listed in `.gitignore`.

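Adding those entries can be scripted so that rerunning it does no harm. This is a small shell sketch; `ignore_paths` is a hypothetical helper, not part of the repo.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ignore_paths FILE ENTRY...
# Hypothetical helper: appends each ENTRY to FILE (e.g. .gitignore) unless an
# identical line is already there, so it is safe to run more than once.
ignore_paths() {
  local file="$1" entry
  shift
  touch "$file"
  # If the file does not end in a newline, add one so entries stay on their own lines.
  if [[ -s "$file" && -n "$(tail -c 1 "$file")" ]]; then
    printf '\n' >> "$file"
  fi
  for entry in "$@"; do
    grep -qxF -- "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
  done
}
```

`ignore_paths .gitignore .env data/` adds both entries once. Note that `.gitignore` only affects untracked files: anything already committed, as in this repo, stays tracked until it is removed from the index with `git rm -r --cached .env data/` (the working copies are kept).
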
## Usage

```bash
bash /data/projects/r/powershell_example/main.sh
```

## Scheduling with cron

Cron is the Linux/macOS equivalent of **Windows Task Scheduler**: it runs a
program automatically on a schedule with no human intervention.

To run the pipeline automatically every Monday at 8 a.m., add this line to your
crontab (`crontab -e`):

```
0 8 * * 1 /data/projects/r/powershell_example/main.sh >> /tmp/pipeline.log 2>&1
```

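The five space-separated fields before the command are minute, hour, day of month, month, and day of week (0 is Sunday, so `1` is Monday). The hypothetical helper below simply labels them, which can help when reading an unfamiliar crontab.

```bash
#!/usr/bin/env bash
set -euo pipefail

# explain_cron "MIN HOUR DOM MON DOW"
# Hypothetical helper: labels the five time fields of a standard crontab entry.
# Pass only the time fields, not the command that follows them.
explain_cron() {
  local min hour dom mon dow
  read -r min hour dom mon dow <<< "$1"
  printf 'minute: %s\nhour: %s\nday of month: %s\nmonth: %s\nday of week: %s\n' \
    "$min" "$hour" "$dom" "$mon" "$dow"
}
```

`explain_cron "0 8 * * 1"` prints `hour: 8` and `day of week: 1` among its five lines. Two practical points about the entry above: because it calls `main.sh` directly rather than through `bash`, the script needs a shebang line and execute permission (`chmod +x main.sh`); and cron runs with a minimal environment, so `Rscript` may need to be referenced by its full path. The `>> /tmp/pipeline.log 2>&1` suffix appends both normal output and errors to the log.
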
**A note on corporate environments:** IT departments are often protective of
who can schedule automated jobs on shared servers, and for good reason. Silent
background processes can consume resources, touch shared databases, or trigger
emails without anyone knowing they exist. On your own machine, cron (or Task
Scheduler on Windows) is fair game. On a company server, the right move is to
document what the job does, show IT, and ask them to schedule it officially.
That conversation also creates a paper trail, which matters in regulated
industries.