Objectives
- Quantify skill of V9.7 catastrophe windows across time/region/node class.
- Decompose error sources (forcings, coupling, observation noise, model misspecification).
- Publish reproducible validation reports and uncertainty-aware outputs.
Key Concepts
- Proper scoring: Brier, log score, CRPS for probabilistic windows.
- Reliability vs. resolution: calibration curves; sharpness trade-offs.
- Aleatoric vs. epistemic uncertainty: irreducible data noise vs. reducible uncertainty in model structure.
- Ensembles & perturbations: sensitivity to inputs (V9.2–V9.6).
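
The scoring and calibration concepts above can be sketched in a few lines of plain Python. This is a minimal illustration, not the V9.7 implementation: it assumes binary outcomes (1 = event occurred in the window, 0 = it did not), and the function names and the 10-bin default are hypothetical. The Murphy decomposition splits the Brier score into reliability, resolution, and uncertainty terms; it is exact only when all forecasts within a bin share the same probability. CRPS is left out because it applies to full predictive distributions rather than single window probabilities.

```python
import math
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def log_score(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood (lower is better); eps guards against log(0)."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if o == 1 else math.log(1.0 - p)
    return total / len(probs)


def murphy_decomposition(probs, outcomes, n_bins=10):
    """Return (reliability, resolution, uncertainty).

    Brier score ~= reliability - resolution + uncertainty, with equality
    when every forecast in a bin has the same probability.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[k].append((p, o))
    reliability = resolution = 0.0
    for members in bins.values():
        nk = len(members)
        mean_p = sum(p for p, _ in members) / nk
        obs_freq = sum(o for _, o in members) / nk
        reliability += nk * (mean_p - obs_freq) ** 2
        resolution += nk * (obs_freq - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


if __name__ == "__main__":
    probs = [0.1, 0.1, 0.9, 0.9]
    outcomes = [0, 0, 1, 0]
    print(brier_score(probs, outcomes))
    print(murphy_decomposition(probs, outcomes))
```

A low reliability term indicates good calibration, while a high resolution term indicates that the forecasts separate event from non-event cases; sharpness is judged separately from the spread of the issued probabilities.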
Data Inputs
- Hindcast windows from V9.7 (node_id, start/end, tier, prob).
- Event catalogs: seismicity, uplift/subsidence, melt proxies, hydrologic pulses.
- Truth masks per node/time (release/no release + magnitude class).
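
One possible in-memory shape for these inputs is sketched below. The hindcast field names (node_id, start, end, tier, prob) come from the list above; everything else is an assumption made for illustration, including the class names, the integer type for tier, the string type for magnitude class, and the validation rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class HindcastWindow:
    """One V9.7 hindcast window (types are assumed, not specified by the source)."""
    node_id: str
    start: datetime
    end: datetime
    tier: int    # assumed: integer severity tier
    prob: float  # forecast probability of a release within [start, end]

    def __post_init__(self):
        if not 0.0 <= self.prob <= 1.0:
            raise ValueError(f"prob must be in [0, 1], got {self.prob}")
        if self.end < self.start:
            raise ValueError("end must not precede start")


@dataclass(frozen=True)
class TruthRecord:
    """One truth-mask entry per node and time."""
    node_id: str
    time: datetime
    released: bool                         # release / no release
    magnitude_class: Optional[str] = None  # assumed: label, set only when released
```

Validating at construction time keeps malformed probabilities or inverted windows from reaching the scoring step.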
Methods
- Truth alignment: define event windows & lead/lag tolerances by hazard type.
- Scoring suite: Brier and log score (proper scoring rules); ROC-AUC/PR-AUC (discrimination); reliability & sharpness diagnostics.
- Uncertainty partition: bootstrap resampling; input-perturbation ensembles; ablation studies.
- Regionalization: stratified skill by node class (rift, subduction, craton edge) & climate mode.
- Drift checks: rolling-window monitoring for calibration drift; auto-recalibration triggers.
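
Two of the steps above, truth alignment with lead/lag tolerances and bootstrap resampling, can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the tolerance values in the dictionary are placeholders rather than recommended settings, and the bootstrap resamples windows independently, which ignores temporal and spatial correlation (a block bootstrap would be one way to address that).

```python
import random
from datetime import datetime, timedelta

# Placeholder (lead, lag) tolerances by hazard type; values are illustrative only.
HYPOTHETICAL_TOLERANCES = {
    "seismicity": (timedelta(days=0), timedelta(days=7)),
    "hydrologic_pulse": (timedelta(days=3), timedelta(days=14)),
}


def window_hit(start, end, event_times, lead=timedelta(0), lag=timedelta(0)):
    """Return 1 if any event falls in [start - lead, end + lag], else 0."""
    lo, hi = start - lead, end + lag
    return int(any(lo <= t <= hi for t in event_times))


def bootstrap_brier_ci(probs, outcomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Brier score (i.i.d. resampling)."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(probs)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum((probs[i] - outcomes[i]) ** 2 for i in idx) / n)
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi


if __name__ == "__main__":
    lead, lag = HYPOTHETICAL_TOLERANCES["seismicity"]
    start, end = datetime(2020, 1, 1), datetime(2020, 1, 31)
    events = [datetime(2020, 2, 3)]  # three days after the window closes
    print(window_hit(start, end, events, lead, lag))
    print(bootstrap_brier_ci([0.2, 0.8, 0.6, 0.1], [0, 1, 0, 0]))
```

Recording the seed passed to the bootstrap alongside the results is what makes the reported intervals reproducible.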
Figures
- Reliability diagrams (global & per-node-class) + sharpness histograms.
- Skill maps: Brier skill score by node, with uncertainty whiskers.
- Attribution bars: contribution of inputs (V9.2–V9.6) to forecast skill.
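
The per-node quantity behind the skill maps could be computed along these lines. The sketch assumes the reference forecast is the pooled base rate across all nodes (a climatology-style baseline); the source does not state which reference V9.7 uses, and the function name and record layout are hypothetical.

```python
from collections import defaultdict


def brier_skill_by_node(records):
    """Brier skill score per node: BSS = 1 - BS / BS_ref.

    records: iterable of (node_id, prob, outcome) with outcome in {0, 1}.
    BS_ref uses the pooled base rate as a constant reference forecast (assumed).
    """
    records = list(records)
    base_rate = sum(o for _, _, o in records) / len(records)
    per_node = defaultdict(list)
    for node_id, p, o in records:
        per_node[node_id].append((p, o))
    skill = {}
    for node_id, pairs in per_node.items():
        bs = sum((p - o) ** 2 for p, o in pairs) / len(pairs)
        bs_ref = sum((base_rate - o) ** 2 for _, o in pairs) / len(pairs)
        # BSS is undefined when the reference forecast is already perfect.
        skill[node_id] = 1.0 - bs / bs_ref if bs_ref > 0 else float("nan")
    return skill
```

A score of 1 means a perfect forecast, 0 means no improvement over the reference, and negative values mean the forecast is worse than the reference.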
Outputs
- Validation Report: HTML/PDF with metrics, figures, and narrative summary.
- Calibrated Windows: updated probabilities with confidence intervals.
- Audit Artifacts: seeds, config, hashes for reproducibility; CSV of node-level scores.
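
A minimal sketch of the audit artifacts follows, assuming the run configuration is a JSON-serializable dictionary and that SHA-256 is an acceptable hash. The manifest layout, key names, and function names are hypothetical.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(config: dict, seed: int, file_bytes: dict) -> dict:
    """Bundle seed, config, and content hashes for one validation run.

    file_bytes maps an artifact name (e.g. a CSV of node-level scores)
    to its raw bytes.
    """
    return {
        "seed": seed,
        "config": config,
        "config_sha256": config_hash(config),
        "files_sha256": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(file_bytes.items())
        },
    }


if __name__ == "__main__":
    manifest = build_manifest(
        {"n_boot": 1000, "n_bins": 10},
        seed=0,
        file_bytes={"node_scores.csv": b"node_id,brier\nn1,0.21\n"},
    )
    print(json.dumps(manifest, indent=2))
```

Hashing a canonical encoding rather than the raw config file means two runs with the same settings produce the same hash even if the keys were written in a different order.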
References (to stage)
- Probability forecast verification literature; GNSS/seismic validation protocols; GRACE/altimetry QA docs.