Allocator-grade validation protocol for the Macro Risk Engine
This page defines the evaluation protocol and control framework. For implementation detail, see How It Works. For event-level outcomes, see Track Record.
Benchmark and validation artifacts are regenerated periodically from the latest available data.
The objective is not to maximize a single historical metric but to verify that the engine is useful under multiple stress regimes, with a transparent false-alarm cost.
The production stack is MSI + EWI + MTW. MSI captures current stress state. EWI captures trajectory risk. MTW stabilizes output behavior across short-term noise.
| Layer | Role | Operational Output |
|---|---|---|
| MSI | State score from normalized multi-asset stress inputs. | Daily stress level baseline. |
| EWI | Trajectory model over recent MSI evolution. | Hard/soft escalation context. |
| MTW | Temporal stabilization and warning persistence behavior. | Reduced one-day alert churn. |
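The idea of warning persistence in the MTW row can be illustrated with a generic debounce filter. This is a minimal sketch, not the engine's actual logic: the function name `persistent_warning` and the parameters `on_days` and `off_days` are hypothetical, and the real stabilization rules are not published.

```python
def persistent_warning(raw_flags, on_days=3, off_days=3):
    """Debounce a daily boolean alert series (illustrative only).

    The warning switches on after `on_days` consecutive raw alerts and
    switches off after `off_days` consecutive quiet days. Both parameters
    are hypothetical placeholders.
    """
    state = False
    run = 0  # consecutive days on which the raw flag disagrees with `state`
    stabilized = []
    for flag in raw_flags:
        flag = bool(flag)
        if flag != state:
            run += 1
            needed = on_days if flag else off_days
            if run >= needed:
                state = flag
                run = 0
        else:
            run = 0
        stabilized.append(state)
    return stabilized
```

In this sketch an isolated one-day spike never changes the output state, which is the kind of one-day churn the table refers to. The cost is a delay of `on_days - 1` days before a genuine warning appears.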
Two policy tracks are reported to keep governance explicit: a research-calibrated track and a live production track.
Additional correlation and regime gates inform the research policy and telemetry paths. These include cross-asset coupling checks, macro decoupling persistence rules, and structural break confirmation methods. Exact gate logic and thresholds are proprietary.
Each should-warn event is scored on a fixed causal protocol. Detection windows and false-alarm exclusion zones are standardized across all strategies for comparability.
Output metrics include detection rate, average lead time, and false-positive burden measured as both day-level exposure and clustered episodes per year. Exact window sizes and scoring parameters are available to partners under NDA.
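As a rough illustration of how detection rate and average lead time could be computed under a fixed causal window, consider the sketch below. The 90-day window, the function name `score_events`, and the choice to measure lead from the earliest in-window alert are assumptions made for the example; the actual window sizes and scoring parameters are not public.

```python
from datetime import date, timedelta

# Hypothetical placeholder; the real detection window is not published.
DETECTION_WINDOW_DAYS = 90


def score_events(event_dates, alert_dates, window_days=DETECTION_WINDOW_DAYS):
    """Return (detection_rate, average_lead_days) for should-warn events.

    An event counts as detected if at least one alert fired within
    `window_days` before the event date. Only alerts strictly before the
    event are considered, so the scoring never looks into the future.
    Lead time is measured from the earliest alert inside the window.
    """
    alerts = sorted(alert_dates)
    leads = []
    for event in event_dates:
        window_start = event - timedelta(days=window_days)
        in_window = [a for a in alerts if window_start <= a < event]
        if in_window:
            leads.append((event - in_window[0]).days)
    detection_rate = len(leads) / len(event_dates) if event_dates else 0.0
    average_lead = sum(leads) / len(leads) if leads else None
    return detection_rate, average_lead
```

Because every strategy is scored with the same `window_days`, differences in the resulting metrics reflect the strategies rather than the scoring rules.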
False positives are measured in two dimensions, day-level exposure and clustered episodes per year, because day count alone can misrepresent operational burden.
This avoids a common reporting failure where one model appears cleaner only because its alerts are merged into long contiguous runs.
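The difference between the two dimensions can be made concrete with a small sketch. The function name `false_positive_burden` and the `merge_gap_days` clustering rule are assumptions for illustration; the protocol's actual clustering parameters are not disclosed.

```python
from datetime import date, timedelta


def false_positive_burden(fp_days, years, merge_gap_days=5):
    """Return (fp_days_per_year, fp_episodes_per_year).

    fp_days: dates on which an alert was active outside any event window.
    FP days no more than `merge_gap_days` apart are clustered into one
    episode. The gap value is a hypothetical placeholder.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    days = sorted(set(fp_days))
    episodes = 0
    previous = None
    for day in days:
        if previous is None or (day - previous).days > merge_gap_days:
            episodes += 1
        previous = day
    return len(days) / years, episodes / years


# One long contiguous run versus ten isolated one-day alerts.
start = date(2021, 1, 1)
long_run = [start + timedelta(days=i) for i in range(30)]
scattered = [start + timedelta(days=30 * i) for i in range(10)]
print(false_positive_burden(long_run, years=1))
print(false_positive_burden(scattered, years=1))
```

In this toy comparison the long run has three times the day-level exposure but one tenth of the episodes, so ranking the two models on either number alone would give opposite answers.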
The benchmark suite compares six naive rules and two engine tracks under identical event windows and false-positive (FP) accounting.
| Engine Mode | Crypto Era Detection | Crypto Era Avg. Lead | Crypto Era FP (episodes/yr) | Full-History Detection |
|---|---|---|---|---|
| Research-calibrated | 100% | 55.1d | 4.6 | 93% |
| Live production | 100% | 54.6d | 12.6 | 93% |
Reporting both modes makes the tradeoff explicit: live policy emphasizes operational responsiveness, while research-calibrated policy emphasizes stricter regime confirmation.
Robustness is tested through rolling normalization, walk-forward segmentation, threshold perturbation, and structural-break checks.
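Of the checks listed above, rolling normalization is the simplest to sketch. The version below is a generic trailing z-score, shown only to illustrate normalizing each observation with past data alone; the function name `rolling_zscore` and the window length are assumptions, not the engine's actual normalization.

```python
from statistics import mean, pstdev


def rolling_zscore(values, window):
    """Trailing z-score using only the `window` observations before each point.

    Returns None until enough history is available, so no value is
    normalized with statistics that include itself or any later data.
    """
    normalized = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            normalized.append(None)  # not enough history yet
            continue
        mu = mean(history)
        sigma = pstdev(history)
        normalized.append((x - mu) / sigma if sigma > 0 else 0.0)
    return normalized
```

Comparing results under a trailing window like this against full-sample normalization is one way to check that reported performance does not depend on look-ahead in the scaling step.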
Ablation isolates contribution by removing one component family at a time and re-running the full evaluation stack.
Output is interpreted as incremental value: if removing a component family materially reduces detection or degrades the lead-time profile, that family is load-bearing.
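A leave-one-family-out loop of this kind might look like the following sketch. The `evaluate` callable, the metric names, and the family names in the usage example are all hypothetical stand-ins for the full evaluation stack.

```python
def ablation_report(component_families, evaluate):
    """Re-run `evaluate` with each family removed, one at a time.

    evaluate: callable that takes a tuple of active family names and
    returns a dict of numeric metrics (a hypothetical interface).
    Returns, per removed family, the metric change versus the full stack.
    """
    baseline = evaluate(tuple(component_families))
    report = {}
    for removed in component_families:
        active = tuple(f for f in component_families if f != removed)
        metrics = evaluate(active)
        report[removed] = {name: metrics[name] - baseline[name] for name in baseline}
    return report


# Toy evaluation function with made-up families, for illustration only.
def toy_evaluate(active):
    return {
        "detection_rate": 0.5 + 0.25 * ("rates" in active) + 0.25 * ("credit" in active),
        "avg_lead_days": 10.0 * len(active),
    }


print(ablation_report(["rates", "credit", "fx"], toy_evaluate))
```

A negative change in `detection_rate` for a removed family is the signal, in this toy setup, that the family carries part of the detection performance.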
Historical studies test counterfactual fit; forward validation tests live behavior. Predictions are logged daily and scored only once enough future data exists to evaluate the outcome.
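The "score only when sufficient future data exists" rule can be expressed as a simple date filter. This is a sketch under assumed names: the log entry layout, `scorable_predictions`, and the `horizon_days` value are illustrative and not taken from the engine.

```python
from datetime import date, timedelta


def scorable_predictions(log, latest_data_date, horizon_days):
    """Return logged predictions whose evaluation horizon has fully elapsed.

    log: list of dicts with at least a "date" key (hypothetical layout).
    A prediction made on day D is scorable only once data through
    D + horizon_days is available.
    """
    cutoff = latest_data_date - timedelta(days=horizon_days)
    return [entry for entry in log if entry["date"] <= cutoff]


log = [
    {"date": date(2024, 1, 1), "warning": True},
    {"date": date(2024, 3, 1), "warning": False},
]
# With data through 2024-03-15 and a 30-day horizon, only the first entry is scorable.
print(scorable_predictions(log, date(2024, 3, 15), horizon_days=30))
```

Entries that are not yet scorable stay in the log untouched, so the forward record cannot be revised after the fact.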
IP boundary: formulas, governance, and measured outcomes are disclosed; exact thresholds, weighting schedules, and internal decision parameters remain proprietary.
Not financial advice: this system is a risk analytics tool, not an investment recommendation engine.
Event timelines, detection windows, and benchmark outcomes are published on the Track Record page.