Validation Methodology

How Rug Cleaner measures market stress, warning conditions, and model limitations

Rug Cleaner evaluates current stress, near-term warning conditions, and medium-term risk across macro and crypto markets. This page explains the evidence, historical tests, and limitations behind those readings.

Validation Overview

The objective is not to maximize a single historical metric, but to verify that the engine is useful across multiple stress regimes at a transparent false-alarm cost.

Detect should-warn crises inside a fixed pre-event window
Quantify lead time, not only hit/miss outcomes
Measure false alarms as both day-level burden and clustered episodes
Stress-test robustness with sensitivity, walk-forward, and ablation passes

Benchmark events (should-warn subset): 14

Single-rule baselines: 6

Validation policies: 2 (operational + stricter validation)

Full benchmark evidence window: 1990-01-02 to 2026-02-10

Crypto-era evidence window: 2017-01-01 to 2026-02-10

Public chart window: 2017-present

Last updated: 4 Mar 2026

V4 Public Contract

The public contract follows the currently accepted Macro Engine V4 specification. The visible MSI chart is a confirmed current-state stress display, while EWI and MTW remain separate horizon signals for fast warning and medium-term watch context.

Accepted specification: v1.14

Visible MSI chart: Confirmed stress (V4) only

Fast horizon: Fast warning (EWI) signal

Medium horizon: Medium-term watch (MTW) context

Release governance includes validation checks and rollback controls. Those controls protect the public display contract without exposing internal gate names, trigger values, or decision parameters.

What The Engine Does

The engine separates current confirmed stress, fast warning, and medium-term watch context so readers do not treat every horizon as one blended chart.

Layer	Role	Operational Output
MSI V4	Confirmed current stress from normalized multi-asset stress inputs	Confirmed stress (V4) display
EWI	Fast trajectory warning over recent stress changes	Fast warning (EWI) states
MTW	Medium-term persistence and regime-pressure watch	Medium-term watch (MTW) context

Two policy tracks are reported to keep governance explicit:

Operational validation policy: the policy used to evaluate responsive behavior in historical comparisons
Stricter validation policy: a tighter regime-confirmation policy retained for continuity studies

Additional correlation and regime gates inform the research policy and telemetry paths. These include cross-asset coupling checks, macro decoupling persistence rules, and structural break confirmation methods. Exact gate logic and thresholds are proprietary.

Tier vs Alert vs Episode

MSI bands, alert severity, and episode labels answer different questions. Keeping them separate prevents a chart display label from being read as an alert trigger or a regime class.

Concept	Meaning	Public Interpretation
MSI tier	Current confirmed stress display for the public MSI chart	NORMAL, ELEVATED, HIGH, or CRITICAL display band
Alert severity	Operational warning state used by alerting and response workflows	Warning language can change without redefining the MSI display band
Episode class	Historical or regime-level classification used to explain stress episodes	A credit-led or broad multi-market episode is not a separate MSI tier

Coverage & Caveats

Public chart rows can be muted, caveated, or absent when source coverage does not support a clean confirmed-stress reading. These states are data-quality and market-hours context, not separate stress tiers.

Caveat	Public Meaning
Limited coverage	A point may be muted or excluded when enough supporting source coverage is not available
Stale or unavailable data	The chart can caveat a row when a monitored source is delayed, missing, or unavailable
Market-closed periods	Weekends and holidays can limit non-crypto market confirmation
Weekend crypto-only behavior	Crypto can move while traditional markets are closed, so those rows receive narrower interpretation
Narrow-driver caution	Stress from a single narrow source is shown with caution until broader confirmation develops

Taxonomy Crosswalk

This table is the canonical reference for how MSI bands, EWI tiers, reporting shorthand, and product posture language relate to each other. Other pages may use the terms operationally, but authority lives here.

Term / Label	System	Meaning	Used on	Authoritative definition lives in
MSI display bands (NORMAL / ELEVATED / HIGH / CRITICAL)	Public MSI display	Confirmed stress (V4) labels for a continuous MSI score. Useful for interpretation, but not the proprietary backend alert rules themselves.	How It Works, FAQ, Market Intelligence, Track Record interpretation copy	Methodology / Taxonomy Crosswalk
EWI display states (No warning / Watch-list / Early warning / Confirmed warning)	Early Warning Index	Fast warning (EWI) semantic states shown as No warning, Watch-list, Early warning, and Confirmed warning. They are not the same scale as the MSI display bands.	How It Works, FAQ, Track Record signal explanations	Methodology / Taxonomy Crosswalk
MTW horizon label (Medium-term watch)	Medium-Term Warning	Medium-term watch (MTW) means sustained stress or regime-pressure persistence over weeks or months. It is not the fastest warning layer and does not replace MSI V4 confirmed stress.	How It Works, FAQ, Market Intelligence	Methodology / Decision Path
ORANGE+	Validation reporting shorthand	Episode-level shorthand meaning ORANGE or stronger warning activity for summary statistics and historical reporting.	Track Record, FAQ validation copy	Methodology / Taxonomy Crosswalk
HARD	Validation reporting shorthand	The actionable alert shorthand for sustained high-risk warning episodes that meet proprietary persistence rules.	Track Record tables, FAQ, Methodology decision-path references	Methodology / Taxonomy Crosswalk
Product posture cues (Monitor / Prepare)	Product UX guidance	Action-oriented posture labels layered on top of MSI/EWI output. These phrases guide interpretation but do not define a separate scoring taxonomy.	Track Record narrative copy, Market Intelligence episode UX	Methodology / Taxonomy Crosswalk

Event Protocol And Scoring

Each should-warn event is scored on a fixed causal protocol. Detection windows and false-alarm exclusion zones are standardized across all strategies for comparability.

Detection window: Fixed pre-event lookback (weeks before crash date)

Crash neighborhood: Buffer zone around the crash for false-positive (FP) exclusion

Detection criterion: Warning activity within the detection window

Actionable FP criterion: Alert days outside crash neighborhoods

Output metrics include detection rate, average lead time, and false-positive burden measured as both day-level exposure and clustered episodes per year. Exact window sizes and scoring parameters are available to partners under NDA.
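As a rough illustration of the protocol above, the sketch below scores one event from a list of alert dates. The constants `DETECTION_WINDOW_DAYS` and `POST_CRASH_BUFFER_DAYS`, the function names, and the choice to let the crash neighborhood span the detection window plus a post-crash buffer are all assumptions made for illustration; the real window sizes and scoring parameters are not public.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical placeholder values; the real window sizes are not published.
DETECTION_WINDOW_DAYS = 56
POST_CRASH_BUFFER_DAYS = 14


def score_event(crash: date, alerts: list[date]) -> tuple[bool, Optional[int]]:
    """Return (detected, lead_days) for one should-warn event.

    The event counts as detected if any alert falls inside the fixed
    pre-event window [crash - DETECTION_WINDOW_DAYS, crash).
    Lead time is measured from the earliest alert in that window.
    """
    start = crash - timedelta(days=DETECTION_WINDOW_DAYS)
    in_window = [d for d in alerts if start <= d < crash]
    if not in_window:
        return False, None
    return True, (crash - min(in_window)).days


def false_positive_days(crashes: list[date], alerts: list[date]) -> list[date]:
    """Alert days that fall outside every crash neighborhood.

    Assumption: a neighborhood runs from the start of the detection
    window to POST_CRASH_BUFFER_DAYS after the crash date.
    """
    def in_neighborhood(d: date) -> bool:
        return any(
            c - timedelta(days=DETECTION_WINDOW_DAYS)
            <= d
            <= c + timedelta(days=POST_CRASH_BUFFER_DAYS)
            for c in crashes
        )

    return sorted(d for d in set(alerts) if not in_neighborhood(d))
```

Because every strategy is scored with the same two functions and the same constants, differences in detection rate or false-positive burden come from the signals themselves and not from the accounting.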

False Alarm Methodology

False positives are measured in two dimensions because day-count alone can misrepresent operational burden.

FP days per year captures total alert-time exposure outside crash neighborhoods
FP episodes per year clusters consecutive FP days into discrete alert episodes
Average episode length separates frequent short noise from rare prolonged stress regimes

This avoids a common reporting failure where one model appears cleaner only because its alerts are merged into long contiguous runs.
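A minimal sketch of the two-dimensional accounting described above, assuming false-positive days are given as calendar dates. The `max_gap_days` parameter and the function names are hypothetical; on trading-day data a larger gap tolerance would be needed so that weekends do not split an episode.

```python
from datetime import date, timedelta


def cluster_episodes(fp_days: list[date], max_gap_days: int = 1) -> list[list[date]]:
    """Group FP days into episodes.

    A day joins the current episode if it is at most max_gap_days after
    the previous FP day; otherwise it starts a new episode.
    """
    episodes: list[list[date]] = []
    for d in sorted(set(fp_days)):
        if episodes and (d - episodes[-1][-1]).days <= max_gap_days:
            episodes[-1].append(d)
        else:
            episodes.append([d])
    return episodes


def fp_burden(fp_days: list[date], years: float) -> dict[str, float]:
    """Day-level exposure, episode count, and average episode length."""
    episodes = cluster_episodes(fp_days)
    n_days = sum(len(e) for e in episodes)
    return {
        "fp_days_per_year": n_days / years,
        "fp_episodes_per_year": len(episodes) / years,
        "avg_episode_length_days": n_days / len(episodes) if episodes else 0.0,
    }


# Ten isolated one-day alerts versus one ten-day run over the same year.
scattered = [date(2021, 1, 1) + timedelta(days=7 * i) for i in range(10)]
single_run = [date(2021, 1, 1) + timedelta(days=i) for i in range(10)]
print(fp_burden(scattered, years=1.0))
print(fp_burden(single_run, years=1.0))
```

Both series have the same FP days per year, but the first produces ten episodes and the second produces one, which is why neither metric is reported alone.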

Benchmark Comparison Protocol

The benchmark suite compares 6 naive single-rule baselines and 2 engine policy tracks under identical event windows and FP accounting.

Validation Policy	Crypto-Era Detection Rate	Crypto-Era Avg Lead (days)	Crypto-Era FP Episodes/Year	Full-History Detection Rate
Stricter validation	100%	55.1	4.6	93%
Operational validation	100%	54.6	12.6	93%

Benchmark modes combine signals for historical comparison; the public MSI chart shows confirmed V4 stress only.

Reporting both policies makes the tradeoff explicit: operational validation emphasizes responsiveness, while the stricter validation policy emphasizes regime confirmation.

Robustness And Generalization

Robustness is tested through rolling normalization, walk-forward segmentation, threshold perturbation, and structural-break checks.

Rolling quantiles and rolling z-scores are causal and use only historical observations
Walk-forward periods are evaluated independently to detect regime-specific overfit
Sensitivity tests perturb key thresholds to measure stability of recall and alert burden
Structural-break testing applies repeated split tests with multiple-testing control
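The causality requirement in the first item can be illustrated with a small sketch. It assumes the window covers only observations strictly before the current point and uses the population standard deviation; both choices, and the function name, are assumptions, not the engine's published normalization.

```python
from statistics import mean, pstdev
from typing import Optional


def causal_rolling_zscore(values: list[float], window: int) -> list[Optional[float]]:
    """z-score of each point against the `window` observations before it.

    The current observation is never part of its own reference window,
    so later data cannot change an earlier reading.
    """
    out: list[Optional[float]] = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            out.append(None)  # not enough history yet
            continue
        sigma = pstdev(history)
        out.append((x - mean(history)) / sigma if sigma > 0 else 0.0)
    return out
```

A quick check of the property: replacing the final value of a series leaves every earlier z-score unchanged, which would not hold for a full-sample normalization.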

Ablation Impact

Ablation isolates contribution by removing one component family at a time and re-running the full evaluation stack.

Component families:

Crypto
Volatility
Equity market
Credit
Banking
Rates & USD
Liquidity & Funding
Labor & Macro
Commodities

Output is interpreted qualitatively as incremental value within a governance-capped family blend: if removal materially reduces detection or degrades lead-time profile, the group is load-bearing. The public page names the families but does not publish caps, formulas, or weighting schedules.
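The leave-one-family-out loop can be sketched as follows. The `evaluate` callable is a hypothetical stand-in for the full evaluation stack, and the metric keys it returns are illustrative; nothing here reflects the actual caps, formulas, or weights.

```python
from typing import Callable, Dict, FrozenSet

FAMILIES = (
    "Crypto", "Volatility", "Equity market", "Credit", "Banking",
    "Rates & USD", "Liquidity & Funding", "Labor & Macro", "Commodities",
)

Metrics = Dict[str, float]


def run_ablation(evaluate: Callable[[FrozenSet[str]], Metrics]) -> Dict[str, Metrics]:
    """Re-run `evaluate` with each family removed in turn.

    Returns, per removed family, the change in every metric relative to
    the baseline that uses all families.
    """
    full = frozenset(FAMILIES)
    baseline = evaluate(full)
    deltas: Dict[str, Metrics] = {}
    for family in FAMILIES:
        ablated = evaluate(full - {family})
        deltas[family] = {k: ablated[k] - baseline[k] for k in baseline}
    return deltas
```

A family whose removal produces a materially negative delta in detection or lead time is a candidate for the load-bearing label; the size of delta that counts as material is a judgment left to the reader of the output.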

Data Integrity And Reproducibility Controls

Canonical hashing for reproducibility and artifact integrity checks
Instrument identity gating (type, quote, venue, currency)
Trading-day-aware freshness controls and stale-feed monitoring
Duplicate and lag-shift detection across input streams
Automated reconciliation gates to detect metric drift across artifacts
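One common way to implement the canonical hashing named in the first item is to hash a deterministic serialization of the artifact. The sketch below assumes JSON-serializable artifacts and SHA-256; the source does not state which encoding or hash function is used, so treat both as illustrative.

```python
import hashlib
import json


def canonical_hash(artifact: dict) -> str:
    """SHA-256 hex digest over a canonical JSON encoding.

    Sorting keys and fixing separators makes the digest independent of
    key insertion order and whitespace.
    """
    payload = json.dumps(
        artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same content in a different key order yields the same digest.
a = canonical_hash({"window": "2017-present", "events": 14})
b = canonical_hash({"events": 14, "window": "2017-present"})
print(a == b)
```

Floating-point values deserve care in practice: two runs that differ only in rounding will hash differently, so artifacts are usually rounded or stored as strings before hashing.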

Forward Validation Framework

Historical studies measure counterfactual fit. Forward validation measures live behavior through Phase-5 validation rows anchored to Forward snapshots. Daily compatibility telemetry is retained, but operator reports read the Phase-5 row model.

Approach: Each snapshot-anchored row evaluated over fixed forward horizons

Maturity gate: Immature forward reads excluded until enough data accumulates

Primary output: Phase-5 classification, maturity, denominator, and confidence evidence

Status: Phase-5 row model authoritative; daily log compatibility retained
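The maturity gate can be pictured with a short sketch. The `ForwardRow` fields, the `HORIZON_DAYS` value, and the rule that a row matures once its full horizon has elapsed are assumptions for illustration only; the actual Phase-5 row schema and horizons are not described on this page.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical forward horizon; the real horizons are not published.
HORIZON_DAYS = 30


@dataclass(frozen=True)
class ForwardRow:
    """Illustrative stand-in for a snapshot-anchored validation row."""
    snapshot_date: date
    signal_state: str


def is_mature(row: ForwardRow, as_of: date, horizon_days: int = HORIZON_DAYS) -> bool:
    """A row is mature once its full forward horizon has elapsed."""
    return (as_of - row.snapshot_date).days >= horizon_days


def mature_rows(rows: list[ForwardRow], as_of: date) -> list[ForwardRow]:
    """Rows eligible for the denominator; immature rows are excluded."""
    return [r for r in rows if is_mature(r, as_of)]
```

Excluding immature rows keeps the denominator honest: a row whose outcome window is still open cannot yet be counted as either a hit or a miss.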

Known Limitations And IP Boundary

Backtest evidence is historical; live evidence is growing but still time-limited
Some macro series have publication lags and occasional revisions
Calibration concentration risk exists when tuning focuses on specific recent crises
Idiosyncratic exogenous shocks can compress lead time in any model class
No methodology can eliminate model risk; this framework makes model risk measurable

IP boundary: methodology categories, governance principles, and measured validation outcomes are disclosed; exact formulas, thresholds, weighting schedules, caps, and internal decision parameters remain proprietary.

Not advice: this system is a risk analytics tool, not financial, investment, trading, legal, tax, or compliance advice.
