Validation
Two layers:
Hull reference (unit tests)
Every engine is unit-tested against textbook reference values from Hull, Options, Futures, and Other Derivatives (11e). For an at-the-money call with S = K = 100, T = 1y, r = 5%, σ = 20%, q = 0:
| quantity | Hull | tolerance |
|---|---|---|
| Black-Scholes European call | 10.4506 | 1e-3 |
| Black-Scholes European put | 5.5735 | 1e-3 |
| Δ (call) | 0.6368 | 1e-4 |
| Γ | 0.01876 | 1e-5 |
| Θ (per day) | −0.01757 | 1e-4 |
| ν (per 1%) | 0.37524 | 1e-4 |
| ρ (per 1%) | 0.53232 | 1e-3 |
Plus put-call parity: C − P − (S·e^(−q·T) − K·e^(−r·T)) = 0 to 1e-10.
CRR American is tested against the same Hull inputs (degenerate to European when there’s no early-exercise premium) plus the known Hull American put benchmark of 6.0395 vs European 5.5735 to verify early-exercise pickup.
Regression against Tenor’s QuantLib output
Beyond Hull, the gem is regression-tested against ~500 historical option snapshots from Tenor’s production database, where Greeks were computed by QuantLib (CRR Binomial American 200-step / BlackCalculator European). Methodology, observed drift, and known limitations are documented in REGRESSION_REPORT.md.
The regression suite is opt-in (bundle exec rake regression) — it’s slow and has small documented drift in deep-ITM American boundary conditions, so it doesn’t gate CI. The Hull unit tests do.
Fixture provenance
The regression fixture is regenerated manually by the maintainer when the source data changes. The export tool (tools/golden_dataset_export.rb) documents the exact SQL query used and the expected JSON shape, so future runs are reproducible.