3 results for “topic:e-values”
Repository for the paper "Auditing Pay-Per-Token in Large Language Models"
A kernel-userland protocol enforcing information-theoretic bounds on AI adaptivity leakage, benchmark gaming, and capability spillover.
Benchmark for statistically valid AI scientist systems, using audit-closed protocols, transparency logs, and sequential inference to prevent false discoveries in autonomous research agents.