A regulator (the principal) subsidises a fraction ε of a developer's (the agent's) trial costs. The agent runs randomised trials sequentially, updating a Beta(α,β) belief, and stops once the principal's e-value test rejects H₀. Drag the subsidy and watch the agent's optimal policy — and the resulting social utility — respond.
For each belief (α,β), the agent's optimal trial size n. Clay = opt out (abandon); the sage frontier = enough evidence to approve.
The agent's expected profit ($M) from each belief under the optimal policy. Brighter = more valuable. Raising ε lifts the whole surface.
The principal's objective Ūˢ(ε) (solid) and the agent's approval probability (dashed). The vertical mark is your current ε; ★ is the optimum.
The agent's anticipated utility Ūᴬ(π_ε; ε), which is piecewise-linear and convex in ε (Prop. 8). Each kink marks a switch in the agent's optimal policy.
The agent holds a Beta(α,β) belief about efficacy θ*. Running a trial of size
n draws the number of successes X ~ Beta-Binomial(n, α, β) and updates the belief to (α+X, β+n−X).
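The belief update above can be sketched in a few lines of Python. This is a minimal illustration rather than the demo's own code: the function names `beta_binomial_pmf` and `update_belief` are hypothetical, and the prior Beta(2, 3) with n = 5 is chosen only as an example.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x, n, alpha, beta):
    """P(X = x) when X ~ Beta-Binomial(n, alpha, beta)."""
    return comb(n, x) * exp(log_beta(alpha + x, beta + n - x) - log_beta(alpha, beta))


def update_belief(alpha, beta, n, x):
    """Posterior Beta parameters after observing x successes in n patients."""
    return alpha + x, beta + n - x


# Predictive distribution of successes for a trial of size 5 under a Beta(2, 3) belief.
pmf = [beta_binomial_pmf(x, 5, 2.0, 3.0) for x in range(6)]
print(pmf)
print(update_belief(2.0, 3.0, 5, 4))
```

The predictive probabilities sum to one, and under a flat Beta(1, 1) belief every outcome 0..n is equally likely, which makes for a quick sanity check.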
The principal multiplies e-values into a test process; approval (rejecting H₀) happens once
f(α,β) ≥ 1/κ, a linear frontier in belief space — the sage edge in panels ① & ②.
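One construction that would produce a linear frontier is a likelihood-ratio e-value against a fixed alternative. The sketch below assumes this form; the text does not specify how the principal's e-values are built, so `theta0`, `theta1`, and both function names are illustrative assumptions. Under that assumption the log of the running product is linear in the cumulative successes and failures, so the rejection rule is a straight line in belief space.

```python
from math import log


def log_e_value(successes, failures, theta0, theta1):
    """Log likelihood-ratio e-value for Bernoulli data (assumed form, not from the source).

    H0: efficacy = theta0, tested against the fixed alternative theta1.
    """
    return successes * log(theta1 / theta0) + failures * log((1 - theta1) / (1 - theta0))


def approves(batches, theta0, theta1, kappa):
    """Multiply e-values across trials; approve once the product reaches 1/kappa.

    `batches` is a sequence of (n, x) pairs: trial size and observed successes.
    """
    log_e = 0.0
    for n, x in batches:
        log_e += log_e_value(x, n - x, theta0, theta1)  # product becomes a sum of logs
        if log_e >= log(1 / kappa):
            return True
    return False


print(approves([(10, 9)], theta0=0.5, theta1=0.8, kappa=0.05))
print(approves([(10, 5)], theta0=0.5, theta1=0.8, kappa=0.05))
```

Working in log space avoids overflow when many e-values are multiplied, and makes the additivity across trials explicit.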
The agent solves a finite-horizon belief MDP by backward induction (Alg. 2/3), choosing each trial size to maximise expected profit; it opts out when continuing isn't worth the cost.
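A toy version of this dynamic program is sketched below. It is not Alg. 2/3: every constant (prior, reward, per-patient cost, allowed trial sizes, horizon) is hypothetical, the approval rule reuses an assumed likelihood-ratio frontier, and the agent is assumed to pay a fraction (1 − ε) of each trial's cost. The recursion is memoised top-down, which visits the same states as backward induction.

```python
from functools import lru_cache
from math import comb, exp, lgamma, log

# All constants below are hypothetical, chosen only to make the sketch runnable.
A0, B0 = 1, 1              # prior Beta(1, 1)
THETA0, THETA1 = 0.5, 0.7  # null and alternative for the assumed e-value
KAPPA = 0.05               # approve when the e-process reaches 1 / KAPPA
REWARD = 100.0             # agent's payoff on approval
COST = 1.0                 # trial cost per patient
SIZES = (5, 10, 20)        # trial sizes the agent may choose
HORIZON = 3                # maximum number of sequential trials


def bb_pmf(x, n, a, b):
    """Beta-Binomial predictive probability of x successes in n patients."""
    log_ratio = (lgamma(a + x) + lgamma(b + n - x) - lgamma(a + b + n)
                 - (lgamma(a) + lgamma(b) - lgamma(a + b)))
    return comb(n, x) * exp(log_ratio)


def approved(a, b):
    """Assumed linear frontier: log e-process in cumulative successes and failures."""
    s, f = a - A0, b - B0
    return s * log(THETA1 / THETA0) + f * log((1 - THETA1) / (1 - THETA0)) >= log(1 / KAPPA)


def solve(eps):
    """Return (expected profit, optimal first trial size) at subsidy eps; size 0 = opt out."""
    @lru_cache(maxsize=None)
    def value(a, b, t):
        if approved(a, b):
            return REWARD, 0           # terminal: approval reached
        if t == HORIZON:
            return 0.0, 0              # out of time, no approval
        best, best_n = 0.0, 0          # opting out is always worth 0
        for n in SIZES:
            cont = sum(bb_pmf(x, n, a, b) * value(a + x, b + n - x, t + 1)[0]
                       for x in range(n + 1))
            q = cont - (1 - eps) * COST * n   # agent pays the unsubsidised share
            if q > best:
                best, best_n = q, n
        return best, best_n

    return value(A0, B0, 0)


for eps in (0.0, 0.5, 1.0):
    print(eps, solve(eps))
```

Because opting out is always available at value 0, the agent's value is never negative, and it can only rise as ε lowers the agent's share of the cost.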
At ε = 0 the moonshot is too risky: the agent abandons the trial at the start, so society gets
nothing. As ε rises, the de-risked agent begins experimenting — the clay opt-out region in panel ① recedes.
But every subsidy dollar is a transfer the principal pays only on approval, so Ūˢ(ε) trades off
more approvals against higher cost. The interior peak ε★ (panel ③) is what Algorithm 1 finds
exactly via divide-and-conquer.
Because the agent's utility is piecewise-linear and convex in ε (panel ④, Prop. 8), its kinks partition the subsidy range into policy intervals; the social utility is linear on each interval and is therefore maximised at a partition endpoint.
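The endpoint argument can be illustrated with a short sketch. It assumes the policy intervals are already known and that the social utility on each is given as an intercept and slope; the numbers in `pieces` are invented for illustration, and this brute-force scan is not the divide-and-conquer search of Algorithm 1.

```python
def maximise_piecewise_linear(pieces):
    """Maximise a function that is linear on each interval by checking endpoints only.

    `pieces` holds (lo, hi, intercept, slope) tuples, one per policy interval.
    The function may jump between intervals, so both ends of every piece are evaluated.
    """
    best_eps, best_val = None, float("-inf")
    for lo, hi, intercept, slope in pieces:
        for eps in (lo, hi):
            val = intercept + slope * eps
            if val > best_val:
                best_eps, best_val = eps, val
    return best_eps, best_val


# Hypothetical pieces: the agent opts out on [0, 0.3], then experiments more
# aggressively on each later interval while the subsidy bill grows with eps.
pieces = [
    (0.0, 0.3, 0.0, 0.0),
    (0.3, 0.6, 5.0, -2.0),
    (0.6, 1.0, 9.0, -6.0),
]
print(maximise_piecewise_linear(pieces))
```

In this invented example the utility falls within every interval but jumps up at each policy switch, so the best subsidy sits at the left end of the last interval.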