Interactive demo · belief MDP · subsidised RCTs

Optimizing Social Utility in Sequential Experiments

A regulator (the principal) subsidises a fraction ε of a developer's (the agent's) trial costs, paid out upon approval. The agent runs randomised trials sequentially, updating a Beta(α,β) belief about the treatment's efficacy after each one. It stops once the principal's e-value test rejects H₀, or earlier if it decides to opt out. Drag the subsidy slider and watch the agent's optimal policy, and the resulting social utility, respond.

Ander Artola Velasco · Stratis Tsirtsis · Manuel Gomez Rodriguez · code repository

Subsidy level ε — fraction of the agent's total cost the principal covers upon approval


Readouts: first trial size n₀ · approval probability · opt-out probability · agent utility · social utility · optimal subsidy ε★

① Optimal policy in belief space

For each belief (α,β), the agent's optimal trial size n. Clay cells mark beliefs where the agent opts out (abandons development); the sage frontier marks where there is enough evidence to approve.

Legend: opt out (n = 0) · H₀ rejected · trial size n from 1 to n_max

② Optimal value function V^ε(α,β)

The agent's expected profit ($M) from each belief under the optimal policy. Brighter = more valuable. Raising ε lifts the whole surface.

Legend: H₀ rejected · value from $0 to the maximum

③ Social utility vs. subsidy

The principal's objective Ū^S(ε) (solid) and the agent's approval probability (dashed). The vertical mark is your current ε; ★ is the optimum.

Legend: social utility ($M), solid · approval probability, dashed

④ Agent utility vs. subsidy

The agent's anticipated utility Ū^A(π^ε;ε) — piecewise-linear & convex (Prop. 8). Each kink is a switch in the agent's optimal policy.

Legend: agent utility ($M) · dotted lines = policy-switch points (partition 𝒫)
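The shape of this curve can be seen in two lines under one reading of the subsidy on this page. The agent pays its accumulated trial cost C, earns ρᴬ on approval, and is reimbursed εC on approval. The symbols C, a_π and b_π are introduced here only for this sketch. Π denotes the finite set of deterministic policies of the finite-horizon belief MDP.

```latex
\begin{aligned}
\bar U^A(\pi;\varepsilon)
  &= \mathbb{E}_\pi\!\left[\rho^A\,\mathbf{1}\{\text{approve}\} - C + \varepsilon\,C\,\mathbf{1}\{\text{approve}\}\right]
   = a_\pi + \varepsilon\, b_\pi,
  \qquad b_\pi = \mathbb{E}_\pi\!\left[C\,\mathbf{1}\{\text{approve}\}\right] \ge 0,\\
\bar U^A(\pi^\varepsilon;\varepsilon)
  &= \max_{\pi\in\Pi}\,\bigl(a_\pi + \varepsilon\, b_\pi\bigr).
\end{aligned}
```

A maximum of finitely many affine functions of ε is convex and piecewise-linear, and each kink is a point where the maximising policy changes. Since every b_π ≥ 0, the curve is also non-decreasing in ε.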

Model parameters

θᵇ: baseline efficacy
κ: false-positive bound
ρᴬ: agent benefit ($M)
ρˢ: social benefit ($M)
α₀: prior successes + 1
β₀: prior failures + 1
c₀: fixed cost per trial ($M)
c₁: cost per patient ($M)
n_max: maximum patients per trial
T: maximum number of trials minus 1
ℓ: display step (the time step shown in the heat maps)
ε_max: maximum subsidy

Larger n_max or T means a slower precomputation: the cost grows roughly as n_max⁴·T.

What you're looking at

The agent holds a Beta(α,β) belief about efficacy θ*. Running a trial of size n draws the number of successes X ~ Beta-Binomial(n, α, β) and updates the belief to (α+X, β+n−X).
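Here is a minimal Python sketch of this update step, using only the standard library. The function names are hypothetical and are not taken from the authors' repository. The predictive probability is P(X = x) = C(n, x)·B(α + x, β + n − x)/B(α, β), computed in log space so that large α, β do not overflow.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, stable for large arguments."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x: int, n: int, alpha: float, beta: float) -> float:
    """P(X = x) when theta ~ Beta(alpha, beta) and X | theta ~ Binomial(n, theta)."""
    return comb(n, x) * exp(log_beta(alpha + x, beta + n - x) - log_beta(alpha, beta))


def update(alpha: float, beta: float, n: int, x: int) -> tuple[float, float]:
    """Conjugate posterior after x successes and n - x failures."""
    return alpha + x, beta + n - x


# Predictive distribution of a 10-patient trial under a uniform Beta(1, 1) prior:
# it is uniform on 0..10, so every entry is 1/11 up to rounding.
pmf = [beta_binomial_pmf(x, 10, 1, 1) for x in range(11)]
print(sum(pmf))
print(update(1, 1, 10, 7))  # (8, 4)
```

The conjugate update is why the whole state of the agent's problem fits in the two numbers (α, β) plotted in panels ① and ②.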

The principal multiplies the e-values of successive trials into a test process. Approval (rejecting H₀) happens once f(α,β) ≥ 1/κ, which carves out a region bounded by a linear frontier in belief space: the sage edge in panels ① and ②.
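The page does not give the formula for f. One e-value with exactly this linear-frontier property is the product of per-patient likelihood ratios of a fixed alternative θ₁ > θᵇ against θᵇ. It is assumed here purely for illustration, and θ₁ is a hypothetical parameter. Assume H₀ : θ* ≤ θᵇ (also an assumed form of the null). Then each factor has expectation at most 1, and by Ville's inequality the product ever reaches 1/κ with probability at most κ. Taking logs makes the approval condition affine in (α, β):

```python
from math import ceil, log


def log_e_value(alpha, beta, alpha0, beta0, theta_b, theta_1):
    """Log of the running product of per-patient likelihood ratios, theta_1 vs theta_b.

    Assumption: this is an illustrative e-value, not necessarily the demo's f.
    It depends on the data only through s = alpha - alpha0 successes and
    f = beta - beta0 failures, so it is affine in (alpha, beta).
    """
    s, f = alpha - alpha0, beta - beta0
    return s * log(theta_1 / theta_b) + f * log((1 - theta_1) / (1 - theta_b))


def rejects_h0(alpha, beta, alpha0, beta0, theta_b, theta_1, kappa):
    """Approve once the e-value product reaches 1/kappa."""
    return log_e_value(alpha, beta, alpha0, beta0, theta_b, theta_1) >= log(1 / kappa)


def min_successes(f, theta_b, theta_1, kappa):
    """Fewest successes that trigger approval after f failures (a point on the frontier)."""
    need = log(1 / kappa) - f * log((1 - theta_1) / (1 - theta_b))
    return ceil(need / log(theta_1 / theta_b))


# Illustrative values: theta_b = 0.5, hypothetical alternative theta_1 = 0.7, kappa = 0.05.
print([min_successes(f, 0.5, 0.7, 0.05) for f in range(4)])  # [9, 11, 12, 14]
```

Each extra failure raises the bar by a constant number of successes (about 1.52 here), which is what a straight frontier means.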

The agent solves a finite-horizon belief MDP by backward induction (Alg. 2/3), choosing each trial size to maximise expected profit; it opts out when continuing isn't worth the cost.
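Below is a compact Python sketch of this recursion on a toy instance. It is not the authors' Alg. 2/3, and everything specific in it is an assumption:

- the parameter values;
- the likelihood-ratio approval frontier with a hypothetical alternative `THETA_1`;
- the reading of the subsidy as ε times all trial costs paid so far, reimbursed on approval.

The recursion is memoised rather than written as explicit loops over t. It computes the same backward-induction values.

```python
from functools import lru_cache
from math import comb, exp, lgamma, log

# Hypothetical parameters, for illustration only (not the demo's defaults).
THETA_B, THETA_1, KAPPA = 0.5, 0.7, 0.05  # THETA_1: assumed alternative for the e-value
RHO_A, C0, C1 = 100.0, 1.0, 0.1           # agent benefit and costs, in $M
ALPHA0, BETA0 = 1, 1                      # uniform Beta(1, 1) prior
N_MAX, T = 10, 2                          # at most T + 1 trials of up to N_MAX patients


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bb_pmf(x, n, a, b):
    """Beta-Binomial predictive probability of x successes out of n."""
    return comb(n, x) * exp(log_beta(a + x, b + n - x) - log_beta(a, b))


def approved(a, b):
    """Assumed linear frontier: fixed-alternative log-likelihood ratio >= log(1/kappa)."""
    s, f = a - ALPHA0, b - BETA0
    log_e = s * log(THETA_1 / THETA_B) + f * log((1 - THETA_1) / (1 - THETA_B))
    return log_e >= log(1 / KAPPA)


def solve(eps):
    """Return (value, policy) functions of (a, b, t) for subsidy level eps."""

    @lru_cache(maxsize=None)
    def q(a, b, t, n):
        # Pay for a trial of size n now, then continue optimally from each outcome.
        cont = sum(bb_pmf(x, n, a, b) * value(a + x, b + n - x, t + 1)
                   for x in range(n + 1))
        return -(C0 + C1 * n) + cont

    @lru_cache(maxsize=None)
    def value(a, b, t):
        if approved(a, b):
            # Assumption: on approval the agent gets RHO_A plus eps times all costs
            # paid so far (t trials and a + b - ALPHA0 - BETA0 patients).
            paid = C0 * t + C1 * (a + b - ALPHA0 - BETA0)
            return RHO_A + eps * paid
        if t > T:
            return 0.0  # trial budget exhausted without approval
        # Opting out (n = 0) is worth 0; sunk costs are not recovered.
        return max([0.0] + [q(a, b, t, n) for n in range(1, N_MAX + 1)])

    def policy(a, b, t):
        """Optimal trial size; 0 means opt out (or the process has already stopped)."""
        if approved(a, b) or t > T:
            return 0
        best_n, best_q = 0, 0.0
        for n in range(1, N_MAX + 1):
            if q(a, b, t, n) > best_q:
                best_n, best_q = n, q(a, b, t, n)
        return best_n

    return value, policy


value, policy = solve(eps=0.2)
print(value(ALPHA0, BETA0, 0), policy(ALPHA0, BETA0, 0))
```

Raising `eps` only increases the approval payoff, so the value at every belief can only go up. This matches the lifted surface in panel ②.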

The subsidy story

At ε = 0 the moonshot is too risky: the agent abandons the trial at the start, so society gets nothing. As ε rises, the de-risked agent begins experimenting — the clay opt-out region in panel ① recedes.

But every subsidy dollar is a transfer the principal pays only on approval, so Ū^S(ε) trades off more approvals against higher cost. Algorithm 1 finds the interior peak ε★ (panel ③) exactly via divide-and-conquer.
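The page does not spell out Algorithm 1. What follows is therefore only a sketch of the general divide-and-conquer idea it names, not the authors' algorithm. The sketch relies on the convex, piecewise-linear shape of Ū^A in panel ④. It assumes a hypothetical oracle `best_line(eps)` that returns the intercept and slope of the agent's optimal policy at ε, at the cost of one MDP solve per call. The recursion intersects the lines that are optimal at the two ends of an interval and recovers every policy-switch point.

```python
def envelope_kinks(best_line, lo, hi, tol=1e-9):
    """Kinks of the convex function eps -> max_i (a_i + b_i * eps) on [lo, hi].

    best_line(eps) must return (a, b) for a line attaining the maximum at eps.
    Hypothetical oracle: each call stands in for one solve of the agent's MDP.
    """
    kinks = []

    def recurse(line_l, line_r):
        (al, bl), (ar, br) = line_l, line_r
        if abs(al - ar) <= tol and abs(bl - br) <= tol:
            return  # same line optimal at both ends, hence on the whole interval
        x = (ar - al) / (bl - br)  # where the two endpoint lines cross
        ax, bx = best_line(x)
        if ax + bx * x <= al + bl * x + tol:
            kinks.append(x)  # nothing beats them at the crossing: a true kink
        else:
            recurse(line_l, (ax, bx))  # a third line is optimal near x
            recurse((ax, bx), line_r)

    recurse(best_line(lo), best_line(hi))
    return sorted(kinks)


# Toy (intercept, slope) pairs standing in for policies; (-5, 1) is never optimal.
LINES = [(0.0, 0.0), (-1.0, 2.0), (-4.0, 4.0), (-5.0, 1.0)]


def best(eps):
    return max(LINES, key=lambda ab: ab[0] + ab[1] * eps)


print(envelope_kinks(best, 0.0, 3.0))  # [0.5, 1.5]
```

Between consecutive kinks the agent's policy is fixed. If the principal's objective is affine in ε while the policy is fixed (an assumption of this sketch), ε★ can be found by evaluating Ū^S at both ends of each piece.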