External vs swap regret: a 3-action demo

With three actions, swap regret and external regret are genuinely different concepts. Click anywhere inside the triangle to set the round's payoff vector u = (u₁, u₂, u₃) — your click position determines the payoff via barycentric coordinates (corners are 1 for that action, 0 for the others). Watch how EW's external regret and SDA's swap regret diverge on adversarial sequences.
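The click-to-payoff mapping can be sketched as follows. This is a minimal illustration, not the demo's actual code: the function name `barycentric` and the vertex coordinates in the usage lines are assumptions, and the triangle is assumed to be non-degenerate.

```python
def barycentric(p, a, b, c):
    """Return (u1, u2, u3) with u1*a + u2*b + u3*c == p and u1 + u2 + u3 == 1.

    p is the clicked point; a, b, c are the triangle's corners for actions
    1, 2, 3. All points are (x, y) pairs. Hypothetical helper for illustration.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    # Twice the signed area of the triangle; zero only if it is degenerate.
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    u1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    u2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return (u1, u2, 1.0 - u1 - u2)


# Assumed corner positions, chosen only to keep the arithmetic simple.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
corner_payoff = barycentric(A, A, B, C)            # clicking corner 1
center_payoff = barycentric((1 / 3, 1 / 3), A, B, C)  # clicking the centroid
```

Clicking a corner gives payoff 1 to that action and 0 to the others, and clicking the centroid gives every action the same payoff of 1/3.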

[Interactive demo: a clickable payoff triangle for setting u = (u₁, u₂, u₃); an EW panel and an SDA panel, each showing the algorithm's current distribution (α₁, α₂, α₃), both starting at 0.33 each, along with its running regret; quick-adversary presets; a round counter; a learning-rate control for ε, shown at 0.30; and a chart with three curves: EW external regret, EW swap regret, and SDA swap regret.]

What's going on

Both algorithms face the same payoff stream. Each round they commit to a distribution over 3 actions, then observe the full payoff vector u and update.

EW tracks cumulative payoffs per action and plays each action with probability proportional to (1+ε) raised to its cumulative payoff. It guarantees vanishing external regret: no single fixed action does much better in hindsight.
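The EW rule described above can be sketched in a few lines. The function names `ew_distribution` and `ew_update` are illustrative assumptions, not the demo's own identifiers. Subtracting the largest cumulative payoff before exponentiating is a numerical-stability choice added here and does not change the resulting probabilities.

```python
def ew_distribution(cum_payoffs, eps):
    """Play each action with probability proportional to (1 + eps) ** cumulative payoff."""
    m = max(cum_payoffs)  # shift exponents so the largest weight is exactly 1
    weights = [(1.0 + eps) ** (c - m) for c in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def ew_update(cum_payoffs, u):
    """Add this round's full payoff vector u to the per-action running totals."""
    return [c + x for c, x in zip(cum_payoffs, u)]


cum = [0.0, 0.0, 0.0]
start = ew_distribution(cum, eps=0.3)   # uniform before any payoffs are seen
cum = ew_update(cum, (1.0, 0.0, 0.0))   # action 1 paid 1, the others paid 0
after = ew_distribution(cum, eps=0.3)   # mass shifts toward action 1
```

With ε = 0.30, one round of payoff (1, 0, 0) moves action 1 from 1/3 to 1.3/3.3, roughly 0.39.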

SDA runs three internal EW experts, one per action. Each round, every expert outputs a recommended distribution; SDA stacks these as the rows of a row-stochastic matrix and plays that matrix's stationary distribution α. Expert a then sees the scaled payoff vector αₐ·u. SDA guarantees vanishing swap regret: no rerouting a → f(a) of your own actions, applied consistently in hindsight, does much better.
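One way to sketch this construction for three actions is shown below. The class name `SDA`, its method names, and the helper `stationary_3` are assumptions made for illustration. The stationary distribution is computed with a closed form for 3-state chains (the Markov chain tree theorem) rather than whatever the demo itself uses, and it relies on every row being strictly positive, which EW rows always are.

```python
def ew_distribution(cum_payoffs, eps):
    """EW rule: probability proportional to (1 + eps) ** cumulative payoff."""
    m = max(cum_payoffs)
    weights = [(1.0 + eps) ** (c - m) for c in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def stationary_3(Q):
    """Stationary distribution alpha (alpha = alpha Q) of a 3x3 row-stochastic Q.

    Uses the tree-theorem closed form; assumes strictly positive entries.
    """
    w0 = Q[1][0] * Q[2][0] + Q[1][0] * Q[2][1] + Q[1][2] * Q[2][0]
    w1 = Q[0][1] * Q[2][1] + Q[0][1] * Q[2][0] + Q[0][2] * Q[2][1]
    w2 = Q[0][2] * Q[1][2] + Q[0][2] * Q[1][0] + Q[0][1] * Q[1][2]
    total = w0 + w1 + w2
    return [w0 / total, w1 / total, w2 / total]


class SDA:
    """Hypothetical sketch: one EW expert per action, combined via a stationary distribution."""

    def __init__(self, eps=0.3):
        self.eps = eps
        # cum[a] holds expert a's cumulative (scaled) payoffs for the 3 actions.
        self.cum = [[0.0, 0.0, 0.0] for _ in range(3)]

    def distribution(self):
        # Row a of Q is expert a's recommended distribution.
        Q = [ew_distribution(row, self.eps) for row in self.cum]
        return stationary_3(Q)

    def update(self, u):
        # Expert a is credited with the payoff vector scaled by alpha_a.
        alpha = self.distribution()
        for a in range(3):
            for j in range(3):
                self.cum[a][j] += alpha[a] * u[j]


sda = SDA(eps=0.3)
first_play = sda.distribution()   # uniform before any payoffs are seen
sda.update((1.0, 0.0, 0.0))
second_play = sda.distribution()  # mass shifts toward action 1
```

The closed form avoids iterating to convergence, which can be slow when the experts' rows are close to the identity matrix.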

The chart shows three curves. EW external regret is what EW is designed to minimize. EW swap regret is the same play sequence judged by the stronger benchmark, which EW does not control. SDA swap regret is SDA's behavior under that stronger benchmark. The gap between the two purple lines is what swap regret penalizes that external regret ignores.
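The two benchmarks can be computed from a recorded play sequence as sketched below. The function names are illustrative, and the definitions are the standard expected-payoff ones, which are assumed here to match what the chart plots. External regret compares against the best single action; swap regret lets each action a be rerouted to its own best replacement f(a).

```python
def external_regret(plays, payoffs):
    """Best fixed action's total payoff minus the expected payoff actually earned."""
    n = len(payoffs[0])
    earned = sum(
        sum(p * x for p, x in zip(alpha, u)) for alpha, u in zip(plays, payoffs)
    )
    best_fixed = max(sum(u[j] for u in payoffs) for j in range(n))
    return best_fixed - earned


def swap_regret(plays, payoffs):
    """Total gain from the best rerouting a -> f(a), chosen separately for each a."""
    n = len(payoffs[0])
    total = 0.0
    for a in range(n):
        # Gain from sending all probability mass placed on a to j instead.
        gains = [
            sum(alpha[a] * (u[j] - u[a]) for alpha, u in zip(plays, payoffs))
            for j in range(n)
        ]
        total += max(gains)  # j == a gives 0, so each term is nonnegative
    return total


# Two rounds in which the learner always picks the action that pays nothing.
plays = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
payoffs = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
print(external_regret(plays, payoffs))  # 1.0
print(swap_regret(plays, payoffs))      # 2.0
```

In this example the best fixed action recovers only one unit, while rerouting action 1 to action 2 and action 2 to action 1 recovers both. Swap regret is never smaller than external regret, because a constant rerouting f(a) = j is one of the options it considers.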
