Run  For  Against  Judge  Scores  Result
1  Dr. Stefan (deepseek-r1:14b)  🏆 Henrik (llama3.1:8b)  Lydia (phi4:latest)  Dr. Stefan: 8 · Henrik: 9  Rejected
2  🏆 Gwen (gemma3:12b)  Henrik (deepseek-r1:14b)  Lydia (mistral-nemo:12b)  Henrik: 6 · Gwen: 8  Upheld
3  Gwen (qwen3:14b)  🏆 Henrik (gemma3:12b)  Lydia (mistral-nemo:12b)  Gwen: 4 · Henrik: 8  Rejected
4  🏆 Frank (qwen3:14b)  Henrik (phi4:latest)  Terry (llama3.1:8b)  Frank: 8 · Henrik: 6  Upheld
5  🏆 Dr. Stefan (deepseek-r1:14b)  Henrik (qwen3:14b)  Terry (llama3.1:8b)  Henrik: 6 · Dr. Stefan: 8  Upheld
6  🏆 Frank (qwen2.5:14b)  Dr. Amara (llama3.1:8b)  Priya (mistral-nemo:12b)  Dr. Amara: 6 · Frank: 7  Upheld
7  🏆 Gwen (gemma2:9b)  Kirsty (gemma2:9b)  Lydia (mistral-nemo:12b)  Gwen: 8 · Kirsty: 6  Upheld
8  Gwen (gemma2:9b)  🏆 Henrik (qwen3:14b)  Priya (qwen3:14b)  Gwen: 7 · Henrik: 8  Rejected
9  🏆 Gwen (gemma3:12b)  Dr. Amara (gemma3:12b)  Terry (gemma2:9b)  Dr. Amara: 7 · Gwen: 8  Upheld
10  🏆 Gwen (phi4:latest)  Kirsty (qwen3:14b)  Lydia (qwen3:14b)  Kirsty: 7 · Gwen: 8  Upheld
11  🏆 Gwen (llama3.1:8b)  Henrik (phi4:latest)  Lydia (mistral-nemo:12b)  Henrik: 6 · Gwen: 8  Upheld
12  Frank (gemma3:12b)  🏆 Kirsty (llama3.1:8b)  Priya (llama3.1:8b)  Frank: 7 · Kirsty: 8  Rejected
13  Frank (llama3.1:8b)  🏆 Kirsty (phi4:latest)  Terry (llama3.1:8b)  Frank: 7 · Kirsty: 9  Rejected
14  Gwen (mistral-nemo:12b)  🏆 Kirsty (gemma2:9b)  Priya (gemma2:9b)  Gwen: 6 · Kirsty: 9  Rejected
15  🏆 Dr. Stefan (qwen3:14b)  Henrik (mistral-nemo:12b)  Priya (mistral-nemo:12b)  Henrik: 6 · Dr. Stefan: 7  Upheld
16  Dr. Stefan (gemma2:9b)  🏆 Kirsty (phi4:latest)  Terry (llama3.1:8b)  Dr. Stefan: 7 · Kirsty: 9  Rejected
17  🏆 Frank (gemma3:12b)  Henrik (phi4:latest)  Lydia (mistral-nemo:12b)  Henrik: 8 · Frank: 8.5  Upheld
18  🏆 Frank (phi4:latest)  Dr. Amara (deepseek-r1:14b)  Priya (qwen3:14b)  Dr. Amara: 8 · Frank: 9  Upheld
19  Gwen (deepseek-r1:14b)  🏆 Henrik (qwen2.5:14b)  Lydia (deepseek-r1:14b)  Gwen: 8 · Henrik: 9  Rejected
20  🏆 Frank (qwen2.5:14b)  Dr. Amara (gemma2:9b)  Terry (gemma2:9b)  Dr. Amara: 7 · Frank: 8  Upheld
21  Dr. Stefan (gemma2:9b)  🏆 Dr. Amara (mistral-nemo:12b)  Terry (qwen3:14b)  Dr. Amara: 8 · Dr. Stefan: 6  Rejected
22  🏆 Gwen (llama3.1:8b)  Kirsty (llama3.1:8b)  Terry (qwen3:14b)  Kirsty: 7 · Gwen: 8  Upheld
23  🏆 Dr. Stefan (qwen3:14b)  Henrik (qwen3:14b)  Lydia (gemma3:12b)  Dr. Stefan: 9 · Henrik: 5  Upheld
24  Frank (llama3.1:8b)  🏆 Henrik (llama3.1:8b)  Lydia (qwen3:14b)  Henrik: 8 · Frank: 6  Rejected
25  🏆 Frank (gemma3:12b)  Henrik (deepseek-r1:14b)  Lydia (qwen3:14b)  Henrik: 7 · Frank: 8  Upheld
26  Dr. Stefan (phi4:latest)  🏆 Henrik (qwen3:14b)  Terry (qwen3:14b)  Dr. Stefan: 6 · Henrik: 8  Rejected
27  🏆 Gwen (gemma3:12b)  Henrik (deepseek-r1:14b)  Priya (gemma2:9b)  Henrik: 4 · Gwen: 9  Upheld
28  Frank (gemma2:9b)  🏆 Kirsty (llama3.1:8b)  Lydia (mistral-nemo:12b)  Frank: 6 · Kirsty: 8  Rejected
29  🏆 Frank (mistral-nemo:12b)  Dr. Amara (mistral-nemo:12b)  Terry (gemma3:12b)  Frank: 9 · Dr. Amara: 4  Upheld
31  🏆 Gwen (gemma3:12b)  Kirsty (mistral-nemo:12b)  Lydia (gemma2:9b)  Kirsty: 5 · Gwen: 8  Upheld
32  🏆 Gwen (gemma3:12b)  Kirsty (gemma2:9b)  Terry (deepseek-r1:14b)  Kirsty: 7 · Gwen: 9  Upheld
33  Frank (qwen2.5:14b)  🏆 Henrik (qwen3:14b)  Priya (gemma3:12b)  Frank: 4 · Henrik: 9  Rejected
32 Runs
19 Premise Upheld
13 Premise Rejected
59% Uphold Rate
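The headline figures can be recomputed from the Result column alone. Below is a minimal Python sketch. It assumes the uphold rate is simply upheld runs divided by all runs. The U/R labels are transcribed from the runs table in the order listed (U = Upheld, R = Rejected).

```python
from collections import Counter

# Result column transcribed from the runs table, in the order listed.
results = (
    "R U R U U U U R U U U R R R U R "
    "U U R U R U U R U R U R U U U R"
).split()

counts = Counter(results)
upheld, rejected = counts["U"], counts["R"]
total = len(results)

# Uphold rate: share of runs in which the judge sided with the For debater.
uphold_rate = 100 * upheld / total

print(f"{total} Runs")
print(f"{upheld} Premise Upheld")
print(f"{rejected} Premise Rejected")
print(f"{uphold_rate:.0f}% Uphold Rate")
```

The premise is upheld exactly when the For debater wins. The same counts therefore also give the Side Effect table (For 19, Against 13).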

Debater Performance

Name Side n Wins Win% Avg score
Gwen for 13 9 69% 7.6
Frank for 12 7 58% 7.3
Kirsty against 10 5 50% 7.5
Henrik against 16 7 44% 7.1
Dr. Stefan for 7 3 43% 7.3
Dr. Amara against 6 1 17% 6.7
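Each Debater Performance row aggregates that debater's appearances, under these definitions:

- n is the number of runs.
- Win% is wins divided by n.
- Avg score is the mean of the scores the judge awarded.

Below is a minimal Python sketch of that aggregation, assuming those definitions. To keep it short, only Dr. Stefan's and Dr. Amara's appearances are transcribed from the runs table. The helper name `debater_stats` is hypothetical.

```python
from collections import defaultdict

# (debater, score, won) per appearance, transcribed from the runs table.
# Only Dr. Stefan's and Dr. Amara's runs are included to keep the sketch short.
appearances = [
    ("Dr. Stefan", 8, False),  # run 1
    ("Dr. Stefan", 8, True),   # run 5
    ("Dr. Amara", 6, False),   # run 6
    ("Dr. Amara", 7, False),   # run 9
    ("Dr. Stefan", 7, True),   # run 15
    ("Dr. Stefan", 7, False),  # run 16
    ("Dr. Amara", 8, False),   # run 18
    ("Dr. Amara", 7, False),   # run 20
    ("Dr. Stefan", 6, False),  # run 21
    ("Dr. Amara", 8, True),    # run 21
    ("Dr. Stefan", 9, True),   # run 23
    ("Dr. Stefan", 6, False),  # run 26
    ("Dr. Amara", 4, False),   # run 29
]


def debater_stats(rows):
    """Group appearances by debater; return (n, wins, win %, mean score)."""
    groups = defaultdict(list)
    for name, score, won in rows:
        groups[name].append((score, won))
    stats = {}
    for name, entries in groups.items():
        n = len(entries)
        wins = sum(won for _, won in entries)          # True counts as 1
        avg = sum(score for score, _ in entries) / n
        stats[name] = (n, wins, round(100 * wins / n), round(avg, 1))
    return stats


for name, (n, wins, pct, avg) in debater_stats(appearances).items():
    print(f"{name} {n} {wins} {pct}% {avg}")
```

This should print 7 3 43% 7.3 for Dr. Stefan and 6 1 17% 6.7 for Dr. Amara, which matches the table. The Model Performance (Debater) table is presumably built the same way, grouping by model name instead of debater name.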

Judge Profile

Name n Upheld Rejected Uphold% Bias
Lydia 13 8 5 62% +2%
Terry 11 7 4 64% +4%
Priya 8 4 4 50% -9%
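The Bias column is consistent with a simple definition. It is the judge's uphold rate minus the overall uphold rate (19 of 32, about 59.4%), in percentage points, rounded to a whole number. This definition is inferred from the figures rather than stated on the page. Below is a minimal sketch, with `judge_bias` as a hypothetical helper name.

```python
def judge_bias(upheld, n, overall_upheld, overall_n):
    """Judge's uphold rate minus the overall uphold rate, in percentage points.

    Assumption: this is how the page's Bias column is defined.
    """
    judge_rate = 100 * upheld / n
    overall_rate = 100 * overall_upheld / overall_n
    return round(judge_rate - overall_rate)


# (upheld, n) per judge, copied from the Judge Profile table.
judges = {"Lydia": (8, 13), "Terry": (7, 11), "Priya": (4, 8)}

for name, (upheld, n) in judges.items():
    bias = judge_bias(upheld, n, overall_upheld=19, overall_n=32)
    print(f"{name} {round(100 * upheld / n)}% {bias:+d}%")
```

The same formula also reproduces the Bias column of the Model Performance (Judge) table. For example, phi4:latest upheld 0 of 1 runs, which gives 0 - 59.4, or about -59.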

Speaking Order (n=32)

Position Wins Win%
First speaker 6 19%
Second speaker 26 81%
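The speaking-order split can be reproduced under two assumptions not stated on the page:

- The Scores column lists the debaters in speaking order.
- The higher score always wins, which holds for every run listed.

Below is a minimal Python sketch with the score pairs transcribed in table order.

```python
# (first listed, second listed) score per run, copied from the Scores column.
# Assumptions: the Scores column is in speaking order, and the higher-scoring
# debater is the winner (true for every run in the table).
scores = [
    (8, 9), (6, 8), (4, 8), (8, 6), (6, 8), (6, 7), (8, 6), (7, 8),
    (7, 8), (7, 8), (6, 8), (7, 8), (7, 9), (6, 9), (6, 7), (7, 9),
    (8, 8.5), (8, 9), (8, 9), (7, 8), (8, 6), (7, 8), (9, 5), (8, 6),
    (7, 8), (6, 8), (4, 9), (6, 8), (9, 4), (5, 8), (7, 9), (4, 9),
]

n = len(scores)
first_wins = sum(first > second for first, second in scores)
second_wins = sum(second > first for first, second in scores)

print(f"First speaker {first_wins} {round(100 * first_wins / n)}%")
print(f"Second speaker {second_wins} {round(100 * second_wins / n)}%")
```

This counts 6 first-speaker wins (19%) and 26 second-speaker wins (81%), matching the table. That lends some support to the speaking-order assumption.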

Side Effect (n=32)

Side Wins Win%
For 19 59%
Against 13 41%

Model Performance (Debater)

Model n Wins Win% Avg score
gemma3:12b 10 8 80% 8.1
qwen2.5:14b 4 3 75% 7.0
llama3.1:8b 10 6 60% 7.5
qwen3:14b 10 6 60% 7.1
phi4:latest 8 4 50% 7.6
mistral-nemo:12b 6 2 33% 6.3
gemma2:9b 9 2 22% 7.0
deepseek-r1:14b 7 1 14% 7.0

Model Performance (Judge)

Model n Upheld Rejected Uphold% Bias
mistral-nemo:12b 8 6 2 75% +16%
qwen3:14b 8 4 4 50% -9%
llama3.1:8b 5 2 3 40% -19%
gemma2:9b 5 4 1 80% +21%
gemma3:12b 3 2 1 67% +7%
deepseek-r1:14b 2 1 1 50% -9%
phi4:latest 1 0 1 0% -59%

This page summarises the results of a series of simulated debates. In each run, two randomly chosen AI agents debate a topic, and a third agent then acts as judge and selects the winner. All arguments and facts presented (with the exception of this paragraph) are AI-generated, potentially untrue, and do not represent the views of any real person.