| # | For | Against | Judge | Scores | Result |
|---|---|---|---|---|---|
| 1 | Dr. Stefan (deepseek-r1:14b) | 🏆 Henrik (llama3.1:8b) | Lydia (phi4:latest) | Dr. Stefan: 8 · Henrik: 9 | Rejected |
| 2 | 🏆 Gwen (gemma3:12b) | Henrik (deepseek-r1:14b) | Lydia (mistral-nemo:12b) | Henrik: 6 · Gwen: 8 | Upheld |
| 3 | Gwen (qwen3:14b) | 🏆 Henrik (gemma3:12b) | Lydia (mistral-nemo:12b) | Gwen: 4 · Henrik: 8 | Rejected |
| 4 | 🏆 Frank (qwen3:14b) | Henrik (phi4:latest) | Terry (llama3.1:8b) | Frank: 8 · Henrik: 6 | Upheld |
| 5 | 🏆 Dr. Stefan (deepseek-r1:14b) | Henrik (qwen3:14b) | Terry (llama3.1:8b) | Henrik: 6 · Dr. Stefan: 8 | Upheld |
| 6 | 🏆 Frank (qwen2.5:14b) | Dr. Amara (llama3.1:8b) | Priya (mistral-nemo:12b) | Dr. Amara: 6 · Frank: 7 | Upheld |
| 7 | 🏆 Gwen (gemma2:9b) | Kirsty (gemma2:9b) | Lydia (mistral-nemo:12b) | Gwen: 8 · Kirsty: 6 | Upheld |
| 8 | Gwen (gemma2:9b) | 🏆 Henrik (qwen3:14b) | Priya (qwen3:14b) | Gwen: 7 · Henrik: 8 | Rejected |
| 9 | 🏆 Gwen (gemma3:12b) | Dr. Amara (gemma3:12b) | Terry (gemma2:9b) | Dr. Amara: 7 · Gwen: 8 | Upheld |
| 10 | 🏆 Gwen (phi4:latest) | Kirsty (qwen3:14b) | Lydia (qwen3:14b) | Kirsty: 7 · Gwen: 8 | Upheld |
| 11 | 🏆 Gwen (llama3.1:8b) | Henrik (phi4:latest) | Lydia (mistral-nemo:12b) | Henrik: 6 · Gwen: 8 | Upheld |
| 12 | Frank (gemma3:12b) | 🏆 Kirsty (llama3.1:8b) | Priya (llama3.1:8b) | Frank: 7 · Kirsty: 8 | Rejected |
| 13 | Frank (llama3.1:8b) | 🏆 Kirsty (phi4:latest) | Terry (llama3.1:8b) | Frank: 7 · Kirsty: 9 | Rejected |
| 14 | Gwen (mistral-nemo:12b) | 🏆 Kirsty (gemma2:9b) | Priya (gemma2:9b) | Gwen: 6 · Kirsty: 9 | Rejected |
| 15 | 🏆 Dr. Stefan (qwen3:14b) | Henrik (mistral-nemo:12b) | Priya (mistral-nemo:12b) | Henrik: 6 · Dr. Stefan: 7 | Upheld |
| 16 | Dr. Stefan (gemma2:9b) | 🏆 Kirsty (phi4:latest) | Terry (llama3.1:8b) | Dr. Stefan: 7 · Kirsty: 9 | Rejected |
| 17 | 🏆 Frank (gemma3:12b) | Henrik (phi4:latest) | Lydia (mistral-nemo:12b) | Henrik: 8 · Frank: 8.5 | Upheld |
| 18 | 🏆 Frank (phi4:latest) | Dr. Amara (deepseek-r1:14b) | Priya (qwen3:14b) | Dr. Amara: 8 · Frank: 9 | Upheld |
| 19 | Gwen (deepseek-r1:14b) | 🏆 Henrik (qwen2.5:14b) | Lydia (deepseek-r1:14b) | Gwen: 8 · Henrik: 9 | Rejected |
| 20 | 🏆 Frank (qwen2.5:14b) | Dr. Amara (gemma2:9b) | Terry (gemma2:9b) | Dr. Amara: 7 · Frank: 8 | Upheld |
| 21 | Dr. Stefan (gemma2:9b) | 🏆 Dr. Amara (mistral-nemo:12b) | Terry (qwen3:14b) | Dr. Amara: 8 · Dr. Stefan: 6 | Rejected |
| 22 | 🏆 Gwen (llama3.1:8b) | Kirsty (llama3.1:8b) | Terry (qwen3:14b) | Kirsty: 7 · Gwen: 8 | Upheld |
| 23 | 🏆 Dr. Stefan (qwen3:14b) | Henrik (qwen3:14b) | Lydia (gemma3:12b) | Dr. Stefan: 9 · Henrik: 5 | Upheld |
| 24 | Frank (llama3.1:8b) | 🏆 Henrik (llama3.1:8b) | Lydia (qwen3:14b) | Henrik: 8 · Frank: 6 | Rejected |
| 25 | 🏆 Frank (gemma3:12b) | Henrik (deepseek-r1:14b) | Lydia (qwen3:14b) | Henrik: 7 · Frank: 8 | Upheld |
| 26 | Dr. Stefan (phi4:latest) | 🏆 Henrik (qwen3:14b) | Terry (qwen3:14b) | Dr. Stefan: 6 · Henrik: 8 | Rejected |
| 27 | 🏆 Gwen (gemma3:12b) | Henrik (deepseek-r1:14b) | Priya (gemma2:9b) | Henrik: 4 · Gwen: 9 | Upheld |
| 28 | Frank (gemma2:9b) | 🏆 Kirsty (llama3.1:8b) | Lydia (mistral-nemo:12b) | Frank: 6 · Kirsty: 8 | Rejected |
| 29 | 🏆 Frank (mistral-nemo:12b) | Dr. Amara (mistral-nemo:12b) | Terry (gemma3:12b) | Frank: 9 · Dr. Amara: 4 | Upheld |
| 31 | 🏆 Gwen (gemma3:12b) | Kirsty (mistral-nemo:12b) | Lydia (gemma2:9b) | Kirsty: 5 · Gwen: 8 | Upheld |
| 32 | 🏆 Gwen (gemma3:12b) | Kirsty (gemma2:9b) | Terry (deepseek-r1:14b) | Kirsty: 7 · Gwen: 9 | Upheld |
| 33 | Frank (qwen2.5:14b) | 🏆 Henrik (qwen3:14b) | Priya (gemma3:12b) | Frank: 4 · Henrik: 9 | Rejected |


| Runs | Premise Upheld | Premise Rejected | Uphold Rate |
|---|---|---|---|
| 32 | 19 | 13 | 59% |

Debater Performance

| Name | Side | n | Wins | Win% | Avg score |
|---|---|---|---|---|---|
| Gwen | for | 13 | 9 | 69% | 7.6 |
| Frank | for | 12 | 7 | 58% | 7.3 |
| Kirsty | against | 10 | 5 | 50% | 7.5 |
| Henrik | against | 16 | 7 | 44% | 7.1 |
| Dr. Stefan | for | 7 | 3 | 43% | 7.3 |
| Dr. Amara | against | 6 | 1 | 17% | 6.7 |

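
As a rough check on how the n, Wins, Win% and Avg score columns relate to the runs table, here is a minimal Python sketch. It assumes a debater's average is the plain mean of the scores they received across their rounds; the `Run` record and `debater_stats` helper are hypothetical names for illustration, not part of the tool that produced this page. Fed Dr. Amara's six rounds, it reproduces her row (n = 6, 1 win, 17%, 6.7).

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One debate round as read off the runs table (hypothetical record type)."""
    for_debater: str
    against_debater: str
    winner: str
    scores: dict[str, float]  # debater name -> score given by the judge


def debater_stats(runs: list[Run], name: str) -> tuple[int, int, float, float]:
    """Return (n, wins, win percentage, mean score) for one debater."""
    mine = [r for r in runs if name in (r.for_debater, r.against_debater)]
    n = len(mine)
    wins = sum(1 for r in mine if r.winner == name)
    mean_score = sum(r.scores[name] for r in mine) / n
    return n, wins, 100 * wins / n, mean_score


# Dr. Amara's rounds: runs 6, 9, 18, 20, 21 and 29 of the runs table.
amara_runs = [
    Run("Frank", "Dr. Amara", "Frank", {"Dr. Amara": 6, "Frank": 7}),
    Run("Gwen", "Dr. Amara", "Gwen", {"Dr. Amara": 7, "Gwen": 8}),
    Run("Frank", "Dr. Amara", "Frank", {"Dr. Amara": 8, "Frank": 9}),
    Run("Frank", "Dr. Amara", "Frank", {"Dr. Amara": 7, "Frank": 8}),
    Run("Dr. Stefan", "Dr. Amara", "Dr. Amara", {"Dr. Amara": 8, "Dr. Stefan": 6}),
    Run("Frank", "Dr. Amara", "Frank", {"Frank": 9, "Dr. Amara": 4}),
]

n, wins, win_pct, mean_score = debater_stats(amara_runs, "Dr. Amara")
print(f"n={n} wins={wins} win%={win_pct:.0f}% avg={mean_score:.1f}")
# prints: n=6 wins=1 win%=17% avg=6.7
```

The same function applied to the full list of 32 runs would give the other rows; only Dr. Amara's subset is transcribed here to keep the sketch short.
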
Judge Profile

| Name | n | Upheld | Rejected | Uphold% | Bias (vs. 59% overall) |
|---|---|---|---|---|---|
| Lydia | 13 | 8 | 5 | 62% | +2% |
| Terry | 11 | 7 | 4 | 64% | +4% |
| Priya | 8 | 4 | 4 | 50% | -9% |

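
The page does not define Bias, but every value in the judge tables matches the judge's uphold rate minus the overall uphold rate (19 of 32 runs, about 59.4%), in percentage points and rounded to a whole number. A minimal sketch under that assumption, with `uphold_bias` as a hypothetical helper name:

```python
def uphold_bias(upheld: int, n: int, total_upheld: int, total_runs: int) -> float:
    """Judge's uphold rate minus the overall uphold rate, in percentage points."""
    return 100 * upheld / n - 100 * total_upheld / total_runs


TOTAL_UPHELD, TOTAL_RUNS = 19, 32  # overall: 19 of 32 premises upheld

# (upheld, n) per judge, taken from the Judge Profile table.
judges = {"Lydia": (8, 13), "Terry": (7, 11), "Priya": (4, 8)}
for name, (upheld, n) in judges.items():
    bias = uphold_bias(upheld, n, TOTAL_UPHELD, TOTAL_RUNS)
    print(f"{name}: {100 * upheld / n:.0f}% upheld, bias {bias:+.0f}%")
# prints:
# Lydia: 62% upheld, bias +2%
# Terry: 64% upheld, bias +4%
# Priya: 50% upheld, bias -9%
```

Plugging in the judge-model counts gives the same agreement, e.g. phi4:latest at 0 of 1 upheld yields -59.
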
Speaking Order (n=32)

| Position | Wins | Win% |
|---|---|---|
| First speaker | 6 | 19% |
| Second speaker | 26 | 81% |

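
The Scores column appears to list the first speaker's score first. Under that assumption, and taking the higher-scored debater as the winner (which holds for every winner marked in the runs table), the transcribed score pairs reproduce the Speaking Order counts. A minimal sketch; `score_pairs` is a hypothetical name for data copied from the table:

```python
# Assumption: each Scores cell lists the first speaker's score first, and the
# higher-scored debater won. Pairs are (first, second) for runs 1-29 and 31-33.
score_pairs = [
    (8, 9), (6, 8), (4, 8), (8, 6), (6, 8), (6, 7), (8, 6), (7, 8),
    (7, 8), (7, 8), (6, 8), (7, 8), (7, 9), (6, 9), (6, 7), (7, 9),
    (8, 8.5), (8, 9), (8, 9), (7, 8), (8, 6), (7, 8), (9, 5), (8, 6),
    (7, 8), (6, 8), (4, 9), (6, 8), (9, 4), (5, 8), (7, 9), (4, 9),
]

first_wins = sum(1 for first, second in score_pairs if first > second)
second_wins = sum(1 for first, second in score_pairs if second > first)
n_runs = len(score_pairs)

print(f"First speaker: {first_wins} wins ({100 * first_wins / n_runs:.0f}%)")
print(f"Second speaker: {second_wins} wins ({100 * second_wins / n_runs:.0f}%)")
# prints:
# First speaker: 6 wins (19%)
# Second speaker: 26 wins (81%)
```
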
Side Effect (n=32)

| Side | Wins | Win% |
|---|---|---|
| For | 19 | 59% |
| Against | 13 | 41% |

Model Performance (Debater)

| Model | n | Wins | Win% | Avg score |
|---|---|---|---|---|
| gemma3:12b | 10 | 8 | 80% | 8.1 |
| qwen2.5:14b | 4 | 3 | 75% | 7.0 |
| llama3.1:8b | 10 | 6 | 60% | 7.5 |
| qwen3:14b | 10 | 6 | 60% | 7.1 |
| phi4:latest | 8 | 4 | 50% | 7.6 |
| mistral-nemo:12b | 6 | 2 | 33% | 6.3 |
| gemma2:9b | 9 | 2 | 22% | 7.0 |
| deepseek-r1:14b | 7 | 1 | 14% | 7.0 |

Model Performance (Judge)

| Model | n | Upheld | Rejected | Uphold% | Bias (vs. 59% overall) |
|---|---|---|---|---|---|
| mistral-nemo:12b | 8 | 6 | 2 | 75% | +16% |
| qwen3:14b | 8 | 4 | 4 | 50% | -9% |
| llama3.1:8b | 5 | 2 | 3 | 40% | -19% |
| gemma2:9b | 5 | 4 | 1 | 80% | +21% |
| gemma3:12b | 3 | 2 | 1 | 67% | +7% |
| deepseek-r1:14b | 2 | 1 | 1 | 50% | -9% |
| phi4:latest | 1 | 0 | 1 | 0% | -59% |


This page summarises the results of a simulated debate. In each round, two randomly selected AI agents
debate a topic, one arguing for the premise and one against; a third agent then acts as judge and selects the winner. All arguments and
facts presented (with the exception of this paragraph) are AI-generated, potentially untrue,
and do not represent the views of any real person.