Tag: experimental-design
1 entry under this tag.
How I Ran 31,638 LLM Responses to Score Reasoning Mode
The harness behind 283 system prompts × 60 trials of Qwen 2.5 14B playing Prisoner's Dilemma — and two methodological holes that almost made it through.