research notes
research.gradstudent.me
Notes on LLM measurement infrastructure and experimental design.
How I Ran 31,638 LLM Responses to Score Reasoning Mode
The harness behind 283 system prompts × 60 trials of Qwen 2.5 14B playing Prisoner's Dilemma — and two methodological holes that almost made it through.