tag: agentic-systems · research.gradstudent.me

read 6 min

Five Agentic-LLM Failure Modes That Aren't Actually LLM Problems

When an agentic LLM does the wrong thing in production, the instinct is to rewrite the prompt. Most of the time the actual fix lives somewhere else: in the tool API, the dispatcher, or the lookup database. This post covers five real failure modes from a Discord bot controlling Old School RuneScape, and the layer each one actually lives in.

read 13 min

Why qwen2.5:14b Pretends to Execute Commands, and What Actually Fixed It

When you run a tool-using LLM, "faking" is when it tells you it did something without ever calling the tool. A Discord bot that faked 14 of 38 commands went to executing 16 of 16 after three structural changes. Adding "THIS IS LIVE" to the system prompt did nothing; the prompt was never the problem.

read 7 min

How a Self-Referential Field Cost $26 in Two Hours of Autonomous Gameplay

Autonomous agents on metered LLM APIs can burn real money if their prompts grow without anyone watching. My two-hour OSRS-bot session cost $26 because one Python field was both read and overwritten in the same operation; the user message grew to 82,000 characters before I noticed. Token tracking returned zero the whole time, so the cost stayed invisible until the invoice arrived.
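
To make the read-and-overwrite pattern concrete, here is a minimal sketch of how a field that feeds the prompt and is then replaced by that same prompt can grow on every step. It is an illustration under assumptions, not the bot's actual code: `AgentState`, `context`, `build_user_message`, and `build_user_message_bounded` are hypothetical names, and the observations are made-up strings.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    # Hypothetical field: text carried over from one step to the next.
    context: str = ""


def build_user_message(state: AgentState, observation: str) -> str:
    """Buggy variant: reads state.context, then overwrites it with the result."""
    message = f"Previous context:\n{state.context}\n\nObservation:\n{observation}"
    # The field now contains the whole message, which already contained the
    # old field, so every step nests all earlier steps inside the next one.
    state.context = message
    return message


def build_user_message_bounded(
    state: AgentState, observation: str, max_chars: int = 2000
) -> str:
    """One possible fix: store only the new information, capped in size."""
    message = f"Previous context:\n{state.context}\n\nObservation:\n{observation}"
    # Keep just the latest observation (truncated), not the assembled message.
    state.context = observation[-max_chars:]
    return message


def simulate(builder, steps: int) -> list[int]:
    """Return the user-message length at each step for a given builder."""
    state = AgentState()
    sizes = []
    for i in range(steps):
        sizes.append(len(builder(state, f"tick {i}: player idle at bank")))
    return sizes


if __name__ == "__main__":
    print("buggy:  ", simulate(build_user_message, 8))
    print("bounded:", simulate(build_user_message_bounded, 8))
```

In the buggy variant the message length rises on every step, while the bounded variant levels off after the first step. Logging the message length per call, as `simulate` does here, is one cheap way to spot this kind of growth even when token tracking reports nothing.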