The goal is not to build a workflow engine inside Stage0. The goal is to stop unsafe repetition before your runtime burns budget, quota, or downstream capacity.
When retries, tool loops, or self-correction chains lose control, the result is rarely clever autonomy. It is usually wasted spend, stuck workflows, or accidental side effects.
The agent keeps calling the same tool because the state never meaningfully changes, but nothing hard-stops the loop.
A failure path becomes a cost path when retries multiply across network errors, model retries, and downstream backoffs.
The run keeps going long after it should have been escalated to a human or terminated with a clear reason.
Bound how many times a run can try again before it must stop or ask for intervention.
max_iterations: 5 -> DEFER on iteration 6Stop a run that has exceeded the window where automated continuation is still acceptable.
timeout_seconds: 120 -> DEFER after 120 secondsTreat repeated calls to the same tool as a signal that the agent is stuck, not making progress.
same_tool_limit: 2 -> DEFER on the third identical callTrack cumulative estimated cost so a bad loop does not stay cheap long enough to be ignored.
max_cost_usd: 2.00 -> DEFER after the threshold is crossed{
"goal": "Resolve a failed payout by retrying the payout tool",
"tools": ["payments.retry_payout"],
"constraints": [
"max_iterations: 5",
"timeout_seconds: 120",
"max_cost_usd: 2.00",
"same_tool_limit: 2"
],
"context": {
"run_id": "payout-retry-8841",
"current_iteration": 6,
"elapsed_seconds": 147,
"recent_tools": [
"payments.retry_payout",
"payments.retry_payout"
],
"cumulative_cost_usd": 2.34
},
"side_effects": ["money_movement", "retry"]
}This request should return DEFER because the run has exceeded its iteration, time, and estimated cost limits. You can provide counters directly, or keep a stable run_id and let Stage0 evaluate loop state across checks.
run_id matters
A stable run identifier lets the control layer and your runtime talk about the same loop across multiple checks.
Decision is not execution
Stage0 can say stop, defer, or continue. Your agent runtime still needs to honor that decision before the next tool call.
Human escalation should be explicit
When the run is stuck, the clean outcome is usually defer to review, not silent continuation.
Buyers care less about an abstract anti-loop claim and more about whether they can bound retries, trace the run, and prove why the next call was stopped.