Why retries make everything worse
Retries are apologies that compound.
The Instinct
Something fails. Retry it. This is the most natural thing in the world. It's also often the worst thing you can do.
The Math
If your service is at 90% capacity and requests start failing, retries don't help—they push you to 180% attempted load. Now everything fails. Now everything retries. Now you're at 360%.
The Cascade
We had a downstream service go slow. Not down, just slow. P99 went from 200ms to 2 seconds. Our retry logic kicked in. Three retries with exponential backoff. Except exponential backoff doesn't help when you're already holding connections.
The Recovery
Circuit breakers. Retry budgets. Deadline propagation. None of these are as satisfying as "just try again." All of them actually work.
The Rule
Every retry is a decision to make the problem worse before checking if it's already getting better.