<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reliability on Werner Strydom</title><link>https://wernerstrydom.com/tags/reliability/</link><description>Recent content in Reliability on Werner Strydom</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 02 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://wernerstrydom.com/tags/reliability/index.xml" rel="self" type="application/rss+xml"/><item><title>What happens if we stop retrying?</title><link>https://wernerstrydom.com/posts/what-happens-if-we-stop-retrying/</link><pubDate>Thu, 02 Jul 2026 00:00:00 +0000</pubDate><guid>https://wernerstrydom.com/posts/what-happens-if-we-stop-retrying/</guid><description>&lt;p>The retry is the most confident line of code we write. Think about what it
claims: &lt;em>the same request, sent again, will produce a different outcome.&lt;/em>
Sometimes that&amp;rsquo;s true — a dropped packet, a node mid-restart. But we don&amp;rsquo;t
retry because we&amp;rsquo;ve established that the failure is transient. We retry because it&amp;rsquo;s easy and it
usually looks like it works.&lt;/p>
&lt;p>So, the thought experiment: what happens if we stop?&lt;/p>
&lt;p>Take a service that&amp;rsquo;s slow because it&amp;rsquo;s overloaded. Callers time out and
retry. Each retry is a brand-new request that the overloaded service must also take on and fail, which
makes it slower, which causes more timeouts, which causes more retries. We
have a name for this — a retry storm — and yet we keep writing the loop,
because each individual retry looks reasonable. It&amp;rsquo;s the traffic jam problem:
nobody thinks &lt;em>they&amp;rsquo;re&lt;/em> the traffic.&lt;/p></description></item></channel></rss>