This is weird to say out loud, but I actually am kinda an expert in rate limiting, so I'm gonna explain some stuff.
About half of incidents in large-scale production systems involve having more requests than you can serve. There are two categories of this kind of incident:
Maggie Johnson-Pint
- 1. Top-down overload, or the "Reddit Hug of Death": This is what Bluesky experienced today. Suddenly there was a HUGE demand surge and the servers just *couldn't* for a while. This also happens after Super Bowl ads, when pop stars announce tours, or during DDoS attacks.
- 2. Bottom-up: This is the less obvious and more common scenario, where something inside the system fails and leaves it unable to serve even normal load. If you lose a Redis cache and every read goes to the DB, you will drastically reduce your ability to serve requests.
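To put rough numbers on the lost-cache case, here is a back-of-the-envelope sketch. The request rate and the 95% hit ratio are made-up assumptions for illustration, not figures from any real system.

```python
def db_reads_per_second(total_rps: float, cache_hit_ratio: float) -> float:
    """Reads that miss the cache and fall through to the database."""
    return total_rps * (1.0 - cache_hit_ratio)


# Hypothetical numbers: 10,000 reads/s with a 95% cache hit ratio.
normal = db_reads_per_second(10_000, 0.95)   # cache healthy: about 500 reads/s hit the DB
degraded = db_reads_per_second(10_000, 0.0)  # cache lost: every read hits the DB

# How many times more load the DB sees at the *same* user traffic.
amplification = degraded / normal            # roughly 20x

print(f"DB load goes from {normal:.0f} to {degraded:.0f} reads/s "
      f"({amplification:.0f}x) with no change in user demand")
```

Under these assumptions the DB has to absorb about 20 times its usual load while user traffic stays flat, which is why a bottom-up failure looks like an overload from the inside.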
- I don't know what happened at Twitter today, but I don't think Elon woke up and decided to shut it all down. My bet is some 'bottom-up' problem (but not necessarily the 'DDoSed yourself' problem everyone is tweeting about; that could be an effect of getting limited, not the cause).
- Anyways, hope this was informative to someone somewhere because it took a while to write 😂.
- The best rate limiters are 'adaptive', and can change rate limits based on system stress, priority of requests, and other things. Twitter has a really good one because they had a really exceptional infra team until a year ago.
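A minimal sketch of what 'adaptive' could look like. This is not Twitter's limiter or any real library's API. It assumes latency is the stress signal, uses additive-increase/multiplicative-decrease (AIMD) to move a concurrency limit, and gives lower-priority requests a smaller share of it. The class name, the priority levels, and the share percentages are all hypothetical.

```python
class AdaptiveLimiter:
    """Concurrency limiter whose limit shrinks under stress and recovers slowly."""

    # Hypothetical priority classes and the fraction of the limit each may use.
    SHARES = {0: 1.0,   # critical traffic may use the whole limit
              1: 0.8,   # normal traffic
              2: 0.5}   # background traffic is shed first

    def __init__(self, min_limit: int, max_limit: int, target_latency_ms: float):
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target_latency_ms = target_latency_ms
        self.limit = float(max_limit)
        self.in_flight = 0

    def observe_latency(self, latency_ms: float) -> None:
        """Feed back a stress signal (assumed here: observed request latency)."""
        if latency_ms > self.target_latency_ms:
            # System is stressed: cut the limit in half (multiplicative decrease).
            self.limit = max(self.min_limit, self.limit * 0.5)
        else:
            # System is healthy: probe upward one slot at a time (additive increase).
            self.limit = min(self.max_limit, self.limit + 1)

    def try_acquire(self, priority: int) -> bool:
        """Admit the request only if its priority class still has room."""
        if self.in_flight < self.limit * self.SHARES[priority]:
            self.in_flight += 1
            return True
        return False

    def release(self) -> None:
        """Call when an admitted request finishes."""
        self.in_flight -= 1
```

The design choice worth noticing is that the limit is cut quickly and restored slowly, and that background work gets rejected well before critical work does.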
- Similarly, if a database replica, cloud region, or cluster goes down, you will be in a really tough spot for serving a normal workload. And of course if a developer on one service writes code that suddenly slams another service, that's "DDoSing Yourself" and is also bottom-up.
- Even if they don't crash, requests stack up waiting for completion - this is called 'backup' - which is what causes the slowness in the requests that do work. Backups have this bad effect of causing users to refresh the page, causing more requests and... more backups.
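The refresh feedback loop can be seen in a toy simulation. This is a deliberately crude model with made-up numbers: each tick a fixed number of requests arrive, the server completes up to a fixed capacity, and some assumed fraction of the users still waiting hit refresh and add a duplicate request.

```python
def simulate_backlog(arrivals_per_tick: float,
                     capacity_per_tick: float,
                     ticks: int,
                     refresh_fraction: float = 0.0) -> list[float]:
    """Return the number of requests left waiting at the end of each tick."""
    backlog = 0.0
    history = []
    for _ in range(ticks):
        # Impatient users re-send a request for work that is still queued.
        retries = backlog * refresh_fraction
        backlog += arrivals_per_tick + retries
        # The server completes at most `capacity_per_tick` requests.
        backlog -= min(backlog, capacity_per_tick)
        history.append(backlog)
    return history


# Hypothetical load: 120 requests arrive per tick, the server can finish 100.
no_refresh = simulate_backlog(120, 100, ticks=5)
with_refresh = simulate_backlog(120, 100, ticks=5, refresh_fraction=0.5)

print("backlog without refreshes:", no_refresh)
print("backlog with refreshes:   ", with_refresh)
```

Without refreshes the backlog grows in a straight line. With them it grows faster every tick, because the backlog itself generates new load.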
- In these scenarios, the rate limiter is the only thing standing between you and death, because of course if computers get hit with more requests than they can deal with, eventually they OOM and crash.
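For anyone who hasn't seen one, a token bucket is one common way to build a basic rate limiter. The thread doesn't say which algorithm any particular site uses, so treat this as a generic sketch. The class name and parameters are my own, and the `clock` argument exists only so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)
if bucket.allow():
    pass  # handle the request
else:
    pass  # reject cheaply, e.g. with HTTP 429, instead of queueing the work
```

The point is that rejecting a request costs almost nothing, while accepting one you can't finish costs memory until something falls over.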
- My hypothesis: Twitter lost a big part of a critical backend system. Maybe they stopped paying their GCP bill, maybe they lost a critical cache and everything was reading other data. I truly do not know.
- Another: "I'm a product developer, why do I care about an infra problem?" 1. If you handle this in code, you can do something other than give your users an 'error'. 2. If you handle this in client code, you can save the entire infrastructure by never sending the request. Literal hero shit.


