Your fleet evolves. Your infrastructure assumptions don't.
Models get updated. Traffic patterns shift. Token compression reduces context length. Customer load grows. The GPU you correctly sized six months ago may be wrong today — but nothing in your stack will tell you until OOM events start or your throughput collapses.
The result: Tier misplacement that could have been caught at deploy time becomes months of avoidable GPU spend, or a 3am incident.