Problem
Load on a cloud application typically varies over time based on the number of active users or the types of activities they're performing.
Autoscaling can trigger the provisioning of additional resources, but provisioning is not immediate.
Solution
An alternative strategy to autoscaling is to allow applications to use resources only up to a limit, and then throttle them when this limit is reached.
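As a minimal illustration of this idea, the sketch below admits at most a fixed number of calls per time window and rejects the rest. It is plain Python with illustrative names (FixedWindowThrottle, handle_request) that are not tied to any particular framework; a real service would typically surface the rejection as HTTP 429.

import time
import threading


class FixedWindowThrottle:
    """Hypothetical helper: allow at most max_calls per window_seconds,
    reject further calls until the window resets."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._count = 0
        self._window_start = time.monotonic()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            now = time.monotonic()
            # Reset the counter once the current window has elapsed.
            if now - self._window_start >= self.window_seconds:
                self._window_start = now
                self._count = 0
            if self._count < self.max_calls:
                self._count += 1
                return True
            # Limit reached: the caller should be throttled.
            return False


throttle = FixedWindowThrottle(max_calls=10, window_seconds=60)

def handle_request(request):
    # Illustrative handler: reject with 429 while the limit is exceeded.
    if not throttle.allow():
        return {"status": 429, "body": "Too Many Requests"}
    return {"status": 200, "body": "OK"}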
There are different strategies for implementing this: the Priority Queue pattern (serving requests through a priority queue), the External Configuration Store pattern (changing limits at runtime without the need for a redeployment), and so on; a sketch of the priority-based approach follows.
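As one possible reading of the Priority Queue strategy, the sketch below keeps accepting high-priority requests once a per-window budget is exhausted and sheds low-priority work instead. All names (PriorityThrottle, submit, drain) are illustrative assumptions, not part of any library.

import heapq


class PriorityThrottle:
    """Hypothetical sketch: queue work by priority and shed
    low-priority requests while the budget is exhausted."""

    def __init__(self, budget_per_window: int, min_priority_under_load: int):
        self.budget = budget_per_window
        self.min_priority_under_load = min_priority_under_load
        self._queue = []   # heap of (priority, sequence, request)
        self._seq = 0
        self._used = 0

    def submit(self, priority: int, request) -> bool:
        # Lower numbers mean higher priority (0 = most urgent).
        if self._used >= self.budget and priority > self.min_priority_under_load:
            return False   # shed low-priority work while throttled
        heapq.heappush(self._queue, (priority, self._seq, request))
        self._seq += 1
        return True

    def drain(self):
        # Serve the most urgent requests first, up to the window budget.
        while self._queue and self._used < self.budget:
            _, _, request = heapq.heappop(self._queue)
            self._used += 1
            yield request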
Throttling In Practice
Similar to the Retry pattern: Retry Pattern - Design Patterns & Architecture - DataObjectException KB (atlassian.net)
Example from Azure API Management - Throttling (hovermind.com) - throttling configuration:
<rate-limit-by-key calls="10" renewal-period="60" counter-key="@(context.Request.IpAddress)" />
<quota-by-key calls="1000000" bandwidth="10000" renewal-period="2629800" counter-key="@(context.Request.IpAddress)" />
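The rate-limit-by-key policy above caps each client IP at 10 calls per 60-second renewal period, while quota-by-key enforces a longer-term quota of 1,000,000 calls or 10,000 KB of bandwidth per 2,629,800-second (roughly one-month) period. Callers that exceed the rate limit are rejected with HTTP 429, which is where the Retry pattern linked above comes in. A minimal client-side sketch that cooperates with such a policy might look like the following; it uses only the standard library and assumes the Retry-After header carries a delay in seconds.

import time
import urllib.error
import urllib.request


def call_with_throttle_retry(url: str, max_attempts: int = 5):
    """Illustrative sketch: retry a call that may be throttled (HTTP 429),
    honouring the server-suggested Retry-After delay when present."""
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts:
                raise
            # Assumes Retry-After is given in seconds; fall back to 1 second.
            delay = int(err.headers.get("Retry-After", "1"))
            time.sleep(delay)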