Problem
Load on a cloud application typically varies over time based on the number of active users or the types of activities they're performing.
Autoscaling can trigger the provisioning of additional resources, but provisioning is not immediate.
Solution
An alternative strategy to autoscaling is to allow applications to use resources only up to a limit, and then throttle them when this limit is reached.
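As a minimal illustration of this idea, the sketch below admits at most a fixed number of calls per time window and rejects the rest. It is plain Python with illustrative names (FixedWindowThrottle, handle_request) that are not tied to any particular framework; a real service would typically surface the rejection as HTTP 429.

import time
import threading


class FixedWindowThrottle:
    """Hypothetical helper: allow at most max_calls per window_seconds,
    reject further calls until the window resets."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._count = 0
        self._window_start = time.monotonic()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            now = time.monotonic()
            # Reset the counter once the current window has elapsed.
            if now - self._window_start >= self.window_seconds:
                self._window_start = now
                self._count = 0
            if self._count < self.max_calls:
                self._count += 1
                return True
            # Limit reached: the caller should be throttled.
            return False


throttle = FixedWindowThrottle(max_calls=10, window_seconds=60)

def handle_request(request):
    # Illustrative handler: reject with 429 while the limit is exceeded.
    if not throttle.allow():
        return {"status": 429, "body": "Too Many Requests"}
    return {"status": 200, "body": "OK"}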
There are different strategies for implementing this: the Priority Queue pattern (serving requests through a priority queue), the External Configuration Store pattern (changing limits at runtime without the need for a redeployment), and so on; a sketch of the priority-based approach follows.
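As one possible reading of the Priority Queue strategy, the sketch below keeps accepting high-priority requests once a per-window budget is exhausted and sheds low-priority work instead. All names (PriorityThrottle, submit, drain) are illustrative assumptions, not part of any library.

import heapq


class PriorityThrottle:
    """Hypothetical sketch: queue work by priority and shed
    low-priority requests while the budget is exhausted."""

    def __init__(self, budget_per_window: int, min_priority_under_load: int):
        self.budget = budget_per_window
        self.min_priority_under_load = min_priority_under_load
        self._queue = []   # heap of (priority, sequence, request)
        self._seq = 0
        self._used = 0

    def submit(self, priority: int, request) -> bool:
        # Lower numbers mean higher priority (0 = most urgent).
        if self._used >= self.budget and priority > self.min_priority_under_load:
            return False   # shed low-priority work while throttled
        heapq.heappush(self._queue, (priority, self._seq, request))
        self._seq += 1
        return True

    def drain(self):
        # Serve the most urgent requests first, up to the window budget.
        while self._queue and self._used < self.budget:
            _, _, request = heapq.heappop(self._queue)
            self._used += 1
            yield request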
Throttling In Practice
Similar to the Retry pattern: Retry Pattern - Design Patterns & Architecture - DataObjectException KB (atlassian.net)
Example from Azure API Management - Throttling (hovermind.com) - throttling configuration:
<rate-limit-by-key calls="10" renewal-period="60" counter-key="@(context.Request.IpAddress)" />
<quota-by-key calls="1000000" bandwidth="10000" renewal-period="2629800" counter-key="@(context.Request.IpAddress)" />
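The rate-limit-by-key policy above caps each client IP at 10 calls per 60-second renewal period, while quota-by-key enforces a longer-term quota of 1,000,000 calls or 10,000 KB of bandwidth per 2,629,800-second (roughly one-month) period. Callers that exceed the rate limit are rejected with HTTP 429, which is where the Retry pattern linked above comes in. A minimal client-side sketch that cooperates with such a policy might look like the following; it uses only the standard library and assumes the Retry-After header carries a delay in seconds.

import time
import urllib.error
import urllib.request


def call_with_throttle_retry(url: str, max_attempts: int = 5):
    """Illustrative sketch: retry a call that may be throttled (HTTP 429),
    honouring the server-suggested Retry-After delay when present."""
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts:
                raise
            # Assumes Retry-After is given in seconds; fall back to 1 second.
            delay = int(err.headers.get("Retry-After", "1"))
            time.sleep(delay)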