
The framework proposed in this space (Alex Xu) is applied here to design an API rate limiter: Getting started - a framework to propose...


Introduction

The benefits of using an API rate limiter:

  1. Prevent resource starvation caused by Denial of Service (DoS) attacks, intentional or unintentional, by blocking the excess calls.

  2. Reduce cost by limiting the number of servers and allocating more resources to high-priority APIs. This matters especially when we use paid third-party APIs.

  3. Prevent servers from being overloaded by filtering out excess requests.

Part 1 - Understand the problem and establish design scope

Type of rate limiter? Client-side, server-side, or in a middleware?

Server-side or middleware.

Throttling rules?

Different sets of throttle rules (by IP, user ID, etc.).

Scale of the system? Small, medium, or big company?

Big company.

In a distributed environment?

Yes.

Implemented as a separate service or in application code ?

Either is acceptable; it is a design choice.

Inform users who are throttled?

Yes.

Additional requirements to consider

  • limit excessive requests

  • low latency - must not slow down HTTP response time

  • limit memory usage as much as possible

  • can be shared across multiple servers and processes

  • exception handling - show clear error messages to throttled users

  • high fault tolerance - if the rate limiter goes offline, it does not affect the whole system

Part 2 - High level design

A client-side implementation is not the right option because the client is not a reliable place to enforce limits: client requests can be forged by malicious actors.

Where to put the rate limiter ?

The chosen implementation is a middleware: an API gateway is a fully managed service that supports rate limiting, SSL termination, authentication, IP whitelisting, etc.

How to choose the right implementation?

Middleware vs server-side (API servers)? Use these guidelines:

  • Evaluate the current technology stack;

  • Identify the rate limiting algorithm - on the server side, we have full control of the algorithm;

  • If you already have a microservice architecture that includes an API gateway (performing tasks like authentication, IP whitelisting, etc.), adding the rate limiter to the gateway is a natural fit.

Regarding the algorithm, we need a counter to keep track of how many requests are sent by the same user, IP address, etc.
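As an illustration of the counter idea, here is a minimal sketch of a per-client fixed-window counter. This is an assumption for exposition, not the design's mandated algorithm (the source leaves the algorithm open), and a production limiter would keep the counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Minimal per-client request counter (fixed window), in memory.

    Illustrative sketch only: each client key (user ID, IP address, ...)
    gets `limit` requests per `window_seconds`-second window.
    """

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        # client key -> [request count, window start timestamp]
        self.counters = defaultdict(lambda: [0, 0.0])

    def allow(self, client_id: str) -> bool:
        now = time.time()
        count, start = self.counters[client_id]
        if now - start >= self.window:
            # The window has expired: start a new one with this request
            self.counters[client_id] = [1, now]
            return True
        if count < self.limit:
            self.counters[client_id][0] += 1
            return True
        # Over the limit for the current window
        return False
```

Usage: `FixedWindowCounter(limit=2, window_seconds=60).allow("1.2.3.4")` returns `True` until the client exceeds two requests within the window, then `False` until the window rolls over.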

Part 3 - Design deep dive

To go deeper, we need to manage rules and other components (message queue, cache, …). See example rules in the page References & Glossary for rate limiter.

  • If the request is not rate limited, it is forwarded to the API servers.

  • If there are too many requests according to the rules (loaded from the cache), the rate limiter returns a 429 (Too Many Requests) error code to the user, and the request is either dropped or forwarded to a queue.
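The two branches above can be sketched as a single decision function. The `allow` callable stands in for any rate-check mechanism, and the forwarding/queueing steps are stubbed as plain dicts; these names are assumptions for illustration, not part of the source design.

```python
from typing import Callable

def handle_request(allow: Callable[[str], bool], client_id: str) -> dict:
    """Rate limiter decision flow (sketch).

    `allow` is any callable that checks the rules/counters for a client.
    Forwarding and queueing are represented by fields on the response dict.
    """
    if allow(client_id):
        # Not rate limited: forward the request to the API servers
        return {"status": 200, "forwarded": True}
    # Too many requests: answer 429 and drop (or enqueue) the request
    return {
        "status": 429,                       # HTTP 429 Too Many Requests
        "headers": {"Retry-After": "60"},    # hint when the client may retry
        "forwarded": False,
    }
```

Returning a `Retry-After` header alongside the 429 also satisfies the earlier requirement to show clear feedback to throttled users.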

Part 4 - Pros & Cons

Pros

  • Helps control the resources a service consumes by imposing limits on the request rate;

  • Appropriate for large-scale repetitive automated tasks (batch processing, …)

Cons

  • Challenging to implement in a distributed environment;

  • Rate limiter instances on different servers must synchronize their counters, which adds complexity.
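One common remedy for the synchronization problem is a centralized store (for example Redis, where an atomic INCR plus an expiry keeps all limiter instances in agreement). The sketch below simulates that pattern in-process with a lock so it is self-contained; the store class and its method names are illustrative assumptions, not a real Redis client.

```python
import threading
import time

class SharedCounterStore:
    """Process-local stand-in for a centralized counter store.

    In a real deployment the atomic increment would be, e.g., Redis INCR
    with an expiry; here a lock plays that role so several rate limiter
    instances (threads) observe the same counts.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (count, window expiry timestamp)

    def incr_with_ttl(self, key: str, ttl: float) -> int:
        with self._lock:
            now = time.time()
            count, expires = self._data.get(key, (0, now + ttl))
            if now >= expires:
                # Window expired: start a fresh count and expiry
                count, expires = 0, now + ttl
            count += 1
            self._data[key] = (count, expires)
            return count

def allowed(store: SharedCounterStore, client: str,
            limit: int, window: float) -> bool:
    # Every limiter instance consults the same store, so they agree
    return store.incr_with_ttl(client, window) <= limit
```

Because the increment-and-check is atomic, concurrent limiter instances admit exactly `limit` requests per window between them, which is the synchronization guarantee the cons above call out as hard to get with independent per-server counters.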
