Asynchronous Request-Reply / Queue-Based Load Leveling

Decouple backend processing from a frontend host when the backend processing needs to be asynchronous but the frontend still needs a clear response: https://docs.microsoft.com/en-us/azure/architecture/patterns/async-request-reply

  1. Context and problem

    The client needs a response quickly enough for it to arrive back over the same connection, without added latency. Client code can make the call in a non-blocking way to give the appearance of asynchronous processing (recommended for I/O-bound operations). HTTP polling is used because it is useful to client-side code.
    However, the work done by the backend may be long-running (seconds, minutes, or even hours). In that case, Queue-Based Load Leveling can help: the queue decouples the tasks from the service, so the service can handle the messages at its own pace regardless of the volume of requests from concurrent tasks.

    The HTTP response is the key to this pattern: the HTTP 202 (Accepted) status code acknowledges that the request has been received for processing.
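
    The exchange can be illustrated with a minimal sketch, assuming an in-memory job store and no web framework. The names `JOBS`, `submit`, `get_status` and `complete`, the `/status/<id>` path, and the 5-second `Retry-After` hint are all hypothetical choices made for illustration; each function returns a `(status, headers, body)` tuple in place of a real HTTP response.

```python
import uuid

# Hypothetical in-memory job store: job id -> state, payload and result.
JOBS = {}


def submit(payload):
    """Accept the work and answer at once with HTTP 202 (Accepted)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"state": "pending", "payload": payload, "result": None}
    headers = {
        "Location": f"/status/{job_id}",  # assumed status endpoint to poll
        "Retry-After": "5",               # assumed polling hint, in seconds
    }
    return 202, headers, ""


def get_status(job_id):
    """Polling endpoint: report whether the work has finished."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {}, "unknown job"
    if job["state"] == "pending":
        return 200, {"Retry-After": "5"}, "pending"
    return 200, {}, job["result"]


def complete(job_id, result):
    """Called by the backend once the long-running work is done."""
    JOBS[job_id]["state"] = "done"
    JOBS[job_id]["result"] = result
```

    The client calls `submit` once, then polls `get_status` at the location given in the 202 response until the body is no longer "pending".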

  2. Solution

    Asynchronous Request/Reply (ARR)


    If the backend in ARR may be slow (seconds, minutes, or even hours), then the Queue-Based Load Leveling pattern may be the solution.

    Queue-Based Load Leveling

    Messaging that includes a request/reply exchange: this is Queue-Based Load Leveling implemented with the Asynchronous Request-Reply pattern.
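
    The combination of the two patterns can be sketched as follows, assuming a single process where Python's standard `queue.Queue` stands in for a real message queue and a thread stands in for the service. The names `work_queue`, `results`, `enqueue`, `service_worker` and `poll` are hypothetical.

```python
import queue
import threading

work_queue = queue.Queue()        # buffer between the tasks and the service
results = {}                      # job id -> result, read by the polling side
results_lock = threading.Lock()


def enqueue(job_id, payload):
    """Task side: put the message on the queue and return at once."""
    work_queue.put((job_id, payload))
    return 202  # Accepted: received for processing, not yet processed


def service_worker(handle):
    """Service side: drain the queue at its own pace."""
    while True:
        item = work_queue.get()
        if item is None:          # sentinel value: stop the worker
            work_queue.task_done()
            break
        job_id, payload = item
        outcome = handle(payload)
        with results_lock:
            results[job_id] = outcome
        work_queue.task_done()


def poll(job_id):
    """Polling side: None means the job has no stored result yet."""
    with results_lock:
        return results.get(job_id)
```

    A burst of `enqueue` calls returns immediately whatever the speed of the service; the worker started with `threading.Thread(target=service_worker, args=(handler,))` catches up later, and `poll` exposes the outcome.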

  3. Challenges

  3.1) Asynchronous Request/Reply

  • In some scenarios, you might want to provide a way for clients to cancel a long-running request. In that case, the backend service must support some form of cancellation instruction.

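  • Return an appropriate status code for each outcome, for example HTTP 202 (Accepted) when the request has been received for processing.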

  • Support legacy clients, that is, implement a facade over the asynchronous API to hide the asynchronous processing from the client. Logic Apps could help with that.

  • Predict the volume of requests to the service at any time.
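
  The cancellation point can be sketched with cooperative cancellation, assuming the backend work can be split into steps and can check a flag between them. The `Job` class and its methods are hypothetical, not part of any Azure API.

```python
import threading


class Job:
    """Hypothetical long-running job that supports cooperative cancellation."""

    def __init__(self, steps):
        self.steps = steps
        self.done_steps = 0
        self.state = "pending"
        self._cancel = threading.Event()

    def cancel(self):
        """Cancellation instruction sent on behalf of the client."""
        self._cancel.set()

    def run(self, do_step):
        """Run the steps in order, stopping early if cancellation was requested."""
        self.state = "running"
        for i in range(self.steps):
            if self._cancel.is_set():   # checked between units of work
                self.state = "cancelled"
                return
            do_step(i)
            self.done_steps += 1
        self.state = "done"
```

  The status endpoint can then report the `state` field, so a polling client sees "cancelled" instead of waiting for a result that will never arrive.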

  3.2) Queue-Based Load Leveling

  • Message queues are a one-way communication mechanism. If a task expects a reply from the service, we need to implement a mechanism that the service can use to send a response.

  • Autoscaling the service instances may increase contention for the resources they share, which may diminish the effectiveness of using the queue to level the load.

  • Adjust the number of queues and the number of service instances to handle the load.
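
  One possible reply mechanism for the one-way limitation is a second queue plus a correlation id, sketched here under the assumption that standard-library queues stand in for real message queues. The names `request_queue`, `reply_queue`, `send_request`, `serve_one` and `collect_replies` are hypothetical, and other mechanisms (such as a status store that the client polls) are equally valid.

```python
import queue
import uuid

request_queue = queue.Queue()  # tasks -> service (one-way)
reply_queue = queue.Queue()    # service -> tasks (separate one-way channel)


def send_request(body):
    """Task side: attach a correlation id so the reply can be matched later."""
    correlation_id = str(uuid.uuid4())
    request_queue.put({"correlation_id": correlation_id, "body": body})
    return correlation_id


def serve_one(handle):
    """Service side: process one message and post the reply."""
    message = request_queue.get()
    reply_queue.put({
        "correlation_id": message["correlation_id"],
        "body": handle(message["body"]),
    })


def collect_replies():
    """Task side: drain the available replies into a dict keyed by correlation id."""
    replies = {}
    while True:
        try:
            reply = reply_queue.get_nowait()
        except queue.Empty:
            return replies
        replies[reply["correlation_id"]] = reply["body"]
```

  The correlation id lets a task match each reply to its request even when replies arrive out of order or are interleaved with replies meant for other requests.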