Based on https://medium.com/must-know-computer-science/system-design-load-balancing-1c2e7675fc27
1. Concepts
A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. It helps scale horizontally across an ever-increasing number of servers.
Traditionally, load balancers have been used to distribute requests in a horizontally scaled infrastructure cluster, with systems replicated in multiple servers, where a single server can’t have sufficient power to handle all the demand.
Load Balancers also serve the purpose of decoupling clients and services, which is a good practice from a cloud architecture perspective.
2. How does the load balancer work?
Define IP or DNS name for LB: Administrators define one IP address and/or DNS name for a given application, task, or website, to which all requests will come. This IP address or DNS name is the load balancing server.
Add backend pool for LB: The administrator will then enter into the load balancing server the IP addresses of all the actual servers that will be sharing the workload for a given application or task. This pool of available servers is only accessible internally, via the load balancer.
Deploy LB: Finally, your load balancer needs to be deployed — either as a proxy, which sits between your app servers and your users worldwide and accepts all traffic, or as a gateway, which assigns a user to a server once and leaves the interaction alone thereafter.
Redirect requests: Once the load balancing system is in place, all requests to the application come to the load balancer and are redirected according to the administrator’s preferred algorithm.