Show that a server is available by periodically sending a message to all the other servers: https://martinfowler.com/articles/patterns-of-distributed-systems/heartbeat.html

Problem

When multiple servers form a cluster, each server is responsible for storing some portion of the data, based on the partitioning and replication schemes used. Timely detection of server failures is important so that corrective action can be taken: some other server is made responsible for handling requests for the data held by the failed server.

Solution

Periodically send a request to all the other servers indicating the liveness of the sending server. Choose a request interval that is longer than the network round trip time between the servers. Every server waits for a timeout interval, which is a multiple of the request interval, before checking whether heartbeats have arrived. A server from which no heartbeat arrived within the timeout interval is treated as failed.
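
The receiving side of this scheme can be sketched as follows. This is a minimal illustration, not the implementation from the linked article: the class name HeartbeatFailureDetector, its methods, and the injectable clock are all assumptions made for the example, and the actual network send and receive are left out.

```python
import time
from typing import Callable, Dict, List


class HeartbeatFailureDetector:
    """Hypothetical helper: remembers when each peer was last heard from."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        # timeout_s should be a multiple of the sender's request interval.
        self.timeout_s = timeout_s
        # The clock is injectable so the logic can be exercised without sleeping.
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, server_id: str) -> None:
        """Call this whenever a heartbeat request arrives from server_id."""
        self.last_seen[server_id] = self.clock()

    def failed_servers(self) -> List[str]:
        """Return peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            server_id
            for server_id, seen_at in self.last_seen.items()
            if now - seen_at > self.timeout_s
        )
```

Each server would call record_heartbeat when a heartbeat arrives and call failed_servers once per timeout interval. A monotonic clock is used by default so that wall-clock adjustments do not produce spurious timeouts.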

Timeout interval > Request interval > Network round trip time between the servers. It is useful to know the network round trip times within and between datacenters when choosing values for the heartbeat interval and timeouts.

For example, if the network round trip time between the servers is 20ms, heartbeats can be sent every 100ms and servers can check after 1 second. This leaves time for multiple heartbeats to be sent, so that a few delayed or lost heartbeats do not cause a live server to be wrongly reported as failed.
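
The ordering of the three values can be checked before a cluster is started. The sketch below is only an illustration of that arithmetic; the function name check_heartbeat_config and its parameters are hypothetical, and all values are in milliseconds.

```python
def check_heartbeat_config(rtt_ms: int, interval_ms: int, timeout_ms: int) -> int:
    """Validate timeout > interval > round trip time.

    Returns how many heartbeats fit into one timeout interval.
    """
    if not (timeout_ms > interval_ms > rtt_ms):
        raise ValueError(
            "expected timeout > request interval > network round trip time"
        )
    return timeout_ms // interval_ms


# Values from the example: 20ms round trip, 100ms interval, 1 second timeout.
heartbeats_per_timeout = check_heartbeat_config(20, 100, 1000)
print(heartbeats_per_timeout)  # prints 10
```

With these values, ten heartbeats are sent within one timeout interval, so a peer is reported as failed only after all of them have been missed.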
