Info

Show that a server is available by periodically sending a message to all the other servers: https://martinfowler.com/articles/patterns-of-distributed-systems/heartbeat.html

Problem

When multiple servers form a cluster, the servers are responsible for storing some portion of the data, based on the partitioning and replication schemes used. Timely detection of server failures is important to make sure corrective actions can be taken by making some other server responsible for handling requests for the data on failed servers.

Solution

Periodically send a request to all the other servers indicating liveness of the sending server. Select a request interval that is longer than the network round trip time between the servers. Each server waits for the timeout interval, which is a multiple of the request interval, before checking whether heartbeats have arrived. A server whose heartbeats have not arrived within the timeout interval is treated as failed.
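The receiving side of this scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the linked article: the class name `HeartbeatMonitor`, its methods, and the injectable `clock` parameter are all hypothetical names chosen for this example, and the network transport that delivers heartbeats is left out.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical receiver-side tracker: remembers when each peer was last heard from."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        # timeout_s is the timeout interval; it should be a multiple of the
        # senders' request interval so several heartbeats fit inside it.
        self.timeout_s = timeout_s
        # The clock is injectable (an assumption of this sketch) so tests can
        # control time; a monotonic clock avoids jumps from wall-clock changes.
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, server_id: str) -> None:
        """Call this whenever a heartbeat from server_id arrives."""
        self.last_seen[server_id] = self.clock()

    def failed_servers(self) -> List[str]:
        """Return the peers whose last heartbeat is older than the timeout interval."""
        now = self.clock()
        return sorted(
            server_id
            for server_id, seen_at in self.last_seen.items()
            if now - seen_at > self.timeout_s
        )
```

A server would call `record_heartbeat` from its message handler and poll `failed_servers` on a timer. Only peers that have sent at least one heartbeat are tracked here; a fuller version would also need to handle peers that never report at all.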

Tip

Timeout Interval > Request Interval > Network round trip time between the servers: It is useful to know the network round trip times within and between datacenters when deciding values for the heartbeat interval and timeouts.

For example, if the network round trip time between the servers is 20ms, heartbeats can be sent every 100ms, and servers can check after 1 second. This leaves enough time for multiple heartbeats to be sent, so that a single delayed or lost heartbeat does not cause a live server to be wrongly reported as failed.
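The ordering rule and the arithmetic in the example can be checked with a small helper. This is only a sketch: the function name `check_heartbeat_timing` and its parameters are hypothetical, and it assumes all three values are given in whole milliseconds.

```python
def check_heartbeat_timing(rtt_ms: int, interval_ms: int, timeout_ms: int) -> int:
    """Validate timeout > interval > round trip time.

    Returns how many heartbeats are expected within one timeout window.
    All names and the millisecond unit are assumptions of this sketch.
    """
    if not (timeout_ms > interval_ms > rtt_ms):
        raise ValueError(
            "expected timeout > request interval > network round trip time, "
            f"got timeout={timeout_ms}ms, interval={interval_ms}ms, rtt={rtt_ms}ms"
        )
    # Integer division: the number of full request intervals in one timeout window.
    return timeout_ms // interval_ms


# Values from the example: 20ms round trip, 100ms heartbeats, 1 second timeout.
heartbeats_per_window = check_heartbeat_timing(rtt_ms=20, interval_ms=100, timeout_ms=1000)
```

With the values from the example, ten heartbeats fit in each timeout window, so several can be lost or delayed before a server is reported as failed.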