Info |
---|
Show that a server is available by periodically sending a message to all the other servers. Source: https://martinfowler.com/articles/patterns-of-distributed-systems/heartbeat.html |
Problem
When multiple servers form a cluster, the servers are responsible for storing some portion of the data, based on the partitioning and replication schemes used. Timely detection of server failures is important to make sure corrective actions can be taken by making some other server responsible for handling requests for the data on failed servers.
Solution
Periodically send a request to all the other servers to indicate that the sending server is alive. Choose a request interval that is longer than the network round-trip time between the servers. Each server waits for a timeout interval, which is a multiple of the request interval, before checking whether heartbeats have arrived. A server that has sent no heartbeat within the timeout interval is treated as failed.
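The receiving side of this scheme can be sketched as follows. This is a minimal illustration, not a reference implementation: `HeartbeatMonitor`, `FakeClock`, and all method names are hypothetical, and network transport is left out entirely. The clock is injectable so the timeout logic can be exercised without actually sleeping.

```python
import time


class FakeClock:
    """Hypothetical manual clock, used here only to demonstrate the logic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = {}

    def register(self, server_id):
        # Start the timeout window when the peer is first registered.
        self.last_seen[server_id] = self.clock()

    def record_heartbeat(self, server_id):
        # Called whenever a heartbeat request arrives from server_id.
        self.last_seen[server_id] = self.clock()

    def suspected_failed(self):
        # Peers whose last heartbeat is older than the timeout interval.
        now = self.clock()
        return sorted(
            server_id
            for server_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


clock = FakeClock()
monitor = HeartbeatMonitor(timeout_seconds=1.0, clock=clock)
monitor.register("a")
monitor.register("b")

clock.advance(0.5)
monitor.record_heartbeat("a")  # "a" keeps sending; "b" stays silent

clock.advance(0.7)
failed = monitor.suspected_failed()
print(failed)  # ['b']
```

In a real deployment the default `time.monotonic` clock would be used, and `record_heartbeat` would be called from whatever handler receives heartbeat requests from peers.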
Tip |
---|
Timeout interval > request interval > network round-trip time between the servers. It is useful to know the network round-trip times within and between datacenters when choosing values for the heartbeat interval and the timeout. |
For example, if the network round-trip time between the servers is 20ms, heartbeats can be sent every 100ms, and servers can check after 1 second. This leaves enough time for multiple heartbeats to be sent, so that a single delayed or lost heartbeat does not cause a live server to be marked as failed.
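The ordering of the three values, and the number of heartbeats that fit in one timeout window, can be checked with a few lines of arithmetic. The sketch below assumes all values are given in milliseconds; the function name `check_heartbeat_config` and its parameters are hypothetical.

```python
def check_heartbeat_config(round_trip_ms, heartbeat_interval_ms, timeout_ms):
    """Validate timeout > heartbeat interval > round-trip time.

    Returns the number of whole heartbeat intervals that fit in one
    timeout window.
    """
    if not (timeout_ms > heartbeat_interval_ms > round_trip_ms):
        raise ValueError(
            "expected timeout > heartbeat interval > network round-trip time"
        )
    return timeout_ms // heartbeat_interval_ms


# Values from the example: 20ms round trip, 100ms heartbeats, 1s timeout.
heartbeats_per_timeout = check_heartbeat_config(20, 100, 1000)
print(heartbeats_per_timeout)  # 10
```

With these values, ten heartbeat intervals fit in each timeout window, so several heartbeats would have to go missing in a row before a server is suspected.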