Full ERROR Message Example
[Stream #9220a784-08ae-11eb-adc1-0242ac120002] Session failed because remote peer 10.10.238.196:7000 has left.
Why does this ERROR occur?
The nodes of a cluster gossip constantly. Gossip is the internal communication mechanism that nodes in a cluster use to talk to each other and to disseminate each member node's state around the cluster. A gossip round takes place every second. During each round, a node initiates a gossip exchange according to the following rules:
- Gossip to a random live endpoint (if any).
- Gossip to a random unreachable endpoint, with a probability that depends on the number of unreachable and live nodes.
- If the node contacted under the first rule was not a seed, or the number of live nodes is less than the number of seeds, gossip to a random seed.
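The three rules above can be sketched as a small target-selection function. This is an illustrative sketch only, not the cluster's actual implementation: the function name `pick_gossip_targets`, its parameters, and in particular the probability formula `unreachable / (live + 1)` are assumptions made for the example, since the text only says the probability depends on both counts.

```python
import random


def pick_gossip_targets(live, unreachable, seeds, rng=random):
    """Return the endpoints one node would gossip to in a single round.

    live, unreachable and seeds are sets of endpoint addresses (hypothetical
    inputs for this sketch). rng is anything offering choice() and random().
    """
    targets = []
    gossiped_to_seed = False

    # Rule 1: gossip to a random live endpoint, if there is one.
    if live:
        peer = rng.choice(sorted(live))
        targets.append(peer)
        gossiped_to_seed = peer in seeds

    # Rule 2: maybe gossip to a random unreachable endpoint.
    # ASSUMPTION: the exact formula is not given in the text; this one simply
    # grows with the unreachable count and shrinks with the live count.
    if unreachable:
        probability = len(unreachable) / (len(live) + 1)
        if rng.random() < probability:
            targets.append(rng.choice(sorted(unreachable)))

    # Rule 3: gossip to a random seed if rule 1 did not reach a seed,
    # or if there are fewer live nodes than seeds.
    if seeds and (not gossiped_to_seed or len(live) < len(seeds)):
        targets.append(rng.choice(sorted(seeds)))

    return targets
```

Passing a seeded `random.Random` instance as `rng` makes a round reproducible, which is handy when experimenting with how often unreachable endpoints are retried.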
This error message appears either when a node has been silent for longer than expected in a stable ring or when an endpoint is being replaced. In both cases the endpoint is removed from gossip, starting with the subscribers, so that anything that depends on gossiper state does not get confused. As soon as the endpoint has been removed from gossip, the error message above is logged to confirm that this internal action has been taken.
How do you fix this ERROR?
This error may be triggered by a planned endpoint replacement or by an underlying infrastructure issue. Verify whether the endpoint named in the message is reachable. If it is not, the cause is likely a network problem or a hardware/OS failure.
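As a first reachability check, a plain TCP connection attempt to the peer's address and port is often enough to separate "node is down or unreachable" from other causes. The sketch below is a minimal example under stated assumptions: the helper name `is_reachable` is hypothetical, and the default port 7000 is simply the one shown in the example message, so adjust it to your own configuration.

```python
import socket


def is_reachable(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default port is taken from the example log line (an assumption);
    use the inter-node port your cluster is actually configured with.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the peer in the example message, the call would be `is_reachable("10.10.238.196")`. A `False` result only shows that the port could not be reached from the machine running the check; it does not by itself say whether the network, the host, or the process on it is at fault.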