Why must a Kubernetes cluster have an odd number of nodes

This content originally appeared on DEV Community and was authored by Farshad Nickfetrat

If you’ve spent any time setting up or managing Kubernetes, you might have come across the recommendation that clusters should have an odd number of nodes. But why is that? Let's break it down in simple terms.
It's All About Leader Election

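Kubernetes keeps its cluster state in etcd, and etcd uses Raft, a consensus algorithm (an alternative to Paxos, not the same algorithm), to elect a leader and replicate data. etcd usually runs on the control plane nodes, so the odd-number recommendation is about control plane (etcd) members, not worker nodes.
What Is Raft Consensus?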

Raft is a consensus algorithm used to ensure multiple computers (or nodes) agree on shared data, even if some nodes fail. It was designed to be easier to understand than other algorithms like Paxos.

Raft ensures that distributed systems like etcd (used by Kubernetes) can agree on a single leader and keep their data consistent as long as a majority of nodes is available.

Imagine a group of people trying to agree on a decision (like which movie to watch). Raft works by choosing one person as the leader, who suggests a movie (or decision). The others (followers) acknowledge the leader's suggestion, and the decision is final once a majority has acknowledged it. If the leader goes away (fails), the group elects a new leader, which also takes a majority of votes. As long as a majority can still talk to each other, the group can keep making decisions, even if some people (nodes) aren't available.
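The majority rule in the analogy above can be sketched in a few lines of Python. This is only an illustration of the vote-counting step, not real Raft: there are no terms, logs, or timeouts, and the names `has_majority`, `elect`, and the member names are made up for this example.

```python
from typing import Dict, Optional


def has_majority(votes_for: int, cluster_size: int) -> bool:
    """Return True if votes_for is a strict majority of cluster_size."""
    return votes_for > cluster_size // 2


def elect(candidate: str, votes: Dict[str, bool]) -> Optional[str]:
    """Return the candidate if a majority of ALL members granted their vote.

    `votes` has one entry per cluster member; an unreachable member is
    recorded as False, because it cannot grant a vote.
    """
    granted = sum(1 for granted_vote in votes.values() if granted_vote)
    return candidate if has_majority(granted, len(votes)) else None


# Five members, two of them unreachable: 3 of 5 votes is still a majority.
votes = {"ana": True, "ben": True, "cy": True, "dee": False, "eli": False}
print(elect("ana", votes))
```

Note that the majority is counted against the full membership, not against the members that happen to be reachable. That is why a cluster that has lost half of its members cannot elect a leader.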

Take a look at these examples:

  1. 4-Node System

Total Nodes: 4
Quorum Required: 3
Allowed Failed Nodes: 1

Quorum Required: To maintain consensus in this 4-node system, a majority (3 nodes) must be operational.
Allowed Failed Nodes: This system can tolerate the failure of only 1 node. If 2 nodes go down, the system loses quorum and cannot make decisions.
Scenario: If 3 nodes are up and 1 is down, the system can still function. If 2 nodes go down, the system cannot process any new transactions until at least 3 nodes are up again.

  2. 9-Node System

Total Nodes: 9
Quorum Required: 5
Allowed Failed Nodes: 4

Quorum Required: In this 9-node system, at least 5 nodes (more than half) must be up and running to reach a consensus.

Allowed Failed Nodes: This system can tolerate up to 4 node failures while still maintaining quorum.
Scenario: If 4 nodes fail, the remaining 5 nodes can continue to operate and maintain consensus. However, if a 5th node fails, the system loses quorum and can no longer process updates or transactions.

  3. 10-Node System

Total Nodes: 10
Quorum Required: 6
Allowed Failed Nodes: 4

Quorum Required: In a 10-node system, at least 6 nodes must be operational to maintain consensus.
Allowed Failed Nodes: This system can tolerate the failure of up to 4 nodes. If 5 nodes go down, the system loses quorum.

Scenario: If 6 nodes are operational, the system can process transactions and make decisions. However, if 5 nodes are down, the system becomes inoperable because there are not enough nodes to reach quorum.

  4. 11-Node System

Total Nodes: 11
Quorum Required: 6
Allowed Failed Nodes: 5

Quorum Required: For an 11-node system, a minimum of 6 nodes need to be up to form a quorum.
Allowed Failed Nodes: This system can tolerate up to 5 node failures. If 6 nodes fail, quorum is lost.
Scenario: With 6 nodes operational, the system can still reach consensus and process transactions. However, if a 6th node fails, leaving only 5 nodes up, the system is effectively halted since it no longer has a quorum to make decisions.
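The numbers in the four examples all follow from one rule: quorum is a strict majority, floor(n / 2) + 1, and the tolerated failures are whatever is left over. A minimal Python sketch (the function names `quorum` and `fault_tolerance` are chosen for this illustration) reproduces them:

```python
def quorum(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many members can fail while a quorum is still available."""
    return n - quorum(n)


# The four cluster sizes from the examples above.
for n in (4, 9, 10, 11):
    print(f"{n:>2} nodes: quorum={quorum(n)}, tolerated failures={fault_tolerance(n)}")
```

Comparing the 9-node and 10-node rows shows the pattern: the tenth node raises the quorum by one but leaves the number of tolerated failures unchanged.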

Conclusion

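By increasing the number of control plane nodes, you raise the failure tolerance of the cluster, but only when the increase takes you to the next odd number. Going from 9 to 10 nodes raises the quorum from 5 to 6 while the number of tolerated failures stays at 4, so the extra node adds cost and one more thing that can fail without adding resilience. An even-sized cluster can also be split by a network partition into two equal halves, neither of which has a majority, so the whole cluster stops accepting writes. The majority rule is what prevents split-brain at any size; an odd member count simply gives the most fault tolerance per node, which is why it is the recommendation.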
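The split-brain point can be illustrated with a small Python sketch. It assumes a network partition that cuts the cluster into exactly two groups, A and B, that cannot reach each other; the helper `side_with_quorum` is hypothetical and only does the arithmetic.

```python
from typing import Optional


def quorum(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def side_with_quorum(n: int, side_a: int) -> Optional[str]:
    """Return which side of a two-way partition keeps quorum, if any.

    `n` is the total member count and `side_a` is how many members ended up
    in group A; the remaining members are in group B.
    """
    side_b = n - side_a
    if side_a >= quorum(n):
        return "A"
    if side_b >= quorum(n):
        return "B"
    return None  # neither side has a majority, so no writes are accepted


print(side_with_quorum(4, 2))  # 2/2 split of a 4-node cluster
print(side_with_quorum(5, 2))  # 2/3 split of a 5-node cluster
```

At most one side can ever hold a majority, so two leaders can never both commit writes. With an even member count, an equal split leaves no side able to make progress; with an odd count, one side always can.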

About the Author:
Hi 👋, I’m Farshad Nick (Farshad Nickfetrat)

📝 I regularly write articles on packops.dev and packops.ir
💬 Ask me about DevOps, Cloud, Kubernetes, Linux
📫 Reach me on LinkedIn
Here is my GitHub repo

