Using multi-master clusters is one of the most important aspects of building solid infrastructure. We built our original open-source scheduler with a multi-master setup from the ground up, because we recognize how important it is to provide our customers with failover and high availability. In fact, the entire Containership team has made careers out of building systems that are highly available.
Kubernetes clusters enable developers to deploy and manage entire groups of containers with a single API entry point, scheduler, authentication model and naming scheme. But while a single-master cluster has a single point of failure, a multi-master cluster uses multiple (usually at least three) master nodes, each of which has access to the same pool of worker nodes, so that the cluster can maintain quorum when a master is lost.
Here’s a basic overview of how a multi-master setup works.
Master node components
First and foremost, each master node in a multi-master cluster runs its own copy of the Kubernetes API server, which allows requests to be load balanced among the API servers on the master nodes. When the second master replica of the cluster is created, a load balancer containing the two replicas is generated, and the IP address of the first replica is promoted to the load balancer’s IP address. Conversely, when the second-to-last master replica of the cluster is removed, the load balancer’s IP address is reassigned to the final remaining replica.
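To make the load-balancing idea concrete, here is a minimal sketch of round-robin selection across several API server endpoints that skips any endpoint reported as unhealthy. It is an illustration only, not how any particular load balancer is implemented. The class name `ApiServerPool`, the hostnames, and the `is_healthy` callback are all hypothetical.

```python
from typing import Callable, List


class ApiServerPool:
    """Round-robin over API server endpoints, skipping unhealthy ones.

    Hypothetical helper for illustration; a real deployment would
    normally put a load balancer in front of the API servers.
    """

    def __init__(self, endpoints: List[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._next = 0  # index of the endpoint to try first on the next pick

    def pick(self, is_healthy: Callable[[str], bool]) -> str:
        """Return the next healthy endpoint, or raise if none is healthy."""
        count = len(self._endpoints)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._endpoints[index]
            if is_healthy(candidate):
                # Start after this endpoint next time to spread the load.
                self._next = (index + 1) % count
                return candidate
        raise RuntimeError("no healthy API server available")


# Hypothetical endpoints, one API server per master node.
pool = ApiServerPool([
    "https://master-1.example.internal:6443",
    "https://master-2.example.internal:6443",
    "https://master-3.example.internal:6443",
])
```

With one master down, calls to `pool.pick(...)` simply alternate between the two remaining endpoints, which is the failover behavior a multi-master setup is meant to provide.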
Each master node can also run its own copy of the etcd database, which stores all cluster data; at the very least, each master node needs access to etcd. Another possibility is to run etcd as a separate, dedicated cluster, which the API servers on all the master nodes connect to.
In such a configuration, etcd runs as a cluster with an odd number of members, in which the leader sends heartbeats to all its followers at a regular interval to keep the cluster stable. A quorum (a majority of the members, leader included) must agree to any update to the cluster’s state.
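The arithmetic behind the odd-member recommendation is short enough to work through. The sketch below computes the majority quorum for a cluster of a given size and how many members can be lost before quorum is gone. The function names are ours, chosen for illustration.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority still remains."""
    return members - quorum_size(members)


# Map each cluster size to (quorum, tolerated failures).
table = {n: (quorum_size(n), failure_tolerance(n)) for n in range(1, 6)}
```

A three-member cluster needs two members for quorum and tolerates one failure. Adding a fourth member raises the quorum to three but still tolerates only one failure, which is why even-sized clusters add cost without adding resilience.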
While this etcd cluster approach keeps the cluster’s key-value store highly available, it’s essential to prevent resource starvation (typically due to network and/or disk I/O overload) for the etcd cluster. Resource starvation can cause heartbeats to time out, which signals to the followers that the leader is gone and leaves the cluster without an elected leader until a new election succeeds. A leaderless etcd cluster cannot accept writes, so changes such as scheduling new pods cannot be made, which renders the cluster (and thus the etcd instance) unstable.
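The effect of delayed heartbeats can be sketched with a simplified follower timer: each heartbeat that arrives in time pushes the follower's deadline forward, and the first heartbeat that arrives too late means the follower has already given up on the leader. This is a simplified model, not etcd's actual implementation (which, among other things, randomizes election timers). The function name and the timing values in the usage note are assumptions for illustration.

```python
from typing import Iterable, Optional


def first_missed_deadline(
    heartbeat_arrivals_ms: Iterable[int],
    election_timeout_ms: int,
    start_ms: int = 0,
) -> Optional[int]:
    """Return the time at which a follower presumes the leader lost.

    Returns None if every heartbeat arrives before the follower's
    current deadline expires.
    """
    deadline = start_ms + election_timeout_ms
    for arrival in sorted(heartbeat_arrivals_ms):
        if arrival > deadline:
            # The deadline passed before this heartbeat showed up.
            return deadline
        # A timely heartbeat resets the follower's timer.
        deadline = arrival + election_timeout_ms
    return None
```

For example, with a 1000 ms election timeout, heartbeats arriving at 100, 200 and 300 ms followed by a stall until 1500 ms (say, because of a saturated disk) cause the follower to give up at 1300 ms, before the late heartbeat arrives.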
In addition to the API server and (possibly) a copy of the etcd database, each master node also runs the Kubernetes controller manager, which handles routine tasks like replication, as well as the scheduler, which watches for newly created pods and assigns them to nodes.
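As a rough illustration of what "assigning pods to nodes" involves, the sketch below filters out nodes that cannot fit a pod's CPU request and then picks the feasible node with the most free CPU. The real Kubernetes scheduler considers many more factors, so treat this as a toy model. The `Node` type and `schedule` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millicores: int


def schedule(pod_cpu_millicores: int, nodes: List[Node]) -> Optional[str]:
    """Bind a pod to the feasible node with the most free CPU.

    Returns the chosen node's name, or None if no node can fit the pod
    (the pod would stay pending).
    """
    # Filter step: keep only nodes with enough free CPU.
    feasible = [n for n in nodes if n.free_cpu_millicores >= pod_cpu_millicores]
    if not feasible:
        return None
    # Score step: prefer the least-loaded node.
    best = max(feasible, key=lambda n: n.free_cpu_millicores)
    best.free_cpu_millicores -= pod_cpu_millicores  # reserve the capacity
    return best.name
```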
Worker node components
Beneath the master nodes in the multi-master cluster, a pool of worker nodes runs its own components, mainly focused on running pods and providing the Kubernetes runtime environment. The primary node agent is the kubelet, which watches for pods assigned to its node and performs tasks like mounting pod volumes, downloading secrets, and running containers and health checks. Each worker node also runs the Kubernetes network proxy (kube-proxy), which maintains network rules on the host and handles connection forwarding.
Advantages of multi-master
In a single-master setup, one master node runs the etcd database as well as all the Kubernetes master components (the API server, controller manager and scheduler) and manages a set of worker nodes distributed throughout the availability zone (AZ). However, if that single master node fails, the worker nodes lose their control plane, and the cluster can no longer be managed until the master is restored.
In a multi-master Kubernetes setup, by contrast, multiple master nodes provide high availability for a cluster, all on a single cloud provider. This improves network performance, because all the master nodes behave like a unified data center. It also significantly expands AZ availability, because instead of using a single master to cover all AZs, each master node can cover a separate AZ, or can step in to handle heavier loads in other AZs, as needed. And it provides a high level of redundancy and failover, in case of the loss of one or more master nodes.
This load balancing and redundancy are crucial, because when a cluster’s only master node fails, the Kubernetes API goes offline, which reduces the cluster to a collection of ad-hoc nodes without centralized management. This means the cluster cannot respond to additional node failures, create new resources, or move pods to different nodes until the master node is brought back online. While applications will typically continue to function normally during master node downtime, DNS queries may not resolve if a node is rebooted during that downtime.
Another advantage of a multi-master setup is the flexibility with which it scales while maintaining high availability across multiple AZs. For example, each Kubernetes master can be placed in an auto-scaling group, so that an unhealthy instance is replaced automatically. Worker nodes only need to be assigned to an auto-scaling group as well: increasing the group’s “desired” number of worker instances brings up new hosts with the same worker components and configuration, and when a worker instance fails or runs out of resources, a fresh one is automatically brought up in the correct AZ.
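The “desired” count logic can be sketched as a simple reconciliation: compare how many healthy instances each AZ should have against how many it actually has, and launch the difference. This is a conceptual sketch, not any cloud provider's API. The function name and the AZ names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def instances_to_launch(
    desired_per_az: Dict[str, int],
    healthy_instance_azs: List[str],
) -> Dict[str, int]:
    """Return how many new instances each AZ needs to reach its desired count.

    `healthy_instance_azs` lists the AZ of every currently healthy instance.
    """
    running = Counter(healthy_instance_azs)
    return {
        az: max(0, desired - running[az])  # never a negative launch count
        for az, desired in desired_per_az.items()
    }
```

For instance, with a desired count of two workers in each of three AZs and healthy workers only in `us-east-1a` (two) and `us-east-1b` (one), the function asks for one new instance in `us-east-1b` and two in `us-east-1c`.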
A multi-master setup protects against a wide range of failure modes, from the loss of a single worker node, all the way up to the loss of a master node’s etcd service, or even a network failure that brings down an entire AZ. By providing redundancy, a multi-master cluster serves as a highly available system for your end users.
Looking to get a multi-master setup at your organization? The team at Containership can help.