Setting Up a Highly Available Kubernetes Cluster
In the previous section we set up a Kubernetes cluster and called it a "quasi-production" cluster. The reason: it is not highly available.
Consider what happens if the Master node goes down. Since there is only one control-plane node, the entire cluster is immediately paralyzed.
In this section, we will use Keepalived to build a highly available cluster.
We need four machines (physical or virtual). Assume their IPs are:
- h1: 192.168.1.12
- h2: 192.168.1.10
- h3: 192.168.1.9
- h4: 192.168.1.16
We also need a spare VIP (Virtual IP) that conflicts with nothing on the network. When a failover happens, Keepalived moves the VIP from the primary Master to the backup Master.
Note that on cloud hosts you usually cannot bind an arbitrary IP yourself, for network-security reasons; you need to apply for a dedicated HAVIP (high-availability VIP) from your provider (both Tencent Cloud and Alibaba Cloud offer this).
From here on, we assume you already have a usable VIP: 192.168.1.8.
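A quick sanity check: before Keepalived is deployed, nothing should own the VIP, so it should not answer pings. A minimal probe:
ping -c 2 192.168.1.8   # expect 100% packet loss while the VIP is unclaimed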
1 Deploying Keepalived
We will use h1 as the primary and h2 as the backup for the Master VIP, so keepalived must be installed on both machines:
yum install -y keepalived
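The configs below hardcode interface eth0. If your NIC has a different name (ens33, enp0s3, ...), adjust that line accordingly; you can list the interface names with:
ip -br addr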
The configuration files for the two machines are as follows:
h1:
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "</dev/tcp/127.0.0.1/6443"
    interval 1
    weight -2
}
vrrp_instance VI-kube-master {
    state MASTER                  # node role
    interface eth0                # network interface name
    virtual_router_id 68
    priority 100
    dont_track_primary
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    unicast_src_ip 192.168.1.12   # this machine's IP
    unicast_peer {
        192.168.1.10              # the peer machine's IP
    }
    virtual_ipaddress {
        192.168.1.8               # the HAVIP
    }
    track_script {
        check_apiserver
    }
}
h2:
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "</dev/tcp/127.0.0.1/6443"
    interval 1
    weight -2
}
vrrp_instance VI-kube-master {
    state BACKUP                  # node role
    interface eth0                # network interface name
    virtual_router_id 68
    priority 99
    dont_track_primary
    advert_int 3
    unicast_src_ip 192.168.1.10   # this machine's IP
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    unicast_peer {
        192.168.1.12              # the peer machine's IP
    }
    virtual_ipaddress {
        192.168.1.8               # the HAVIP
    }
    track_script {
        check_apiserver
    }
}
Explanation:
- h1 is the primary, so its state is MASTER; h2 is the backup, with state BACKUP.
- h1 and h2 find each other via unicast: each sets unicast_peer to the other's IP.
- virtual_ipaddress is set to the same VIP on both machines.
- Health checking is done by the check_apiserver script, which tests whether TCP port 6443 is open; this is the Kubernetes API server's port. A failed check lowers the node's effective priority by 2 (the weight). You can run the same probe by hand, as shown below.
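A minimal way to try the probe yourself; the </dev/tcp/host/port redirection is a bash built-in, so it must run under bash:
timeout 1 bash -c '</dev/tcp/127.0.0.1/6443' && echo "apiserver port open" || echo "apiserver port closed"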
Once both configs are in place, enable and start the keepalived service on both machines:
systemctl enable keepalived
systemctl start keepalived
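If everything is working, the VIP should now be bound on h1, the current MASTER. A quick check (again assuming the interface is eth0):
ip addr show eth0 | grep 192.168.1.8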
2 Preparing the Kubernetes Environment
The preparation here is exactly the same as in the previous section, so we will not repeat it.
Follow steps 2–4 of the section "Setting Up a Kubernetes Cluster".
Note that all four machines need this setup.
3 Bootstrapping the First Master
We start on h1, with the following command:
kubeadm init --kubernetes-version v1.22.1 --control-plane-endpoint=192.168.1.8:6443 --apiserver-advertise-address=192.168.1.8 --pod-network-cidr=10.6.0.0/16 --upload-certs
Notes:
- control-plane-endpoint / apiserver-advertise-address are set to the VIP; traffic sent to the VIP reaches h1 or h2, whichever currently holds the MASTER state.
- upload-certs: uploads the control-plane certificates so that additional masters can fetch them when they join; a highly available cluster needs this.
When it succeeds, the output looks like this:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
--discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d \
--control-plane --certificate-key 23474fd4262f1bf8849c5cea160fd3309621f79460266c43dfca1d7cc390f1af
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
--discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d
The output contains two join commands: the long one (with --control-plane) is for masters, the short one is for workers.
We join h2 and h3 as masters as well. With three masters, the stacked etcd cluster keeps quorum (2 of 3 members) even if one node dies, which is what lets the cluster survive a Master failure. On h2 and h3, run:
kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
--discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d \
--control-plane --certificate-key 23474fd4262f1bf8849c5cea160fd3309621f79460266c43dfca1d7cc390f1af
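Before moving on, you can confirm that the API server answers through the VIP; by default the /healthz endpoint is readable without credentials:
curl -k https://192.168.1.8:6443/healthz   # should print: ok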
4 Joining the Worker Node
Join h4 as a worker:
kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
--discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d
5 Installing the Network Plugin
Back on h1, h2, or h3 (all three are Masters), run:
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# after editing the manifest so its Network CIDR matches ours (see below)
kubectl apply -f ./kube-flannel.yml
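Flannel's manifest ships with 10.244.0.0/16 as its default Network value; since we passed --pod-network-cidr=10.6.0.0/16 to kubeadm, it has to be changed to match before the apply above. A one-liner, assuming that default value is present in the file, plus a check that the flannel pods come up:
sed -i 's#10.244.0.0/16#10.6.0.0/16#' kube-flannel.yml
kubectl -n kube-system get pods -l app=flannel   # wait until all are Running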
6 Testing Failover
We power h1 off:
poweroff
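Then, on h2, follow the Keepalived log (assuming systemd/journald is in use):
journalctl -u keepalived -f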
The switchover shows up in the log:
Sep 18 07:59:28 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Changing effective priority from 97 to 99
Sep 18 08:03:22 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Transition to MASTER STATE
Sep 18 08:03:25 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Entering MASTER STATE
Sep 18 08:03:25 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) setting protocol VIPs.
The first line is h2's own check_apiserver probe succeeding again, which restores its effective priority from 97 to the configured 99; the transition to MASTER happens once h1's VRRP advertisements stop arriving.
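To run kubectl on h2, point it at the admin kubeconfig that kubeadm join --control-plane wrote there (as root):
export KUBECONFIG=/etc/kubernetes/admin.conf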
Now check the cluster state from h2 right away; everything reports normal:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
h1 Ready control-plane,master 6m16s v1.22.2
h2 Ready control-plane,master 5m51s v1.22.2
h3 Ready control-plane,master 4m52s v1.22.2
h4 Ready <none> 3m38s v1.22.2
Wait a little longer and h1 is reported as down; the node controller only marks a node NotReady after its grace period, 40 seconds by default:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
h1 NotReady control-plane,master 6m16s v1.22.2
h2 Ready control-plane,master 5m51s v1.22.2
h3 Ready control-plane,master 4m52s v1.22.2
h4 Ready <none> 3m38s v1.22.2
The VIP has moved to h2 and etcd still has quorum (2 of its 3 members are alive), so the cluster keeps serving requests. We now have a highly available Master!
7 Testing Recovery
Now start h1 back up. After a short wait, everything returns to normal; because h1's priority (100) is higher than h2's (99), Keepalived preempts and the VIP moves back to h1:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
h1 Ready control-plane,master 8m14s v1.22.2
h2 Ready control-plane,master 7m49s v1.22.2
h3 Ready control-plane,master 6m50s v1.22.2
h4 Ready <none> 5m36s v1.22.2
By now you should be familiar with the steps for building a highly available Kubernetes cluster.
A question to close with: h1, h2, and h3 are all Masters, but Keepalived is configured only on h1 and h2.
- If h3 goes down, does the cluster still work?
- If h3 goes down and then h2 goes down as well, does the cluster still work?