Building a Highly Available Kubernetes Cluster

In the previous section, we walked through building a Kubernetes cluster and called it a "quasi-production" cluster.

The reason: it does not support high availability.

Imagine what happens if the Master node goes down.

With only one control-plane node, the entire cluster is immediately paralyzed.

In this section, we will use Keepalived to build a highly available cluster.

We need four machines (physical or virtual, either works). Suppose their IPs are:

  • h1: 192.168.1.12

  • h2: 192.168.1.10

  • h3: 192.168.1.9

  • h4: 192.168.1.16

We also need a non-conflicting VIP (Virtual IP). When a failover occurs, Keepalived moves the VIP from the primary Master to the backup Master.

Note: if you use cloud hosts, network security restrictions mean you cannot claim an arbitrary VIP yourself; you need to apply for a dedicated HAVIP (high-availability VIP) from your provider, such as Tencent Cloud or Alibaba Cloud.

Here we assume you already have a usable VIP at 192.168.1.8.
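
Before proceeding, it is worth a quick sanity check that nothing already answers on the VIP (a minimal probe; expect no replies until Keepalived claims the address):

ping -c 3 192.168.1.8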

1 Deploying Keepalived

We choose h1 and h2 as the primary and backup hosts for the Master role.

Install keepalived on both machines:

yum install -y keepalived

The configuration files for the two machines are as follows:

h1:

! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "</dev/tcp/127.0.0.1/6443"
    interval 1
    weight -2
}
vrrp_instance VI-kube-master {
    state MASTER     # role of this node
    interface eth0   # network interface name
    virtual_router_id 68
    priority 100
    dont_track_primary
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    unicast_src_ip 192.168.1.12  # IP of this machine
    unicast_peer {
        192.168.1.10             # IP of the peer machine
    }
    virtual_ipaddress {
        192.168.1.8              # the HAVIP
    }
    track_script {
        check_apiserver
    }
}

h2:

! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "</dev/tcp/127.0.0.1/6443"
    interval 1
    weight -2
}
vrrp_instance VI-kube-master {
    state BACKUP     # role of this node
    interface eth0   # network interface name
    virtual_router_id 68
    priority 99
    dont_track_primary
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass mypass
    }
    unicast_src_ip 192.168.1.10  # IP of this machine
    unicast_peer {
        192.168.1.12             # IP of the peer machine
    }
    virtual_ipaddress {
        192.168.1.8              # the HAVIP
    }
    track_script {
        check_apiserver
    }
}

Explanation:

  • h1 is the primary, with state MASTER; h2 is the backup, with state BACKUP

  • h1 and h2 find each other via unicast, each setting unicast_peer to the other's IP (unicast avoids the VRRP multicast that many cloud networks block)

  • Both set the same VIP in virtual_ipaddress

  • Health checking uses the check_apiserver script, which tests whether TCP port 6443 is open; this is the Kubernetes API Server port. While the check fails, the node's effective VRRP priority drops by 2 (the weight), which is what allows a failover even when the machine itself is still up. You can run the same probe by hand, as shown below.
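
A minimal sketch of that probe, using bash's /dev/tcp pseudo-device (the same trick the vrrp_script relies on; /dev/tcp is a bash built-in, not a real file):

bash -c '</dev/tcp/127.0.0.1/6443' && echo "apiserver reachable" || echo "apiserver down"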

Once both are configured, enable and start the keepalived service on both machines.

systemctl enable keepalived
systemctl start keepalived
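
You can then confirm that the VIP landed on h1 (the current MASTER). The following should show 192.168.1.8 on h1 and print nothing on h2:

ip addr show eth0 | grep 192.168.1.8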

2 Preparing the Kubernetes Environment

The preparation here is identical to the previous section, so we won't repeat it.

Refer to steps 2 through 4 in the section "Building a Kubernetes Cluster".

Note that all four machines need this preparation.

3 Starting the Master Nodes

We begin on h1 with the following command:

kubeadm init --kubernetes-version v1.22.1 \
  --control-plane-endpoint=192.168.1.8:6443 \
  --apiserver-advertise-address=192.168.1.8 \
  --pod-network-cidr=10.6.0.0/16 \
  --upload-certs

Notes:

  • control-plane-endpoint and apiserver-advertise-address are set to the VIP; clients connecting to the VIP reach h1 or h2, whichever currently holds the MASTER state

  • upload-certs: uploads the control-plane certificates into the cluster so that additional masters can fetch them when joining; required for this HA setup

On success, the output looks like this:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
  --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d \
  --control-plane --certificate-key 23474fd4262f1bf8849c5cea160fd3309621f79460266c43dfca1d7cc390f1af

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
  --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d

The output contains two join commands: the long one is for additional master (control-plane) nodes, the short one is for worker nodes.

We join h2 and h3 as masters too (with three masters, etcd keeps quorum as long as at least two are alive, so the cluster survives any single master failure). That is, run on h2 and h3:

kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
  --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d \
  --control-plane --certificate-key 23474fd4262f1bf8849c5cea160fd3309621f79460266c43dfca1d7cc390f1af
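
As the h1 output above notes, you will probably want to copy the admin kubeconfig on each new master so that kubectl works there as well:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config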

4 Starting the Worker Node

Join h4 as a worker:

kubeadm join 192.168.1.8:6443 --token ydkjeh.zu9qthjssivlyrqy \
  --discovery-token-ca-cert-hash sha256:87d31b2fb17002f23dce01054c4877b133c15e3a1ed639e8f63b247f61609f8d
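
At this point you can sanity-check that the API server answers through the VIP. On a default kubeadm cluster the /healthz endpoint is readable anonymously, so the following should print ok:

curl -k https://192.168.1.8:6443/healthz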

5 Installing the Network Plugin

Back on h1, h2, or h3 (any of them, since all three are Masters), run:

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# after editing the CIDR in the file to match our --pod-network-cidr
kubectl apply -f ./kube-flannel.yml
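
Flannel's manifest ships with 10.244.0.0/16 in its net-conf.json section; one quick way to switch it to the 10.6.0.0/16 we passed to kubeadm (assuming the default value appears verbatim in the file) is:

sed -i 's#10.244.0.0/16#10.6.0.0/16#' kube-flannel.yml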

6 Testing High Availability

Power off h1:

poweroff

Then check the keepalived log on h2 (for example with journalctl -u keepalived); the failover is visible:

Sep 18 07:59:28 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Changing effective priority from 97 to 99
Sep 18 08:03:22 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Transition to MASTER STATE
Sep 18 08:03:25 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) Entering MASTER STATE
Sep 18 08:03:25 h2 Keepalived_vrrp[18653]: VRRP_Instance(VI-kube-master) setting protocol VIPs.

The first log line shows the check_apiserver weight at work: h2's effective priority returns from 97 (99 minus the weight of 2) to 99 once its local check passes again; when h1's VRRP advertisements stop, h2 transitions to MASTER and takes over the VIP. Checking the cluster status on h2 right away, everything still shows Ready:

kubectl get nodes
NAME   STATUS     ROLES                  AGE     VERSION
h1     Ready      control-plane,master   6m16s   v1.22.2
h2     Ready      control-plane,master   5m51s   v1.22.2
h3     Ready      control-plane,master   4m52s   v1.22.2
h4     Ready      <none>                 3m38s   v1.22.2

Wait a little longer and h1 is marked as down (the node controller only flags a node NotReady after its grace period, 40 seconds by default, expires):

kubectl get nodes
NAME   STATUS     ROLES                  AGE     VERSION
h1     NotReady   control-plane,master   6m16s   v1.22.2
h2     Ready      control-plane,master   5m51s   v1.22.2
h3     Ready      control-plane,master   4m52s   v1.22.2
h4     Ready      <none>                 3m38s   v1.22.2

With that, we have Master high availability!

7 Testing Recovery

Now restart h1, wait a moment, and everything is back to normal:

kubectl get nodes
NAME   STATUS   ROLES                  AGE     VERSION
h1     Ready    control-plane,master   8m14s   v1.22.2
h2     Ready    control-plane,master   7m49s   v1.22.2
h3     Ready    control-plane,master   6m50s   v1.22.2
h4     Ready    <none>                 5m36s   v1.22.2

By now you should be comfortable with the steps for building a highly available Kubernetes cluster.

A question to think about: h1, h2, and h3 are all Masters, but we only set up Keepalived on h1 and h2.

  • If h3 goes down, does the cluster keep working?

  • If h3 goes down and then h2 goes down as well, does the cluster still work?