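Etcd Database Backup and Restore

We use the etcdctl and etcdutl tools to back up and restore the etcd database.

Official download page: https://github.com/etcd-io/etcd/releases

1. Etcd Database Backup

1.1. Bare-Metal Deployment on Physical Nodes

1.1.1. Installing etcdctl and etcdutl from Binaries

Installation script: install_etcdctl.sh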

#!/bin/bash
set -euo pipefail

# Version to install
etcd_ver=v3.5.17
# Installation directory
etcd_dir=/software_path/etcd
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download

# Download and unpack into the installation directory
mkdir -p "$etcd_dir"
cd "$etcd_dir"
wget "${DOWNLOAD_URL}/${etcd_ver}/etcd-${etcd_ver}-linux-amd64.tar.gz"
tar -xzvf "etcd-${etcd_ver}-linux-amd64.tar.gz"

# Install: link the binaries into a directory on PATH (-f replaces links left by an earlier run)
ln -sf "${etcd_dir}/etcd-${etcd_ver}-linux-amd64/etcdctl" /usr/local/sbin/etcdctl
ln -sf "${etcd_dir}/etcd-${etcd_ver}-linux-amd64/etcdutl" /usr/local/sbin/etcdutl
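The install script unpacks the tarball without checking its integrity. A minimal sketch of a checksum step follows, assuming the release page also publishes a `SHA256SUMS` file next to the tarballs (worth confirming for the version in use). `verify_sha256` is a hypothetical helper name, not part of the original script.

```shell
# Hypothetical helper: check a downloaded file against a SHA256SUMS-style list.
# Usage: verify_sha256 <file> <sums_file>
verify_sha256() {
    local file=$1 sums=$2
    local name expected actual
    name=$(basename "$file")
    # Each line of the sums file is "<hex digest>  <file name>".
    expected=$(awk -v n="$name" '$2 == n || $2 == ("*" n) {print $1; exit}' "$sums")
    if [ -z "$expected" ]; then
        echo "no checksum listed for $name" >&2
        return 2
    fi
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Example (assumed URL layout, same variables as the install script):
#   wget "${DOWNLOAD_URL}/${etcd_ver}/SHA256SUMS"
#   verify_sha256 "etcd-${etcd_ver}-linux-amd64.tar.gz" SHA256SUMS || exit 1
```

The function returns non-zero both when the digest differs and when the file is not listed, so a single `|| exit 1` covers both cases.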

Verify the installation:

$ etcdctl version
etcdctl version: 3.5.17
API version: 3.5

$ etcdutl version
etcdutl version: 3.5.17
API version: 3.5

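1.1.2. Backing Up with etcdctl

On a single-node Kubernetes cluster, a snapshot of its one etcd instance is enough. On a cluster with multiple masters, take a snapshot of etcd on each master node in turn, as a safeguard against the etcd data being modified while the backup is running.

Run the following command on every etcd node:

# Export this node's etcd data as a snapshot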
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save etcdbackupfile.db

Automated backup script: etcd_backup.sh

#!/bin/bash

# Backup directory
backup_dir="/var/lib/etcd_db_bak"

# Timestamp (YYYYMMDDhhmm)
DATE=$(date +"%Y%m%d%H%M")

# Create the backup directory if it does not exist
mkdir -p "$backup_dir"

# Take the snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save "$backup_dir/etcdbackupfile_$DATE.db"
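A snapshot is only useful if it can be read back. The sketch below wraps the backup in a function that saves to a temporary name, checks the file with `etcdutl snapshot status`, and only then moves it into place. The function name `backup_and_verify` and the `.unverified` suffix are illustrative choices, not part of the original script; the endpoint and certificate paths are the ones used above.

```shell
# Hypothetical wrapper: save a snapshot, verify it, then move it into place.
# Usage: backup_and_verify <target_file>
backup_and_verify() {
    local target=$1
    local tmp="${target}.unverified"

    ETCDCTL_API=3 etcdctl \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        snapshot save "$tmp" || { rm -f "$tmp"; return 1; }

    # etcdutl reads the file offline; a non-zero exit is treated as "unusable".
    etcdutl snapshot status "$tmp" --write-out=table || { rm -f "$tmp"; return 1; }

    mv "$tmp" "$target"
}

# Example, reusing the variables of etcd_backup.sh:
#   backup_and_verify "$backup_dir/etcdbackupfile_$DATE.db"
```

With this shape, a file under its final name has always passed the status check, which keeps the retention job from counting broken files as valid backups.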

Set up the scheduled jobs:

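crontab -e
# cron provides only a minimal environment, so list every directory the jobs need
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Take a snapshot every day at 18:30
30 18 * * *  /var/lib/etcd_db_bak/etcd_backup.sh
# Every day at 23:50, delete snapshots older than 5 days
50 23 * * *  find /var/lib/etcd_db_bak/ -mtime +5 -name "*.db" -exec rm -f {} \;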
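The `find` entry is the retention policy. `-mtime +5` matches files whose age, counted in whole 24-hour periods, is greater than 5, so a snapshot survives until it is at least six days old. A small sketch of the same rule as a function with a dry-run switch follows; `prune_snapshots` and the `DRY_RUN` variable are hypothetical names, and `-maxdepth` and `-delete` assume GNU or BSD `find`.

```shell
# Hypothetical helper: delete (or only list) *.db files older than <days> days.
# Usage: prune_snapshots <dir> [days]      (set DRY_RUN=1 to list without deleting)
prune_snapshots() {
    local dir=$1 days=${2:-5}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        find "$dir" -maxdepth 1 -type f -name '*.db' -mtime +"$days" -print
    else
        find "$dir" -maxdepth 1 -type f -name '*.db' -mtime +"$days" -print -delete
    fi
}

# Example:
#   DRY_RUN=1 prune_snapshots /var/lib/etcd_db_bak 5   # show what would go
#   prune_snapshots /var/lib/etcd_db_bak 5             # actually delete
```

Running the dry-run form once before enabling the cron entry is a cheap way to confirm the pattern only matches snapshot files.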

1.2. Docker Container Deployment

1.3. Kubernetes CronJob Deployment

# etcd-database-backup.yaml
apiVersion: batch/v1
kind: CronJob 
metadata:
  name: etcd-database-backup
  annotations:
    descript: "Scheduled etcd database backup"
spec:
  schedule: "*/5 * * * *"   # run every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:           
          containers:    
          - name: etcdctl
            image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.5-0
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_CACERT
              value: "/etc/kubernetes/pki/etcd/ca.crt"
            - name: ETCDCTL_CERT
              value: "/etc/kubernetes/pki/etcd/healthcheck-client.crt"
            - name: ETCDCTL_KEY
              value: "/etc/kubernetes/pki/etcd/healthcheck-client.key"
            command:
            - /bin/sh 
            - -c
            - |
              export RAND=$RANDOM
              etcdctl --endpoints=https://192.168.12.107:2379 snapshot save /backup/etcd-107-${RAND}-snapshot.db
              etcdctl --endpoints=https://192.168.12.108:2379 snapshot save /backup/etcd-108-${RAND}-snapshot.db
              etcdctl --endpoints=https://192.168.12.109:2379 snapshot save /backup/etcd-109-${RAND}-snapshot.db
            volumeMounts: 
            - name: "pki"
              mountPath: "/etc/kubernetes"
            - name: "backup"
              mountPath: "/backup"
            imagePullPolicy: IfNotPresent
          volumes:
          - name: "pki"
            hostPath: 
              path: "/etc/kubernetes"
              type: "DirectoryOrCreate"
          - name: "backup"
            hostPath: 
              path: "/storage/dev/backup"  # backup directory on the host
              type: "DirectoryOrCreate"
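          nodeSelector:  # Pin the Pod to a master node; otherwise the certificates would have to live on NFS shared storage reachable from every node
            # On Kubernetes 1.24 and later, kubeadm labels these nodes node-role.kubernetes.io/control-plane instead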
            node-role.kubernetes.io/master: ""
          restartPolicy: Never
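In the manifest, `$RANDOM` ties the three files of one run together, but the value says nothing about when the snapshot was taken and can repeat across runs. A sketch of a timestamp-based alternative for the container command follows, written for plain `sh`. `snapshot_path` is a hypothetical helper, the `/backup` prefix is the mount path from the manifest, and the commented loop assumes the image provides `date`.

```shell
# Build "/backup/etcd-<last IP octet>-<timestamp>-snapshot.db" from an endpoint URL.
snapshot_path() {
    endpoint=$1
    stamp=$2
    host=${endpoint#*://}     # strip the scheme
    host=${host%%:*}          # strip the port
    octet=${host##*.}         # keep the last octet of the IP
    echo "/backup/etcd-${octet}-${stamp}-snapshot.db"
}

# Inside the CronJob command this could replace the three fixed lines:
#   STAMP=$(date +%Y%m%d%H%M%S)
#   for ep in https://192.168.12.107:2379 https://192.168.12.108:2379 https://192.168.12.109:2379; do
#       etcdctl --endpoints="$ep" snapshot save "$(snapshot_path "$ep" "$STAMP")"
#   done
```

Timestamped names sort chronologically, which also makes an age-based cleanup of `/storage/dev/backup` straightforward; with a five-minute schedule that directory otherwise grows without bound.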

2. Etcd Database Restore

Stop kube-apiserver and etcd on all master machines, then restore each node's etcd data from the backup.

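# Stop the kube-apiserver and etcd static Pods
mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/

# Move the existing /var/lib/etcd data directory aside on this node
mv /var/lib/etcd /var/lib/etcd.bak

mkdir /var/lib/etcd

# Restore from a snapshot: of the backups taken on the nodes, pick the largest one and restore that same file on each node in turn
# (run each command below on the node named in its --name flag)
# Restoring the nodes from different backups can leave the etcd data inconsistent
# In etcd v3.5, etcdctl snapshot restore is deprecated in favor of etcdutl snapshot restore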
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db --data-dir=/var/lib/etcd --name=k8s-master-01-c-201 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key  --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master-01-c-201=https://192.168.2.201:2380,k8s-master-02-r-202=https://192.168.2.202:2380,k8s-master-03-u-203=https://192.168.2.203:2380  --initial-advertise-peer-urls=https://192.168.2.201:2380

ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db --data-dir=/var/lib/etcd --name=k8s-master-02-r-202 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key  --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master-01-c-201=https://192.168.2.201:2380,k8s-master-02-r-202=https://192.168.2.202:2380,k8s-master-03-u-203=https://192.168.2.203:2380  --initial-advertise-peer-urls=https://192.168.2.202:2380

ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db --data-dir=/var/lib/etcd --name=k8s-master-03-u-203 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key  --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master-01-c-201=https://192.168.2.201:2380,k8s-master-02-r-202=https://192.168.2.202:2380,k8s-master-03-u-203=https://192.168.2.203:2380  --initial-advertise-peer-urls=https://192.168.2.203:2380

mv /etc/kubernetes/manifests-backup/ /etc/kubernetes/manifests/
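The restore comments say to pick the largest of the per-node snapshots and restore that one file everywhere. A sketch of that selection follows, assuming the candidate snapshots have first been copied into a single directory; `largest_snapshot` is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the largest *.db file in a directory.
# Returns non-zero if the directory contains no snapshot files.
largest_snapshot() {
    local dir=$1 best="" best_size=-1 f size
    for f in "$dir"/*.db; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding that wc adds on some systems.
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -gt "$best_size" ]; then
            best=$f
            best_size=$size
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example:
#   snap=$(largest_snapshot /var/lib/etcd_db_bak) || exit 1
#   echo "restoring every node from $snap"
```

File size is only a heuristic. `etcdutl snapshot status` reports the revision and key count of each file, which is a more direct basis for comparing candidates.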

Common etcdctl commands:

# Show the status of each cluster endpoint, including which member is the leader
ETCDCTL_API=3 etcdctl endpoint status --endpoints=https://192.168.2.201:2379 --endpoints=https://192.168.2.202:2379 --endpoints=https://192.168.2.203:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --write-out table

# List the cluster members
ETCDCTL_API=3 etcdctl member list --endpoints=https://192.168.2.201:2379 --endpoints=https://192.168.2.202:2379 --endpoints=https://192.168.2.203:2379  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key

# Check endpoint health to identify unhealthy members
ETCDCTL_API=3 etcdctl endpoint health --endpoints=https://192.168.2.201:2379 --endpoints=https://192.168.2.202:2379 --endpoints=https://192.168.2.203:2379  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
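The three commands repeat the same endpoint and TLS flags. etcdctl also reads its flags from `ETCDCTL_`-prefixed environment variables, as the CronJob manifest in section 1.3 does for the certificates, so the shared settings can be exported once per shell session. The sketch below uses the addresses and paths from the commands above; `join_endpoints` is a hypothetical helper.

```shell
# Join IPs into a comma-separated list of https client URLs on port 2379.
join_endpoints() {
    local out="" ip
    for ip in "$@"; do
        out="${out:+$out,}https://${ip}:2379"
    done
    echo "$out"
}

export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=$(join_endpoints 192.168.2.201 192.168.2.202 192.168.2.203)
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key

# The commands then shrink to:
#   etcdctl endpoint status --write-out table
#   etcdctl member list
#   etcdctl endpoint health
```

One caveat, as far as I know: etcdctl refuses to run when the same setting is supplied both as an environment variable and as a flag, so these exports should not be mixed with the long-form commands in the same shell.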