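We use the etcdctl and etcdutl tools to back up and restore the etcd database.
Official download page: https://github.com/etcd-io/etcd/releases
1. Backing up etcd data
1.1. Bare-metal deployment on physical nodes
1.1.1. Installing the etcdctl and etcdutl binaries
Installation script: install_etcdctl.sh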
#!/bin/bash
# Abort on the first failed command
set -e
# Version to install
etcd_ver=v3.5.17
# Installation directory
etcd_dir=/software_path/etcd
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
# Download the release tarball into the installation directory and unpack it
mkdir -p "$etcd_dir"
cd "$etcd_dir"
wget "${DOWNLOAD_URL}/${etcd_ver}/etcd-${etcd_ver}-linux-amd64.tar.gz"
tar -xzvf "etcd-${etcd_ver}-linux-amd64.tar.gz"
# Install: link the binaries into a directory on PATH (-f replaces links left by an earlier run)
ln -sf "${etcd_dir}/etcd-${etcd_ver}-linux-amd64/etcdctl" /usr/local/sbin/etcdctl
ln -sf "${etcd_dir}/etcd-${etcd_ver}-linux-amd64/etcdutl" /usr/local/sbin/etcdutl
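The installation script hardcodes the linux-amd64 tarball. As a minimal sketch, assuming the release file naming `etcd-<version>-linux-<arch>.tar.gz` visible in the download URL, the URL could be derived from the machine architecture instead. The helpers `etcd_arch` and `etcd_download_url` are hypothetical names, and only two architectures are mapped here.

```shell
#!/bin/bash
# Map `uname -m` output to the architecture suffix used in etcd release file names.
# Hypothetical helper; only two common architectures are handled.
etcd_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the tarball URL for a given version and machine type.
etcd_download_url() {
  local ver="$1" arch
  arch="$(etcd_arch "$2")" || return 1
  echo "https://github.com/etcd-io/etcd/releases/download/${ver}/etcd-${ver}-linux-${arch}.tar.gz"
}

# Example with a fixed machine type; pass "$(uname -m)" to use the local one.
etcd_download_url v3.5.17 x86_64
# prints https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz
```

An unmapped architecture makes both helpers return a non-zero status, so a script running under `set -e` stops before attempting a download.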
Verify the installation:
$ etcdctl version
etcdctl version: 3.5.17
API version: 3.5
$ etcdutl version
etcdutl version: 3.5.17
API version: 3.5
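1.1.2. Backing up with etcdctl
For a single-node Kubernetes cluster, one snapshot of its etcd database is enough. For a cluster with multiple master nodes, take a snapshot of the etcd member on each master in turn, to guard against the etcd data changing while the backup is in progress.
Run the following command on every etcd node:
# Export this node's etcd data as a snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save etcdbackupfile.db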
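A snapshot is only useful if it can be read back. The sketch below assumes the etcdutl binary from section 1.1.1 is on PATH and uses `etcdutl snapshot status` to print the hash, revision, key count and size of a snapshot file. The function name `verify_snapshot` and the `ETCDUTL` override variable are hypothetical.

```shell
#!/bin/bash
# Binary to call; overridable so the function can be pointed at another path.
ETCDUTL="${ETCDUTL:-etcdutl}"

# Return non-zero if the snapshot file is missing, empty, or rejected by etcdutl.
verify_snapshot() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "snapshot missing or empty: $file" >&2
    return 1
  fi
  # Prints hash, revision, total keys and total size as a table.
  "$ETCDUTL" snapshot status "$file" --write-out=table
}

# Usage: verify_snapshot etcdbackupfile.db
```

Calling it right after `snapshot save` lets a backup job fail loudly instead of keeping a truncated file.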
Automated backup script: etcd_backup.sh
#!/bin/bash
# Backup directory
backup_dir="/var/lib/etcd_db_bak"
# Timestamp
DATE=$(date +"%Y%m%d%H%M")
# Create the backup directory if it does not exist
mkdir -p "$backup_dir"
# Take the snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save "$backup_dir/etcdbackupfile_$DATE.db"
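Set up the scheduled tasks:
crontab -e
# cron does not use the login PATH: list /usr/local/sbin for etcdctl and the system directories for date, mkdir and find
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Take a snapshot every day at 18:30
30 18 * * * /var/lib/etcd_db_bak/etcd_backup.sh
# Every day at 23:50, delete snapshots last modified more than 5 days ago
50 23 * * * find /var/lib/etcd_db_bak/ -mtime +5 -name "*.db" -exec rm -f {} \;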
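The retention rule relies on `find -mtime +5`, which matches files whose age, counted in whole 24-hour periods, is greater than 5. A minimal sketch for rehearsing the rule in a scratch directory before trusting it with real backups follows. The function name `prune_snapshots` is hypothetical, and `touch -d "10 days ago"` assumes GNU coreutils.

```shell
#!/bin/bash
# Delete *.db files in a directory that are older than the given number of days.
# -maxdepth 1 keeps find out of subdirectories; -print lists each file before it is deleted.
prune_snapshots() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name "*.db" -mtime +"$days" -print -delete
}

# Rehearse in a scratch directory with one fresh and one artificially aged file.
demo_dir="$(mktemp -d)"
touch "$demo_dir/etcdbackupfile_new.db"
touch -d "10 days ago" "$demo_dir/etcdbackupfile_old.db"
prune_snapshots "$demo_dir" 5
ls "$demo_dir"
```

Only the aged file is removed; the listing at the end should still show `etcdbackupfile_new.db`.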
1.2. Docker container deployment
1.3. Kubernetes CronJob deployment
# etcd-database-backup.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-database-backup
  annotations:
    descript: "Scheduled etcd database backup"
spec:
  schedule: "*/5 * * * *" # run every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etcdctl
            image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.5-0
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_CACERT
              value: "/etc/kubernetes/pki/etcd/ca.crt"
            - name: ETCDCTL_CERT
              value: "/etc/kubernetes/pki/etcd/healthcheck-client.crt"
            - name: ETCDCTL_KEY
              value: "/etc/kubernetes/pki/etcd/healthcheck-client.key"
            command:
            - /bin/sh
            - -c
            - |
              export RAND=$RANDOM
              etcdctl --endpoints=https://192.168.12.107:2379 snapshot save /backup/etcd-107-${RAND}-snapshot.db
              etcdctl --endpoints=https://192.168.12.108:2379 snapshot save /backup/etcd-108-${RAND}-snapshot.db
              etcdctl --endpoints=https://192.168.12.109:2379 snapshot save /backup/etcd-109-${RAND}-snapshot.db
            volumeMounts:
            - name: "pki"
              mountPath: "/etc/kubernetes"
            - name: "backup"
              mountPath: "/backup"
            imagePullPolicy: IfNotPresent
          volumes:
          - name: "pki"
            hostPath:
              path: "/etc/kubernetes"
              type: "DirectoryOrCreate"
          - name: "backup"
            hostPath:
              path: "/storage/dev/backup" # backup directory
              type: "DirectoryOrCreate"
          # Pin the Pod to a master node; otherwise the certificates would have to be
          # placed on NFS shared storage that every node can reach.
          # Clusters on Kubernetes 1.24 or later may carry node-role.kubernetes.io/control-plane instead.
          nodeSelector:
            node-role.kubernetes.io/master: ""
          restartPolicy: Never
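The CronJob names its files with `$RANDOM`, which gives no ordering and can repeat between runs. As a sketch of one alternative, the loop below derives each file name from the endpoint's last IP octet plus a timestamp, keeping the `/backup/etcd-<octet>-...-snapshot.db` pattern from the manifest. The helper `snapshot_path` is a hypothetical name, and the commands are printed rather than executed so the sketch runs without a cluster.

```shell
#!/bin/sh
# Build /backup/etcd-<last octet>-<timestamp>-snapshot.db for an endpoint URL.
snapshot_path() {
  endpoint="$1"; stamp="$2"
  host="${endpoint#https://}"   # strip the scheme
  host="${host%%:*}"            # strip the port
  echo "/backup/etcd-${host##*.}-${stamp}-snapshot.db"
}

STAMP="$(date +%Y%m%d%H%M)"
for ep in https://192.168.12.107:2379 https://192.168.12.108:2379 https://192.168.12.109:2379; do
  # Printed rather than executed here; drop the echo inside the CronJob container.
  echo etcdctl --endpoints="$ep" snapshot save "$(snapshot_path "$ep" "$STAMP")"
done
```

Timestamped names sort chronologically, which also makes it easier to pick matching snapshots from the three members when restoring.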
2. Restoring etcd data
Stop kube-apiserver and etcd on every master machine, then restore each node's etcd data from the backup.
# Stop the kube-apiserver and etcd static Pods
mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
# Move the existing /var/lib/etcd aside on this node and create an empty data directory
mv /var/lib/etcd /var/lib/etcd.bak
mkdir /var/lib/etcd
# Restore from a snapshot: among the backups taken on the different nodes, pick the largest one and restore that same file on each node in turn
# Restoring different nodes from different backups can leave the etcd data inconsistent
# etcd 3.5 deprecates "etcdctl snapshot restore" in favor of "etcdutl snapshot restore"; the etcdctl form still works in 3.5
# On k8s-master-01-c-201:
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db --data-dir=/var/lib/etcd --name=k8s-master-01-c-201 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master-01-c-201=https://192.168.2.201:2380,k8s-master-02-r-202=https://192.168.2.202:2380,k8s-master-03-u-203=https://192.168.2.203:2380 --initial-advertise-peer-urls=https://192.168.2.201:2380
# On k8s-master-02-r-202:
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db --data-dir=/var/lib/etcd --name=k8s-master-02-r-202 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master-01-c-201=https://192.168.2.201:2380,k8s-master-02-r-202=https://192.168.2.202:2380,k8s-master-03-u-203=https://192.168.2.203:2380 --initial-advertise-peer-urls=https://192.168.2.202:2380
# On k8s-master-03-u-203:
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db --data-dir=/var/lib/etcd --name=k8s-master-03-u-203 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --initial-cluster-token=etcd-cluster-0 --initial-cluster=k8s-master-01-c-201=https://192.168.2.201:2380,k8s-master-02-r-202=https://192.168.2.202:2380,k8s-master-03-u-203=https://192.168.2.203:2380 --initial-advertise-peer-urls=https://192.168.2.203:2380
# Once every node is restored, put the static Pod manifests back so that etcd and kube-apiserver start again
mv /etc/kubernetes/manifests-backup/ /etc/kubernetes/manifests/
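The three restore commands differ only in `--name` and `--initial-advertise-peer-urls`, so they are easy to get wrong when copied by hand. The sketch below generates them from a single member list. It swaps in `etcdutl snapshot restore`, the replacement that etcd 3.5 provides for the deprecated `etcdctl snapshot restore`, and it omits the certificate flags because the restore works on local files only. The names `MEMBERS`, `initial_cluster` and `restore_command` are hypothetical.

```shell
#!/bin/bash
# Member names and peer IPs copied from the restore commands in this section.
MEMBERS=(
  "k8s-master-01-c-201=192.168.2.201"
  "k8s-master-02-r-202=192.168.2.202"
  "k8s-master-03-u-203=192.168.2.203"
)

# Join all members into the value expected by --initial-cluster.
initial_cluster() {
  local m out=""
  for m in "${MEMBERS[@]}"; do
    out+="${out:+,}${m%%=*}=https://${m#*=}:2380"
  done
  echo "$out"
}

# Print the restore command for one member; run the printed command on that member's node.
restore_command() {
  local name="$1" ip="$2" snapshot="$3"
  echo "etcdutl snapshot restore ${snapshot}" \
    "--data-dir=/var/lib/etcd --name=${name}" \
    "--initial-cluster-token=etcd-cluster-0" \
    "--initial-cluster=$(initial_cluster)" \
    "--initial-advertise-peer-urls=https://${ip}:2380"
}

for m in "${MEMBERS[@]}"; do
  restore_command "${m%%=*}" "${m#*=}" /var/lib/etcd_db_bak/etcdbackupfile.db
done
```

Because the commands are only printed, the output can be reviewed and then pasted on the matching node, with the same snapshot file used everywhere.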
Common etcdctl commands:
# Show the status of every etcd member, including which one is the leader
ETCDCTL_API=3 etcdctl endpoint status --endpoints=https://192.168.2.201:2379 --endpoints=https://192.168.2.202:2379 --endpoints=https://192.168.2.203:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --write-out table
# List the etcd cluster members
ETCDCTL_API=3 etcdctl member list --endpoints=https://192.168.2.201:2379 --endpoints=https://192.168.2.202:2379 --endpoints=https://192.168.2.203:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
# Check endpoint health to find unhealthy members
ETCDCTL_API=3 etcdctl endpoint health --endpoints=https://192.168.2.201:2379 --endpoints=https://192.168.2.202:2379 --endpoints=https://192.168.2.203:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
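These commands repeat the same endpoint and certificate flags. etcdctl also reads them from `ETCDCTL_`-prefixed environment variables, as the CronJob manifest in section 1.3 does for the certificates. A minimal sketch, assuming the same three endpoints and peer certificates as above, exports them once per shell session:

```shell
#!/bin/bash
# etcdctl reads ETCDCTL_<FLAG> environment variables in place of the matching flags.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS="https://192.168.2.201:2379,https://192.168.2.202:2379,https://192.168.2.203:2379"
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key

# With the variables exported, the three commands shrink to:
#   etcdctl endpoint status --write-out table
#   etcdctl member list
#   etcdctl endpoint health
```

The variables could equally live in a file that is sourced before troubleshooting, so the certificate paths are written down in one place.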