We use the etcdctl and etcdutl tools to back up and restore the etcd database.
Official releases can be downloaded from: https://github.com/etcd-io/etcd/releases
1. Etcd Database Backup
1.1. Bare-Metal Deployment
1.1.1. Installing etcdctl and etcdutl from a Binary Release
Installation script: install_etcdctl.sh
#!/bin/bash
# Version to install
etcd_ver=v3.5.17
# Installation directory
etcd_dir=/software_path/etcd
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
# Download
if [ ! -d "$etcd_dir" ]; then
  mkdir -p "$etcd_dir"
fi
wget ${DOWNLOAD_URL}/${etcd_ver}/etcd-${etcd_ver}-linux-amd64.tar.gz
mv etcd-${etcd_ver}-linux-amd64.tar.gz ${etcd_dir}
cd $etcd_dir
tar -xzvf ${etcd_dir}/etcd-${etcd_ver}-linux-amd64.tar.gz
rm -f ${etcd_dir}/etcd-${etcd_ver}-linux-amd64.tar.gz
# Install: symlink the binaries into the PATH (-f makes the script re-runnable)
ln -sf ${etcd_dir}/etcd-${etcd_ver}-linux-amd64/etcdctl /usr/local/sbin/etcdctl
ln -sf ${etcd_dir}/etcd-${etcd_ver}-linux-amd64/etcdutl /usr/local/sbin/etcdutl
Verify the installation:
$ etcdctl version
etcdctl version: 3.5.17
API version: 3.5
$ etcdutl version
etcdutl version: 3.5.17
API version: 3.5
1.1.2. Backing Up with etcdctl
For a single-node Kubernetes cluster it is enough to snapshot its one etcd instance. For a multi-master cluster, back up the etcd member on each master node in turn, to guard against data that changes during the backup window and has not yet been replicated to the other members.
Run the following command on every etcd node:
# Export the current node's etcd data as a snapshot
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save etcdbackupfile.db
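After the backup completes, the snapshot file can be sanity-checked with etcdutl, which we installed above; the file name is simply the one produced by the previous command:
# Show hash, revision, total key count and size of the snapshot
etcdutl snapshot status etcdbackupfile.db --write-out=table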
Automated backup script: etcd_backup.sh
#!/bin/bash
# Backup directory
backup_dir="/var/lib/etcd_db_bak"
# Timestamp used in the snapshot file name
DATE=$(date +"%Y%m%d%H%M")
# Create the backup directory if it does not exist
if [ ! -d "$backup_dir" ]; then
  mkdir -p "$backup_dir"
fi
# Take the snapshot
ETCDCTL_API=3 /usr/local/sbin/etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save $backup_dir/etcdbackupfile_$DATE.db
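Before scheduling it, make the script executable and run it once by hand; the path below assumes the script is saved in the backup directory, which is what the cron entries that follow expect:
chmod +x /var/lib/etcd_db_bak/etcd_backup.sh
/var/lib/etcd_db_bak/etcd_backup.sh
ls -lh /var/lib/etcd_db_bak/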
Schedule it with cron: a snapshot every day at 18:30, and a cleanup at 23:50 that deletes snapshots older than 5 days:
crontab -e
30 18 * * * /var/lib/etcd_db_bak/etcd_backup.sh
50 23 * * * find /var/lib/etcd_db_bak/ -mtime +5 -name "*.db" -exec rm -rf {} \;
1.2. Docker Container Deployment
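If etcd runs as a plain Docker container, the same etcdctl snapshot command can be executed inside the container and the resulting file copied out to the host. This is a minimal sketch; the container name etcd, the certificate paths and the target directory are assumptions that need to be adapted to the actual deployment:
# Take the snapshot inside the container (container name and cert paths are assumptions)
docker exec etcd sh -c "ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /tmp/etcd-snapshot.db"
# Copy the snapshot out to the host backup directory
docker cp etcd:/tmp/etcd-snapshot.db /var/lib/etcd_db_bak/etcdbackupfile_$(date +"%Y%m%d%H%M").db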
1.3. Kubernetes CronJob Deployment
In a kubeadm cluster the backup can also be scheduled inside Kubernetes itself: a CronJob runs etcdctl against each etcd member and writes the snapshots to a hostPath directory on the node it is scheduled on.
# etcd-database-backup.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-database-backup
  annotations:
    description: "Scheduled backup of the etcd database"
spec:
  schedule: "*/5 * * * *"          # run every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etcdctl
            image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.5-0
            env:
            - name: ETCDCTL_API
              value: "3"
            - name: ETCDCTL_CACERT
              value: "/etc/kubernetes/pki/etcd/ca.crt"
            - name: ETCDCTL_CERT
              value: "/etc/kubernetes/pki/etcd/healthcheck-client.crt"
            - name: ETCDCTL_KEY
              value: "/etc/kubernetes/pki/etcd/healthcheck-client.key"
            command:
            - /bin/sh
            - -c
            - |
              export RAND=$RANDOM
              etcdctl --endpoints=https://192.168.12.107:2379 snapshot save /backup/etcd-107-${RAND}-snapshot.db
              etcdctl --endpoints=https://192.168.12.108:2379 snapshot save /backup/etcd-108-${RAND}-snapshot.db
              etcdctl --endpoints=https://192.168.12.109:2379 snapshot save /backup/etcd-109-${RAND}-snapshot.db
            volumeMounts:
            - name: "pki"
              mountPath: "/etc/kubernetes"
            - name: "backup"
              mountPath: "/backup"
            imagePullPolicy: IfNotPresent
          volumes:
          - name: "pki"
            hostPath:
              path: "/etc/kubernetes"
              type: "DirectoryOrCreate"
          - name: "backup"
            hostPath:
              path: "/storage/dev/backup"   # backup destination on the host
              type: "DirectoryOrCreate"
          nodeSelector:   # pin the Pod to a master node; otherwise the certificates would have to live on shared storage (e.g. NFS) reachable from every node
            node-role.kubernetes.io/master: ""   # on Kubernetes 1.24+ use node-role.kubernetes.io/control-plane instead
          tolerations:    # allow scheduling onto control-plane nodes that carry the default NoSchedule taints
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          restartPolicy: Never
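Apply the manifest and check that snapshots start to appear in the backup directory on the master node (the directory is the hostPath from the manifest above):
kubectl apply -f etcd-database-backup.yaml
kubectl get cronjob etcd-database-backup
ls -lh /storage/dev/backup/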
2. Etcd Database Restore
Stop kube-apiserver and etcd on every master machine, then restore each node's etcd data from the backup.
# Stop the kube-apiserver and etcd static Pods by moving their manifests away
mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests-backup/
# Move the old data directory aside on this node.
# Do not recreate /var/lib/etcd: snapshot restore creates the data directory itself
# and may refuse to run if it already exists.
mv /var/lib/etcd /var/lib/etcd.bak
# Restore from the snapshot. Among the per-node backups pick one file (the largest / most recent)
# and use that same file to restore every node in turn; restoring different nodes from different
# backup files can leave the etcd data inconsistent.
# On k8s-master-01
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db \
--data-dir=/var/lib/etcd \
--name=k8s-master-01 \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=k8s-master-01=https://192.168.2.151:2380,k8s-master-02=https://192.168.2.152:2380,k8s-master-03=https://192.168.2.153:2380 \
--initial-advertise-peer-urls=https://192.168.2.151:2380
# Flag reference
--data-dir=/var/lib/etcd            # directory the restored data is written to
--name=k8s-master-01                # name of the current etcd member (must match its entry in --initial-cluster); shown in the etcd static Pod manifest, defaults to the hostname
--cert & --key                      # TLS certificate and key for secure communication (usually under /etc/kubernetes/pki/etcd/)
--initial-cluster-token=etcd-cluster-0   # unique token for this cluster, preventing nodes from accidentally joining another cluster; keep it identical when restoring the other members
--initial-cluster                   # list of cluster members
--initial-advertise-peer-urls       # peer URL of the current node; must match this node's URL in --initial-cluster
# On k8s-master-02
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db \
--data-dir=/var/lib/etcd \
--name=k8s-master-02 \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=k8s-master-01=https://192.168.2.151:2380,k8s-master-02=https://192.168.2.152:2380,k8s-master-03=https://192.168.2.153:2380 \
--initial-advertise-peer-urls=https://192.168.2.152:2380
# On k8s-master-03
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db \
--data-dir=/var/lib/etcd \
--name=k8s-master-03 \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=k8s-master-01=https://192.168.2.151:2380,k8s-master-02=https://192.168.2.152:2380,k8s-master-03=https://192.168.2.153:2380 \
--initial-advertise-peer-urls=https://192.168.2.153:2380
# Restart kube-apiserver and etcd by moving the static Pod manifests back
mv /etc/kubernetes/manifests-backup/ /etc/kubernetes/manifests/
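etcd v3.5 marks etcdctl snapshot restore as deprecated in favour of etcdutl, which was installed alongside etcdctl earlier. The offline restore takes the same cluster flags but needs no TLS options, since it never contacts a running server. A sketch of the equivalent call for k8s-master-01 (adjust --name and the advertised peer URL per node exactly as above):
etcdutl snapshot restore /var/lib/etcd_db_bak/etcdbackupfile.db \
  --data-dir=/var/lib/etcd \
  --name=k8s-master-01 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=k8s-master-01=https://192.168.2.151:2380,k8s-master-02=https://192.168.2.152:2380,k8s-master-03=https://192.168.2.153:2380 \
  --initial-advertise-peer-urls=https://192.168.2.151:2380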
Common etcdctl commands:
# Show the status of every etcd member (including which one is the leader)
ETCDCTL_API=3 etcdctl endpoint status --endpoints=https://192.168.2.151:2379 --endpoints=https://192.168.2.152:2379 --endpoints=https://192.168.2.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --write-out table
# List the etcd cluster members
ETCDCTL_API=3 etcdctl member list --endpoints=https://192.168.2.151:2379 --endpoints=https://192.168.2.152:2379 --endpoints=https://192.168.2.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
# Check endpoint health to spot unhealthy members
ETCDCTL_API=3 etcdctl endpoint health --endpoints=https://192.168.2.151:2379 --endpoints=https://192.168.2.152:2379 --endpoints=https://192.168.2.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key
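Once the static Pod manifests have been moved back and the control plane is running again, a quick sanity check from any master node (assuming kubectl is configured there):
kubectl get nodes
kubectl -n kube-system get pods | grep -E 'etcd|kube-apiserver'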