1 References
2 Initializing a cluster with kubeadm (production environment)
2.1 Hostname configuration for master and worker nodes
Edit /etc/hosts:
172.20.30.1 master
172.20.30.2 node1
172.20.30.3 node2
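Adding these entries by hand on every node is easy to repeat by accident. A minimal sketch of an idempotent helper follows; the function name add_host_entry and the HOSTS_FILE variable are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file only if NAME is not
# already mapped there. HOSTS_FILE defaults to /etc/hosts (writing it needs root).
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
    ip="$1"
    name="$2"
    # Skip when an uncommented line already lists the hostname as a whole word.
    if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$HOSTS_FILE" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
}

# Usage (as root, on every node):
#   add_host_entry 172.20.30.1 master
#   add_host_entry 172.20.30.2 node1
#   add_host_entry 172.20.30.3 node2
```

Running the helper twice with the same arguments leaves the file unchanged, so it can sit in a provisioning script.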
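2.2 Disable swap
Edit /etc/fstab and comment out the line that starts with /swapfile ... or /swap.img, then run swapoff -a and reboot. swapoff -a turns swap off immediately; the /etc/fstab change keeps it off across reboots. In K8S in Docker mode, swap must be disabled on the host.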
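The fstab edit can also be scripted. The sketch below assumes GNU sed (for -i) and a standard fstab layout where the third field is the filesystem type; the function name disable_swap_entries is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: comment out every active swap entry in an fstab-style
# file. A backup copy is written to <file>.bak first.
disable_swap_entries() {
    fstab="$1"
    cp "$fstab" "$fstab.bak"
    # Prefix "#" to uncommented lines whose third field (filesystem type) is "swap".
    sed -E -i 's/^([^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap([[:space:]].*)?)$/#\1/' "$fstab"
}

# Usage (as root):
#   disable_swap_entries /etc/fstab && swapoff -a
```

Matching on the filesystem-type field rather than on the file name means entries such as /swapfile and /swap.img are both covered.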
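2.3 Configure the cgroup driver
Add "exec-opts": ["native.cgroupdriver=systemd"] to /etc/docker/daemon.json, then restart the Docker service. In K8S in Docker mode, this can be set in advance when building the image.
2.4 Configure DNS resolution
By default, /etc/resolv.conf points at an automatically generated IP address. You can point it at a public DNS resolver instead.
# 1. Back up the original file
sudo cp /etc/resolv.conf /etc/resolv.conf.bak
# 2. Force it to use Alibaba Cloud DNS or Google DNS
# (Note: if the file is a symlink, you may need to rm /etc/resolv.conf first and then recreate it)
echo "nameserver 223.5.5.5" | sudo tee /etc/resolv.conf
# or
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
# 3. Delete the CoreDNS Pods so they restart automatically
# kubectl delete pod -n kube-system -l k8s-app=kube-dns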
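The symlink case mentioned in the comments is easy to miss, because writing through the link changes the link target instead of /etc/resolv.conf itself. A small sketch that handles both cases; the function name set_nameserver is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: replace a resolv.conf-style file with a single
# nameserver line. If the path is a symlink, the link is removed first so the
# link target is left untouched; a regular file is backed up to <file>.bak.
set_nameserver() {
    conf="$1"
    ns="$2"
    if [ -L "$conf" ]; then
        rm -f "$conf"
    elif [ -f "$conf" ]; then
        cp "$conf" "$conf.bak"
    fi
    printf 'nameserver %s\n' "$ns" > "$conf"
}

# Usage (as root):
#   set_nameserver /etc/resolv.conf 223.5.5.5
```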
2.5 Configure containerd
1. Back up the existing configuration and regenerate the default one
sudo cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
2. Replace the pause (sandbox) image with the Alibaba Cloud mirror
sudo sed -i -E "s|sandbox = ['\"]registry.k8s.io/pause:[^'\"]+['\"]|sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10.1'|" /etc/containerd/config.toml
3. Switch the cgroup driver by enabling SystemdCgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
4. Make sure the CRI plugin is not disabled (automated with sed)
sudo sed -i 's/disabled_plugins = \["cri"\]/disabled_plugins = []/g' /etc/containerd/config.toml
5. Restart containerd
sudo systemctl restart containerd
Script (the DNS and containerd steps combined)
#!/bin/bash
sudo cp /etc/resolv.conf /etc/resolv.conf.bak
echo "nameserver 223.5.5.5" | sudo tee /etc/resolv.conf
sudo cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
# A plain ">" redirect runs as the calling user, so write the file through sudo tee
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i -E "s|sandbox = ['\"]registry.k8s.io/pause:[^'\"]+['\"]|sandbox = 'registry.aliyuncs.com/google_containers/pause:3.10.1'|" /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo sed -i 's/disabled_plugins = \["cri"\]/disabled_plugins = []/g' /etc/containerd/config.toml
sudo systemctl restart containerd
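The sed commands do nothing if their patterns fail to match, and they give no error, so it helps to confirm the result before moving on. A minimal check, assuming the key names used in this section; the function name check_containerd_config is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a containerd config.toml has the
# settings this section relies on. Prints one line per check and returns
# non-zero if any check fails.
check_containerd_config() {
    cfg="$1"
    rc=0
    if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"; then
        echo "ok: SystemdCgroup = true"
    else
        echo "FAIL: SystemdCgroup is not true"
        rc=1
    fi
    if grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*cri' "$cfg"; then
        echo "FAIL: the cri plugin is disabled"
        rc=1
    else
        echo "ok: the cri plugin is not disabled"
    fi
    return $rc
}

# Usage:
#   check_containerd_config /etc/containerd/config.toml
```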
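2.6 Load the br_netfilter kernel module
1. Load the module temporarily (takes effect immediately)
In K8S in Docker mode, exit the container first and run the command on the host (the machine that runs docker-compose):
sudo modprobe br_netfilter
2. Verify
Check that the module loaded:
# If this prints /proc/sys/net/bridge/bridge-nf-call-iptables, the module is loaded
ls /proc/sys/net/bridge/bridge-nf-call-iptables
# Alternatively, run the command below; any output (for example 1) means success
sysctl net.bridge.bridge-nf-call-iptables
3. (Optional) Load the module at boot. To keep it loaded after a reboot, run: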
echo "br_netfilter" | sudo tee /etc/modules-load.d/k8s.conf
2.7 Install the tools (kubeadm, kubelet and kubectl)
1. Run the script
#!/bin/bash
# Directory holding the pre-downloaded packages
K8S_INSTALL_PKG=~/k8s_install_pkg
# Install the CNI plugins
CNI_PLUGINS_VERSION="v1.9.0"
ARCH="amd64"
DEST="/opt/cni/bin"
sudo mkdir -p "$DEST"
# https://github.com/containernetworking/plugins/releases/
# curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_PLUGINS_VERSION}/cni-plugins-linux-${ARCH}-${CNI_PLUGINS_VERSION}.tgz" | sudo tar -C "$DEST" -xz
sudo tar -zxvf "$K8S_INSTALL_PKG/cni-plugins-linux-${ARCH}-${CNI_PLUGINS_VERSION}.tgz" -C "$DEST"
# Directory for the executables
DOWNLOAD_DIR="/usr/local/bin"
sudo mkdir -p "$DOWNLOAD_DIR"
# Install crictl
CRICTL_VERSION="v1.35.0"
ARCH="amd64"
# https://github.com/kubernetes-sigs/cri-tools/releases
# curl -L "https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-${ARCH}.tar.gz" | sudo tar -C $DOWNLOAD_DIR -xz
sudo tar -zxvf "$K8S_INSTALL_PKG/crictl-${CRICTL_VERSION}-linux-${ARCH}.tar.gz" -C "$DOWNLOAD_DIR"
# RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
RELEASE="v1.35.0"
ARCH="amd64"
# cd $DOWNLOAD_DIR
# sudo curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
cd $K8S_INSTALL_PKG
sudo cp kubeadm kubelet kubectl "$DOWNLOAD_DIR/"
cd $DOWNLOAD_DIR
sudo chmod +x {kubeadm,kubelet,kubectl}
RELEASE_VERSION="v0.16.2"
# curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/krel/templates/latest/kubelet/kubelet.service" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" | sudo tee /usr/lib/systemd/system/kubelet.service
sed "s:/usr/bin:${DOWNLOAD_DIR}:g" $K8S_INSTALL_PKG/kubelet.service | sudo tee /usr/lib/systemd/system/kubelet.service
sudo mkdir -p /usr/lib/systemd/system/kubelet.service.d
# curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/krel/templates/latest/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:${DOWNLOAD_DIR}:g" | sudo tee /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
sed "s:/usr/bin:${DOWNLOAD_DIR}:g" $K8S_INSTALL_PKG/10-kubeadm.conf | sudo tee /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
- Note: adjust the value of K8S_INSTALL_PKG to match your environment.
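Because the script installs from local files rather than downloading them, a missing file only shows up partway through the run. A pre-flight check along these lines can catch that earlier; the file list is taken from the script above, and the function name check_install_pkg is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-flight check: list the files the install script expects in
# the package directory and report any that are missing.
check_install_pkg() {
    dir="$1"
    missing=0
    for f in \
        cni-plugins-linux-amd64-v1.9.0.tgz \
        crictl-v1.35.0-linux-amd64.tar.gz \
        kubeadm kubelet kubectl \
        kubelet.service 10-kubeadm.conf
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}

# Usage:
#   check_install_pkg ~/k8s_install_pkg || exit 1
```

The version numbers are hard-coded to match the script; if CNI_PLUGINS_VERSION or CRICTL_VERSION changes, the list needs the same change.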
2. Verify the installation
- The commands below check at once that each binary exists, is executable, and is on PATH
kubeadm version -o short
kubelet --version
kubectl version --client
crictl --version
2.8 Create the cluster (initialize the master node)
1. Initialize
Template (kubeadm has defaulted the kubelet cgroup driver to systemd since v1.22, so no extra kubelet flag is needed):
kubeadm init \
--kubernetes-version=v1.35.0 \
--image-repository registry.aliyuncs.com/google_containers \
--cri-socket=unix:///var/run/containerd/containerd.sock \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address=<your-node-IP> \
--v=5
Example:
kubeadm init --kubernetes-version=v1.35.0 \
--image-repository registry.aliyuncs.com/google_containers \
--cri-socket=unix:///var/run/containerd/containerd.sock \
--apiserver-advertise-address=172.20.30.1 \
--pod-network-cidr=10.244.0.0/16 \
--ignore-preflight-errors=SystemVerification,Swap \
--v=5
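- Note: if initialization fails, clean up with the commands below:
# 1. Clean up
kubeadm reset -f
rm -rf /var/lib/kubelet/*
2. Output. A message like the one below means the cluster initialized successfully.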
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.20.30.1:6443 --token jev8ad.2wbfbjrjwezvlzt4 \
--discovery-token-ca-cert-hash sha256:be07e5c8ff2242fea232b538041f8a0b77e46453cb163ffb98432b1dd2076ec9
3. Run the commands from the output (the KUBECONFIG export is the alternative for the root user):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bashrc
source ~/.bashrc
4. Check the nodes, Pods, and containers
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
vm-1 NotReady control-plane 117s v1.35.0
# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-bbdc5fdf6-k7t62 0/1 Pending 0 2m29s
kube-system coredns-bbdc5fdf6-xzbwb 0/1 Pending 0 2m29s
kube-system etcd-vm-1 1/1 Running 0 2m37s
kube-system kube-apiserver-vm-1 1/1 Running 0 2m37s
kube-system kube-controller-manager-vm-1 1/1 Running 0 2m36s
kube-system kube-proxy-tmv9c 0/1 Error 4 (110s ago) 2m30s
kube-system kube-scheduler-vm-1 1/1 Running 0 2m36s
# crictl ps
WARN[0000] Config "/etc/crictl.yaml" does not exist, trying next: "/usr/local/bin/crictl.yaml"
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
WARN[0000] Image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
26fb3bba8867c 2c9a4b058bd7e 2 minutes ago Running kube-controller-manager 0 a060bb7731606 kube-controller-manager-vm-1 kube-system
6aa464a554992 5c6acd67e9cd1 2 minutes ago Running kube-apiserver 0 d5cac01ec7976 kube-apiserver-vm-1 kube-system
5d5ac79c489d7 550794e3b12ac 2 minutes ago Running kube-scheduler 0 53c836104d614 kube-scheduler-vm-1 kube-system
b590d591ece32 0a108f7189562 2 minutes ago Running etcd 0 435076612aa2c etcd-vm-1 kube-system
- Note: on a first install, several Pods will not be Running. That is expected; the following sections fix them one by one.
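The flags passed to kubeadm init in step 1 can also be kept in a configuration file, which is easier to review and reuse. The sketch below assumes the kubeadm.k8s.io/v1beta4 configuration API; the function name write_kubeadm_config and the output file name are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: write a kubeadm configuration file equivalent to the
# flags used in the example (version, image repository, CRI socket, Pod CIDR,
# advertise address). Assumes the kubeadm.k8s.io/v1beta4 config API.
write_kubeadm_config() {
    out="$1"
    advertise_ip="$2"
    cat > "$out" <<EOF
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${advertise_ip}
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.35.0
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
EOF
}

# Usage (as root):
#   write_kubeadm_config kubeadm-config.yaml 172.20.30.1
#   kubeadm init --config kubeadm-config.yaml --v=5
```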
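2.9 Install a network plugin
As shown above, node vm-1 is NotReady. For the cluster to come up, a network plugin still has to be installed.
Option A: simplest (officially recommended, for the default 10.244.0.0/16 Pod network)
# Official deployment manifest (kube-flannel.yml), still valid as of 2025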
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
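Option B: clearer and more controllable (recommended)
# Download the official YAML (recommended)
wget https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
# If you changed the Pod network (for example to 10.100.0.0/16), you must update the Network field
# Find the net-conf.json section and set Network to the value passed to kubeadm init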
vi kube-flannel.yml
net-conf.json: |
{
"Network": "10.244.0.0/16", ← change this
"Backend": {
"Type": "vxlan"
}
}
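The same change can be made without opening an editor. A minimal sketch, assuming GNU sed and that the manifest contains a single "Network" key as in the excerpt above; the function name set_flannel_network is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the "Network" value inside net-conf.json in a
# downloaded kube-flannel.yml so that it matches --pod-network-cidr.
set_flannel_network() {
    manifest="$1"
    cidr="$2"
    # "|" is the sed delimiter because the CIDR itself contains "/".
    sed -E -i "s|(\"Network\":[[:space:]]*\")[^\"]+(\")|\1${cidr}\2|" "$manifest"
}

# Usage:
#   set_flannel_network kube-flannel.yml 10.100.0.0/16
```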
Apply:
# kubectl apply -f kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
# kubectl get po -n kube-system
kube-flannel starts initializing. After a few minutes it reaches Running and the node becomes Ready:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
vm-1 NotReady control-plane 3m45s v1.35.0
# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-l6shq 0/1 Init:1/2 0 74s
kube-system coredns-bbdc5fdf6-k7t62 0/1 Pending 0 4m33s
kube-system coredns-bbdc5fdf6-xzbwb 0/1 Pending 0 4m33s
kube-system etcd-vm-1 1/1 Running 0 4m41s
kube-system kube-apiserver-vm-1 1/1 Running 0 4m41s
kube-system kube-controller-manager-vm-1 1/1 Running 0 4m40s
kube-system kube-proxy-tmv9c 0/1 CrashLoopBackOff 5 (97s ago) 4m34s
kube-system kube-scheduler-vm-1 1/1 Running 0 4m40s
# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-7rnw9 1/1 Running 0 83s
kube-system coredns-bbdc5fdf6-2qnqb 0/1 ContainerCreating 0 26m
kube-system coredns-bbdc5fdf6-p752g 0/1 ContainerCreating 0 26m
kube-system etcd-vm-1 1/1 Running 0 26m
kube-system kube-apiserver-vm-1 1/1 Running 0 26m
kube-system kube-controller-manager-vm-1 1/1 Running 0 26m
kube-system kube-proxy-2ltpn 0/1 CrashLoopBackOff 9 (4m57s ago) 26m
kube-system kube-scheduler-vm-1 1/1 Running 0 26m
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
vm-1 Ready control-plane 26m v1.35.0
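Rather than re-running kubectl get nodes by hand, the wait can be scripted. The filter below is a sketch that reads the output of kubectl get nodes --no-headers; the function name all_nodes_ready is an assumption. It treats any status other than exactly Ready (including Ready,SchedulingDisabled) as not ready.

```shell
#!/bin/sh
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin
# and succeed only if at least one node is listed and every STATUS column is
# exactly "Ready".
all_nodes_ready() {
    awk 'BEGIN { n = 0; bad = 0 }
         NF >= 2 { n++; if ($2 != "Ready") bad++ }
         END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}

# Usage (needs a working kubectl context):
#   until kubectl get nodes --no-headers | all_nodes_ready; do sleep 5; done
```

kubectl also has a built-in alternative: kubectl wait --for=condition=Ready node --all --timeout=300s.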
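2.10 (Optional) Fix the kube-proxy crash (CrashLoopBackOff)
Cause: when Kubernetes runs inside Docker containers (DinD), kube-proxy often crashes. It tries to change host kernel settings (loading kernel modules, writing sysctl values), but even in a privileged container some kernel paths are read-only or restricted.
Step 1: check the error log. Run: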
# kubectl logs -n kube-system kube-proxy-2ltpn
I0128 02:46:39.895463 1 server_linux.go:53] "Using iptables proxy"
I0128 02:46:39.953482 1 shared_informer.go:370] "Waiting for caches to sync"
I0128 02:46:40.054509 1 shared_informer.go:377] "Caches are synced"
I0128 02:46:40.054559 1 server.go:218] "Successfully retrieved NodeIPs" NodeIPs=["172.20.30.1"]
I0128 02:46:40.057190 1 conntrack.go:57] "Setting nf_conntrack_max" nfConntrackMax=131072
I0128 02:46:40.057249 1 conntrack.go:115] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=131072
E0128 02:46:40.057272 1 server.go:134] "Error running ProxyServer" err="open /proc/sys/net/netfilter/nf_conntrack_max: permission denied"
E0128 02:46:40.057296 1 run.go:72] "command failed" err="open /proc/sys/net/netfilter/nf_conntrack_max: permission denied"
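Root cause: the log shows open /proc/sys/net/netfilter/nf_conntrack_max: permission denied. Even in a privileged container, kube-proxy in a Docker-in-Docker environment tries by default to tune the host's connection-tracking (conntrack) parameters, and the isolation of the /proc filesystem rejects the write.
Fix: change the kube-proxy configuration so that it leaves these kernel parameters alone (a value of 0 disables the change).
Edit the kube-proxy ConfigMap. Inside the container, run:
kubectl -n kube-system edit configmap kube-proxy
Change the conntrack fields: find the conntrack section in the editor (usually near the middle of the file) and set both maxPerCore and min to 0.
conntrack:
# Key change: set to 0 so the kernel parameters are left unmodified
maxPerCore: 0
min: 0
tcpCloseWaitTimeout: 1h0m0s
...
Save and exit (:wq).
Restart kube-proxy so the change takes effect:
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
Verify: wait a few seconds, then check the status: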
kubectl get pod -n kube-system | grep proxy
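The ConfigMap edit can be done without an interactive editor by piping the YAML through a filter. The sketch below assumes that maxPerCore and min appear only under conntrack in the kube-proxy configuration; the function name zero_conntrack_limits is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: set conntrack maxPerCore and min to 0 in a kube-proxy
# ConfigMap read from stdin, writing the result to stdout. Indentation and all
# other lines are preserved.
zero_conntrack_limits() {
    sed -E \
        -e 's/^([[:space:]]*maxPerCore:).*$/\1 0/' \
        -e 's/^([[:space:]]*min:).*$/\1 0/'
}

# Usage (needs a working kubectl context):
#   kubectl -n kube-system get configmap kube-proxy -o yaml \
#     | zero_conntrack_limits \
#     | kubectl apply -f -
#   kubectl -n kube-system delete pod -l k8s-app=kube-proxy
```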
2.11 Join the other (worker) nodes to the cluster
Note: every other node needs the same configuration changes as vm-1.
1. Join the node
kubeadm join 172.20.30.1:6443 --token cpuge5.0f5g69rq2b1zui91 \
--discovery-token-ca-cert-hash sha256:c3105bdc3264387397a8e97969ca75e3eb08deef7dc0cfc5d920a2cc7b150022 \
--ignore-preflight-errors=SystemVerification,Swap
2. Output
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
3. Back on the master node (vm-1), run the commands below. Both worker nodes have joined the cluster:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
vm-1 Ready control-plane 60m v1.35.0
vm-2 Ready <none> 2m26s v1.35.0
vm-3 Ready <none> 41s v1.35.0
# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bbdc5fdf6-gf5bx 1/1 Running 0 65m
coredns-bbdc5fdf6-jhlpq 1/1 Running 0 65m
etcd-vm-1 1/1 Running 0 65m
kube-apiserver-vm-1 1/1 Running 0 65m
kube-controller-manager-vm-1 1/1 Running 0 65m
kube-proxy-c9qbk 1/1 Running 0 26m
kube-proxy-pk66m 1/1 Running 0 23m
kube-proxy-q6tqc 1/1 Running 0 65m
kube-scheduler-vm-1 1/1 Running 0 65m
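The join command is printed only once, at the end of kubeadm init. If that output was saved to a file, the command can be pulled back out with a short filter. The sketch below assumes the output contains a single kubeadm join command split over two lines, as in section 2.8; the function name extract_join_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull the two-line "kubeadm join ..." command out of
# saved "kubeadm init" output (for example a file written with
# "kubeadm init ... | tee init.log") and print it as a single line.
extract_join_command() {
    grep -A1 'kubeadm join' "$1" \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]*\\$//' \
        | paste -sd ' ' -
}

# Usage:
#   extract_join_command init.log
```

If the output was not saved, or the token has expired (bootstrap tokens last 24 hours by default), running kubeadm token create --print-join-command on the master prints a fresh join command.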
Congratulations! You have successfully deployed a complete Kubernetes cluster inside containers! 🎉
3 Initializing a cluster with kind (learning environment)
3.1 Environment preparation
Environment
| Resource | OS | Description |
|---|---|---|
| vm-1 | Ubuntu 24.04 LTS | Master |
| vm-2 | Ubuntu 24.04 LTS | Worker |
| vm-3 | Ubuntu 24.04 LTS | Worker |
Hardware
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 CPUs | 4 CPUs |
Software
| Resource | Description |
|---|---|
| kind | kind |
| Docker | kind runs Kubernetes clusters as containers, so Docker is required |
| kubectl | Used with the examples in the kind documentation |
3.2 Install kubectl
# curl -LO https://dl.k8s.io/release/v1.35.0/bin/linux/amd64/kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin
3.3 Install kind
# For AMD64 / x86_64
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.31.0/kind-linux-amd64
# For ARM64
[ $(uname -m) = aarch64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.31.0/kind-linux-arm64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
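On any architecture other than x86_64 or aarch64, neither guarded curl command runs, so no file is downloaded and the chmod that follows fails. A small sketch that makes the architecture mapping explicit and reports unsupported machines; the function name kind_arch is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map "uname -m" output to the architecture suffix used
# in the kind download URLs above.
kind_arch() {
    case "$1" in
        x86_64)  echo amd64 ;;
        aarch64) echo arm64 ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   arch="$(kind_arch "$(uname -m)")" || exit 1
#   curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.31.0/kind-linux-${arch}"
```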
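3.4 Create a basic cluster
1. Create the cluster
# Create directly
kind create cluster --name mycluster
# Or use a registry mirror in China
kind create cluster --name mycluster \
--image m.daocloud.io/docker.io/kindest/node:v1.35.0
# Or pull the image from the mirror first, then create the cluster
docker pull m.daocloud.io/docker.io/kindest/node:v1.35.0
2. Full output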
Creating cluster "mycluster" ...
✓ Ensuring node image (m.daocloud.io/docker.io/kindest/node:v1.35.0) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-mycluster"
You can now use your cluster with:
kubectl cluster-info --context kind-mycluster
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
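3. Switch kind clusters
Why switch? kind can create several clusters, so you select the one to use by its kubectl context. The --context flag targets a cluster for a single command:
kubectl cluster-info --context kind-mycluster
4. Check the node container and the cluster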
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eb08e6ecd0b2 m.daocloud.io/docker.io/kindest/node:v1.35.0 "/usr/local/bin/entr…" 4 minutes ago Up 4 minutes 127.0.0.1:42175->6443/tcp mycluster-control-plane
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
mycluster-control-plane Ready control-plane 4m30s v1.35.0
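The --context flag in step 3 applies to one command only. To make a kind cluster the default for later kubectl commands, the context has to be switched. As the create output shows, kind names the context kind-<cluster name>; the helper name kind_context below is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the kubectl context name kind uses for a cluster
# ("kind-<cluster name>", as shown in the "Set kubectl context" output line).
kind_context() {
    printf 'kind-%s\n' "$1"
}

# Usage (needs kind and kubectl installed):
#   kind get clusters                                        # list clusters kind knows about
#   kubectl config use-context "$(kind_context mycluster)"   # make it the default context
```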
3.5 Create a cluster with one control-plane node and two workers
1. Configuration
cat <<-EOF > ./kind-cluster-config.yaml
# three node (two workers) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
2. Initialize
kind create cluster --config kind-cluster-config.yaml \
--image m.daocloud.io/docker.io/kindest/node:v1.35.0
3. Full output
# kind create cluster --config kind-cluster-config.yaml \
--image m.daocloud.io/docker.io/kindest/node:v1.35.0
Creating cluster "kind" ...
✓ Ensuring node image (m.daocloud.io/docker.io/kindest/node:v1.35.0) 🖼
✓ Preparing nodes 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind! 😊
# kubectl cluster-info --context kind-kind
Kubernetes control plane is running at https://127.0.0.1:45849
CoreDNS is running at https://127.0.0.1:45849/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kind-control-plane Ready control-plane 4m28s v1.35.0
kind-worker Ready <none> 4m19s v1.35.0
kind-worker2 Ready <none> 4m19s v1.35.0
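The configuration from step 1 generalizes to any number of workers. A sketch of a generator that prints the same structure; the function name gen_kind_config is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print a kind cluster config with one control-plane node
# and the requested number of worker nodes.
gen_kind_config() {
    workers="$1"
    echo "kind: Cluster"
    echo "apiVersion: kind.x-k8s.io/v1alpha4"
    echo "nodes:"
    echo "- role: control-plane"
    i=0
    while [ "$i" -lt "$workers" ]; do
        echo "- role: worker"
        i=$((i + 1))
    done
}

# Usage:
#   gen_kind_config 2 > kind-cluster-config.yaml
#   kind create cluster --config kind-cluster-config.yaml
```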