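controller-manager: manages the cluster's resources and keeps them in their desired state.
kube-scheduler: handles scheduling and decides which Node each Pod runs on.
Environment (a k8s cluster installed with kubeadm)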
Kubernetes v1.23.8
prometheus operator 0.11.0
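The alerts reported by monitoring are shown in the figure.
Cause
Because of how the ServiceMonitor resource is declared, it needs to match a Service in the kube-system namespace that carries a label such as k8s-app=kube-scheduler, but a k8s cluster installed with kubeadm has no such Service: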
kubectl get svc -n kube-system
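To narrow the listing above to the label the ServiceMonitor selects on, a label selector can be added. This is only a sketch: the function name find_labeled_svc is hypothetical, and it assumes kubectl is already configured for the cluster.

```shell
#!/bin/sh
# Hypothetical helper: list Services in kube-system that carry the
# k8s-app label a ServiceMonitor selects on. An empty result means
# the ServiceMonitor currently has nothing to match.
find_labeled_svc() {
    kubectl get svc -n kube-system -l "k8s-app=$1"
}

# Example calls (run against your own cluster):
# find_labeled_svc kube-scheduler
# find_labeled_svc kube-controller-manager
```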
Fix
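1. Change the listen address of controller-manager and kube-scheduler (the manifests are usually under /etc/kubernetes/manifests)
kube-controller-manager.yaml kube-scheduler.yaml
Change --bind-address=127.0.0.1 to --bind-address=0.0.0.0. These are static pod manifests, so the kubelet recreates the pods automatically once the files are saved.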
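The bind-address change can also be scripted instead of edited by hand. The sketch below is one possible way to do it, not the original author's method: the function name set_bind_address is hypothetical, and it writes through a temporary file outside the manifests directory because the kubelet may treat any extra file left in that directory as another static pod manifest.

```shell
#!/bin/sh
# Hypothetical helper: rewrite --bind-address=127.0.0.1 to
# --bind-address=0.0.0.0 in a single manifest file, in place.
set_bind_address() {
    manifest="$1"
    # The temp file is created in the default temp dir, not next to the manifest.
    tmp="$(mktemp)" || return 1
    sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$manifest" > "$tmp" \
        && cat "$tmp" > "$manifest"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (run as root on each control-plane node):
# for f in kube-controller-manager.yaml kube-scheduler.yaml; do
#     set_bind_address "/etc/kubernetes/manifests/$f"
# done
```

Writing back with cat keeps the original file's owner and permissions, which a plain mv of the temporary file would not.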
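2. Create a Service and make sure it carries a label that matches the ServiceMonitor selector, k8s-app: kube-scheduler
3. Make sure the Service's selector matches the labels on the correct pod (component: kube-scheduler)
You can check the labels with the command below; use the pod name from your own installation
kubectl get pods kube-controller-manager-xx-k8s-master-001 -n kube-system -o yaml
4. The Service's port name must match the port name in the ServiceMonitor, https-metrics. The actual port number can also be found with the command above, by looking at the health check port.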
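As a shortcut for step 3, the pods can be selected by the component label directly, which avoids having to know the full pod name. A minimal sketch, assuming kubectl access to the cluster; show_component_pods is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list the kube-system pods carrying a given
# component label and print all of their labels, so the Service
# selector can be compared against them.
show_component_pods() {
    kubectl get pods -n kube-system -l "component=$1" --show-labels
}

# Example calls:
# show_component_pods kube-scheduler
# show_component_pods kube-controller-manager
```

If a call returns no pods, a Service using that selector would have no backing endpoints.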
Create the corresponding Service
[root@az-k8s-nginx-001]$ cat kube-controller-manager-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  ports:
  - name: https-metrics
    port: 10257
  selector:
    component: kube-controller-manager
[root@az-k8s-nginx-001]$ cat kube-scheduler-svc.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  ports:
  - name: https-metrics
    port: 10259
  selector:
    component: kube-scheduler
kubectl apply -f kube-controller-manager-svc.yaml
kubectl apply -f kube-scheduler-svc.yaml
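Before waiting for the alerts to clear, it can help to confirm that the new Services actually picked up the pods. The following is a sketch with hypothetical function names; it assumes the ServiceMonitor CRD from prometheus operator is installed, as it is in this setup.

```shell
#!/bin/sh
# Hypothetical helper: show the Endpoints object behind one of the new
# Services. An ENDPOINTS value of <none> would suggest that the Service
# selector does not match any pod.
show_endpoints() {
    kubectl get endpoints -n kube-system "$1"
}

# Hypothetical helper: list the ServiceMonitor objects in the
# monitoring namespace to confirm both were created.
show_servicemonitors() {
    kubectl get servicemonitor -n monitoring
}

# Example calls:
# show_endpoints kube-controller-manager
# show_endpoints kube-scheduler
# show_servicemonitors
```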
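Once the commands succeed, the alerts clear.
The WeChat bot also reports the alerts as resolved.
Grafana now shows the graphs as well.
For the specific parameters, users have posted related solutions in the upstream issue tracker: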
https://github.com/prometheus-operator/kube-prometheus/issues/718