
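Alert notifications

1. DingTalk alerts

1.1 Add a robot

In the DingTalk group settings, add a custom robot and give it a name. For security verification, prefer the signing mode, which reduces the risk of the group being flooded by malicious calls. After the robot is created, save its Webhook URL and signing secret; both are required by the alert configuration later on. Then add the robot to the group chat.

1.2 Install the webhook

Alertmanager cannot produce DingTalk's message format natively, so a DingTalk webhook forwarder has to be deployed to convert the payload. The forwarder receives the native alert JSON pushed by Alertmanager, computes the DingTalk signature, and wraps the content in a message structure that the DingTalk robot accepts.

Edit config/config.yaml to hold the Webhook URL and secret saved in section 1.1, and adjust deployment.yaml for your cluster if needed.

    root@k8s-master1:~# git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
    root@k8s-master1:~# cd prometheus-webhook-dingtalk/contrib/k8s/
    root@k8s-master1:~/prometheus-webhook-dingtalk/contrib/k8s# vim config/config.yaml
    root@k8s-master1:~/prometheus-webhook-dingtalk/contrib/k8s# vim deployment.yaml
    root@k8s-master1:~/prometheus-webhook-dingtalk/contrib/k8s# kubectl kustomize | kubectl apply -f - -n monitoring
    root@k8s-master1:~/prometheus-webhook-dingtalk/contrib/k8s# kubectl get pod -n monitoring | grep ding
    alertmanager-webhook-dingtalk-cb7f6c584-92sqj   1/1   Running   0   13s

1.3 Configure alertmanager-alertmanager.yaml

The Alertmanager resource has to select AlertmanagerConfig objects by label. The label configured here must be the same one carried by the AlertmanagerConfig in section 1.4 (alertmanagerConfig: email in this setup). After saving, apply the file with kubectl apply -f alertmanager-alertmanager.yaml.

    root@k8s-master1:~/prometheus-webhook-dingtalk/contrib/k8s# cd /root/kube-prometheus/manifests/
    root@k8s-master1:~/kube-prometheus/manifests# vim alertmanager-alertmanager.yaml

1.4 Create the AlertmanagerConfig

    root@k8s-master1:~/kube-prometheus/manifests# vim dingding-alertmanagerconfig.yaml

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: dingding
      labels:
        # Must match the alert configuration label set in alertmanager-alertmanager.yaml
        alertmanagerConfig: email
      namespace: monitoring
    spec:
      route:
        groupBy: ['severity']
        groupWait: 1m
        groupInterval: 1m
        repeatInterval: 1m
        receiver: dingding-webhook
      receivers:
      - name: dingding-webhook
        webhookConfigs:
        # Also send a notification when the alert is resolved
        - sendResolved: true
          # Address of the DingTalk webhook service
          url: http://alertmanager-webhook-dingtalk.monitoring/dingtalk/webhook1/send

Apply the manifest with kubectl apply -f dingding-alertmanagerconfig.yaml. The webhook1 segment of the URL is the name of the target defined in config/config.yaml in section 1.2.

1.5 Test the alert

I already run a highly available MySQL cluster, so the following uses it as the example for writing Prometheus alerting rules.

    root@k8s-master1:~/kube-prometheus/manifests# kubectl get pod
    NAME                 READY   STATUS    RESTARTS       AGE
    mysql-rep-master-0   2/2     Running   8 (165m ago)   5d21h
    mysql-rep-slave-0    2/2     Running   6 (165m ago)   5d21h
    mysql-rep-slave-1    2/2     Running   6 (165m ago)   5d21h

    ## MySQL alerting rules
    root@k8s-master1:~/kube-prometheus/manifests# vim mysql-rule.yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        app: kube-prometheus-stack
        role: alerting-rules
        prometheus: kube-prometheus-stack-prometheus
      name: prometheus-mysql-alerts
      namespace: monitoring  # Replace with the namespace your Prometheus runs in
    spec:
      groups:
      - name: mysql
        rules:
        # 1. Cluster availability
        - alert: MySQLDown
          expr: mysql_up == 0
          for: 1m
          labels:
            severity: critical
            namespace: monitoring
          annotations:
            summary: "MySQL instance {{ $labels.instance }} is down"
            description: "Prometheus cannot reach the MySQL instance on {{ $labels.pod }}. This usually means the mysqld process has stopped or the exporter cannot connect."
        # 2. Replication
        # 2.1 Replication lag too high
        - alert: MySQLReplicationLagHigh
          expr: mysql_slave_status_seconds_behind_source > 30
          for: 2m
          labels:
            severity: warning
            namespace: monitoring
          annotations:
            summary: "MySQL replication lag is high"
            description: "Replica {{ $labels.pod }} (instance: {{ $labels.instance }}) is {{ $value }} seconds behind the source. Check network latency or the write load on the source."
        # 2.2 Replication threads stopped
        - alert: MySQLReplicationSQLThreadDown
          expr: mysql_slave_status_replica_sql_running == 0
          for: 1m
          labels:
            severity: critical
            namespace: monitoring
          annotations:
            summary: "MySQL replication SQL thread stopped"
            description: "The SQL thread on replica {{ $labels.pod }} is not running and data synchronization is interrupted. Check the relay log for corruption or errors."
        - alert: MySQLReplicationIOThreadDown
          expr: mysql_slave_status_replica_io_running == 0
          for: 1m
          labels:
            severity: critical
            namespace: monitoring
          annotations:
            summary: "MySQL replication IO thread stopped"
            description: "The IO thread on replica {{ $labels.pod }} is not running and cannot fetch binary logs from the source. The network connection may be down."

Apply the rules with kubectl apply -f mysql-rule.yaml. Each rule sets the label namespace: monitoring; by default the Prometheus Operator only routes an alert through an AlertmanagerConfig when the alert's namespace label equals the namespace of that AlertmanagerConfig, which is presumably why the label is set here.

Enter the MySQL replica pod and stop replication to check that the alert is delivered:

    root@k8s-master1:~/kube-prometheus/manifests# kubectl exec -it mysql-rep-slave-0 -- /bin/bash
    I have no name!@mysql-rep-slave-0:/$ mysql -uroot -pRoot12345
    mysql> stop replica;

The alert message arrives in the group.

Enter the MySQL replica pod again and resume replication to check that the recovery notification is delivered:

    root@k8s-master1:~/kube-prometheus/manifests# kubectl exec -it mysql-rep-slave-0 -- /bin/bash
    I have no name!@mysql-rep-slave-0:/$ mysql -uroot -pRoot12345
    mysql> start replica;

The recovery message arrives in the group. That completes the Prometheus DingTalk alerting setup.

2. WeCom (企业微信) alerts

2.1 Add a robot

Add a robot to the group chat.

2.2 Convert the alert format

Alertmanager sends alerts in its own standard webhook payload, while the WeCom group robot expects the custom message format of an IM tool. The two formats are incompatible, so a middleware component has to parse the payload, extract the content, and rebuild it (parse → extract → reformat) before monitoring alerts can reach the WeCom group.

I prepared a wechat.yaml that creates two kinds of Kubernetes resources:

Deployment: runs an alert forwarding container written in Python Flask. It starts a web service on port 5000, receives the standard alert JSON pushed by Alertmanager, converts it into a message format that WeCom accepts, and calls the WeCom API to send both alert and recovery notifications.

Service: gives the Pod a stable in-cluster entry point. Alertmanager can reach the forwarder through the service name prometheus-webhook-wechat.monitoring:5000 without depending on the Pod's changing IP address.

    root@k8s-master1:~# vim wechat.yaml

    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-webhook-wechat
      name: prometheus-webhook-wechat
      namespace: monitoring
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-webhook-wechat
      template:
        metadata:
          labels:
            app: prometheus-webhook-wechat
        spec:
          dnsConfig:
            options:
            - name: ndots
              value: "2"
          containers:
          - name: prometheus-webhook-wechat
            image: linge365/webhook-wechat:latest
            imagePullPolicy: IfNotPresent
            env:
            - name: ROBOT_TOKEN
              # Paste the token copied from the WeCom robot
              value: 6a1b465b-8e27-42c5-acc1-29c09084fa18
            ports:
            - containerPort: 5000
              protocol: TCP
            resources:
              requests:
                cpu: 100m
                memory: 100Mi
              limits:
                cpu: 200m
                memory: 500Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: prometheus-webhook-wechat
      name: prometheus-webhook-wechat
      namespace: monitoring
    spec:
      ports:
      - port: 5000
        protocol: TCP
        targetPort: 5000
      selector:
        app: prometheus-webhook-wechat

    root@k8s-master1:~# kubectl apply -f wechat.yaml

2.3 Configure alertmanager-alertmanager.yaml

    root@k8s-master1:~/prometheus-webhook-dingtalk/contrib/k8s# cd /root/kube-prometheus/manifests/
    root@k8s-master1:~/kube-prometheus/manifests# vim alertmanager-alertmanager.yaml

2.4 Create the AlertmanagerConfig

    root@k8s-master1:~/kube-prometheus/manifests# vim wechat-alertmanagerconfig.yaml

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: wechat
      labels:
        alertmanagerConfig: email
      namespace: monitoring
    spec:
      route:
        groupBy: ['severity']
        groupWait: 1m
        groupInterval: 1m
        repeatInterval: 5m
        receiver: wechat-webhook
      receivers:
      - name: wechat-webhook
        webhookConfigs:
        - sendResolved: true
          url: http://prometheus-webhook-wechat:5000

    root@k8s-master1:~/kube-prometheus/manifests# kubectl apply -f wechat-alertmanagerconfig.yaml

2.5 Test the alert

As with the DingTalk test, I use the existing highly available MySQL cluster. Enter the MySQL replica pod and stop the replication IO thread to check that the alert is delivered:

    root@k8s-master1:~/kube-prometheus/manifests# kubectl exec -it mysql-rep-slave-0 -- /bin/bash
    I have no name!@mysql-rep-slave-0:/$ mysql -uroot -pRoot12345
    mysql> STOP REPLICA IO_THREAD;

The alert message arrives in the group.

Enter the MySQL replica pod again and resume replication to check that the recovery notification is delivered:

    root@k8s-master1:~/kube-prometheus/manifests# kubectl exec -it mysql-rep-slave-0 -- /bin/bash
    I have no name!@mysql-rep-slave-0:/$ mysql -uroot -pRoot12345
    mysql> start replica;

The recovery message arrives in the group. That completes the WeCom alerting setup.

Note: if anything here is wrong or missing, corrections are welcome. This article is original work; please credit the original author when reposting.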
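Supplement to section 1.2: the signing step that the DingTalk forwarder performs can be sketched in a few lines of Python. This is only an illustration of the signing mode chosen in section 1.1, not code taken from prometheus-webhook-dingtalk. The function name sign_dingtalk_url, the token, and the secret below are hypothetical placeholders. The scheme shown (HMAC-SHA256 over the millisecond timestamp, a newline, and the secret, then Base64 and URL encoding) follows DingTalk's custom robot documentation as I understand it.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign_dingtalk_url(webhook_url, secret, timestamp_ms=None):
    """Return webhook_url with the timestamp and sign query parameters appended.

    webhook_url is assumed to already contain ?access_token=...
    """
    if timestamp_ms is None:
        # DingTalk expects the current time in milliseconds
        timestamp_ms = int(time.time() * 1000)

    # The string to sign is "<timestamp>\n<secret>", keyed with the secret itself
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()

    # Base64-encode the raw digest, then URL-encode it for use in a query string
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"


if __name__ == "__main__":
    # Placeholder token and secret, for illustration only
    url = "https://oapi.dingtalk.com/robot/send?access_token=TOKEN"
    print(sign_dingtalk_url(url, "SECexample"))
```

In the setup described in this article you do not need to run this yourself, because the forwarder signs requests once the secret is present in its configuration. The sketch is mainly useful when debugging a robot that rejects messages.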
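Supplement to section 2.2: the "parse → extract → reformat" step can be sketched as a pure Python function. This is a minimal sketch under assumptions, not the implementation inside the linge365/webhook-wechat image. The function name to_wecom_message and the chosen output layout are hypothetical. The input fields (alerts, status, labels, annotations, startsAt, endsAt) are those of the Alertmanager webhook payload, and the output uses the markdown message type of the WeCom group robot.

```python
def to_wecom_message(payload):
    """Convert an Alertmanager webhook payload into a WeCom markdown message body."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        resolved = alert.get("status") == "resolved"

        # One heading line per alert, marked as firing or resolved
        state = "Resolved" if resolved else "Firing"
        lines.append(f"**[{state}] {labels.get('alertname', 'unknown')}**")
        lines.append(f"> severity: {labels.get('severity', 'n/a')}")
        lines.append(f"> summary: {annotations.get('summary', '')}")
        lines.append(f"> started: {alert.get('startsAt', '')}")
        if resolved:
            lines.append(f"> ended: {alert.get('endsAt', '')}")

    # Message structure accepted by the WeCom group robot webhook
    return {"msgtype": "markdown", "markdown": {"content": "\n".join(lines)}}


if __name__ == "__main__":
    sample = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "MySQLReplicationIOThreadDown", "severity": "critical"},
                "annotations": {"summary": "MySQL replication IO thread stopped"},
                "startsAt": "2024-01-01T00:00:00Z",
            }
        ],
    }
    print(to_wecom_message(sample))
```

A forwarder would POST the returned dictionary as JSON to the robot's webhook address, using the token passed in through ROBOT_TOKEN. The sketch leaves the network call out so that the conversion can be tested on its own.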