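Once the rabbitmq_prometheus plugin is enabled, RabbitMQ exposes metrics that Prometheus can scrape directly.
See the official RabbitMQ documentation for details.
The official plugin is the recommended approach; the setup described below uses a third-party exporter instead.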
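Background
Monitoring RabbitMQ with RabbitMQ Exporter requires RabbitMQ login credentials, so have them ready before deploying. The project is hosted at https://github.com/kbudde/rabbitmq_exporter.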
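Before deploying, it can help to confirm that the prepared credentials are accepted by the RabbitMQ management API, which is where the exporter reads its data. The following is a minimal sketch, assuming the management plugin listens on port 15672 and that curl is installed; the function names are hypothetical and the host and credentials are placeholders to replace with your own.

```shell
# Sketch: check RabbitMQ management API credentials before configuring the exporter.
# Function names are hypothetical; /api/overview is a standard management API endpoint.

# Build the overview URL from a base URL, tolerating one trailing slash.
overview_url() {
  local base="${1%/}"
  printf '%s/api/overview\n' "$base"
}

# Return 0 if the API answers HTTP 200 for the given user and password.
check_rabbit_login() {
  local base="$1" user="$2" pass="$3" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -u "${user}:${pass}" "$(overview_url "$base")")
  [ "$code" = "200" ]
}
```

For example, check_rabbit_login http://10.66.3.247:15672 admin 'your-password' && echo "login ok" prints the message only when the credentials are accepted.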
Deploying rabbitmq_exporter
Deploy rabbitmq_exporter on a suitable node. The node must be able to reach both RabbitMQ and Prometheus over the network.
cd /usr/local/src/
wget https://github.com/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC19/rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz
mkdir -p /opt/rabbitmq_exporter
tar xf rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz -C /opt/rabbitmq_exporter
Edit the configuration file; the project's documentation lists the available options.
cat /opt/rabbitmq_exporter/config.json
{
    "rabbit_url": "http://10.66.3.247:15672",
    "rabbit_user": "admin",
    "rabbit_pass": "H23@20X4y8JD",
    "publish_port": "9419"
}
Manage the service with systemd. The heredoc delimiter is quoted so that the shell does not expand $MAINPID while writing the unit file.
cat > /etc/systemd/system/rabbitmq_exporter.service << 'EOF'
[Unit]
Description=RabbitMQ Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=ops
Group=ops
Restart=on-failure
ExecStart=/opt/rabbitmq_exporter/rabbitmq_exporter -config-file=/opt/rabbitmq_exporter/config.json
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
EOF
Create the ops user that runs the exporter, and give it ownership of the installation directory.
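A minimal sketch of this step, assuming a Linux host with the standard useradd and chown tools. The function name and the RUN dry-run switch are hypothetical conveniences, and the nologin shell path may differ between distributions.

```shell
# Sketch: create a system account for the exporter and hand it the install directory.
# Set RUN=echo to print the commands instead of executing them (dry run).
setup_exporter_user() {
  local user="$1" dir="$2"
  # Create the account only if it does not exist yet: no home directory, no login shell.
  if ! id -u "$user" >/dev/null 2>&1; then
    ${RUN:-} useradd --system --user-group --no-create-home --shell /sbin/nologin "$user"
  fi
  # Give the account ownership of everything under the install directory.
  ${RUN:-} chown -R "${user}:${user}" "$dir"
}
```

Run as root, setup_exporter_user ops /opt/rabbitmq_exporter performs both steps; prefixing the call with RUN=echo shows what would be executed without changing anything.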
Start the service and enable it at boot.
systemctl daemon-reload
systemctl enable rabbitmq_exporter.service
systemctl start rabbitmq_exporter.service
Verify that the service is listening.
netstat -lnpt | grep rabbitmq
tcp6 0 0 :::9419 :::* LISTEN 697874/rabbitmq_exp
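A listening port only shows that the process started. To check that the exporter can actually reach RabbitMQ, the scraped metrics can be inspected for the rabbitmq_up sample used by the alert rules later on. The sketch below assumes the exporter serves the Prometheus text format on /metrics at port 9419; the function name is hypothetical.

```shell
# Sketch: decide from scraped metrics text whether the exporter can reach RabbitMQ.
# Reads Prometheus text format on stdin; succeeds if some rabbitmq_up sample equals 1.
exporter_is_up() {
  awk '$1 ~ /^rabbitmq_up([{]|$)/ && $NF == 1 { ok = 1 } END { exit ok ? 0 : 1 }'
}
```

For example, curl -s http://localhost:9419/metrics | exporter_is_up && echo "exporter can reach RabbitMQ".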
Prometheus configuration
Add a scrape job for the exporter to the Prometheus configuration file.
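A minimal sketch of such a scrape job, printed by a small shell function so the target can be passed in. The job name rabbitmq and the function name are assumptions, and the target must be the host and port where the exporter listens (port 9419 in the configuration above). The printed entry belongs under the scrape_configs key of the Prometheus configuration file.

```shell
# Sketch: print a static scrape job for the exporter.
# Usage: print_scrape_job HOST:PORT
print_scrape_job() {
  local target="$1"
  # Unquoted heredoc delimiter so that ${target} is substituted.
  cat <<EOF
  - job_name: rabbitmq
    static_configs:
      - targets: ['${target}']
EOF
}
```

For example, print_scrape_job exporter-host:9419 prints the entry for an exporter reachable at exporter-host, a placeholder name.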
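Reload the configuration file. The /-/reload endpoint is only served when Prometheus is started with the --web.enable-lifecycle flag.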
curl -XPOST http://localhost:9090/-/reload
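Alerting rule configuration
Add the following alerting rules. They are written as a Helm chart template: {{ template "rabbitmq.fullname" . }} and {{ .Values.replicaCount }} are filled in when the chart is rendered, and {{ "{{ ... }}" }} is the Helm escape for a literal Prometheus {{ ... }} expression. For a standalone Prometheus, substitute concrete values before loading the rules.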
- alert: RabbitmqDown
  expr: rabbitmq_up{service="{{ template "rabbitmq.fullname" . }}"} == 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Rabbitmq down (instance {{ "{{ $labels.instance }}" }})
    description: RabbitMQ node down
- alert: ClusterDown
  expr: |
    sum(rabbitmq_running{service="{{ template "rabbitmq.fullname" . }}"})
    < {{ .Values.replicaCount }}
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Cluster down (instance {{ "{{ $labels.instance }}" }})
    description: |
      Less than {{ .Values.replicaCount }} nodes running in RabbitMQ cluster
      VALUE = {{ "{{ $value }}" }}
- alert: ClusterPartition
  expr: rabbitmq_partitions{service="{{ template "rabbitmq.fullname" . }}"} > 0
  for: 5m
  labels:
    severity: error
  annotations:
    summary: Cluster partition (instance {{ "{{ $labels.instance }}" }})
    description: |
      Cluster partition
      VALUE = {{ "{{ $value }}" }}
- alert: OutOfMemory
  expr: |
    rabbitmq_node_mem_used{service="{{ template "rabbitmq.fullname" . }}"}
    / rabbitmq_node_mem_limit{service="{{ template "rabbitmq.fullname" . }}"}
    * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Out of memory (instance {{ "{{ $labels.instance }}" }})
    description: |
      Memory available for RabbitMQ is low (< 10%)
      VALUE = {{ "{{ $value }}" }}
      LABELS: {{ "{{ $labels }}" }}
- alert: TooManyConnections
  expr: rabbitmq_connectionsTotal{service="{{ template "rabbitmq.fullname" . }}"} > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Too many connections (instance {{ "{{ $labels.instance }}" }})
    description: |
      RabbitMQ instance has too many connections (> 1000)
      VALUE = {{ "{{ $value }}" }}
      LABELS: {{ "{{ $labels }}" }}
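For a Prometheus that is not deployed through Helm, the same rules need the template expressions removed. As a rough illustration, the sketch below prints a plain rule file containing only the RabbitmqDown alert. The group name rabbitmq and the function name are assumptions, and the service label matcher is dropped because a static scrape job does not attach that label unless you add it yourself.

```shell
# Sketch: print a plain (non-Helm) Prometheus rule file with the RabbitmqDown alert.
# The quoted heredoc delimiter keeps {{ $labels.instance }} literal for Prometheus.
print_rabbitmq_rules() {
  cat <<'EOF'
groups:
  - name: rabbitmq
    rules:
      - alert: RabbitmqDown
        expr: rabbitmq_up == 0
        for: 5m
        labels:
          severity: error
        annotations:
          summary: Rabbitmq down (instance {{ $labels.instance }})
          description: RabbitMQ node down
EOF
}
```

The output can be redirected to a file listed under rule_files in the Prometheus configuration, and checked with promtool check rules before reloading. The other alerts can be converted the same way, replacing {{ .Values.replicaCount }} with the actual number of cluster nodes.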