2023-11-14    2023-12-23    682 words  2 min read

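Once the rabbitmq_prometheus plugin is enabled, RabbitMQ can be monitored by Prometheus directly.

See the official documentation for details.

The official approach is the recommended one; what follows is a third-party approach.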

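Background

RabbitMQ Exporter is deployed to monitor RabbitMQ, and it needs RabbitMQ login credentials to do so. Project page: https://github.com/kbudde/rabbitmq_exporter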

Deploying rabbitmq_exporter

Deploy rabbitmq_exporter on a suitable node. The node must be able to reach both RabbitMQ and Prometheus.

cd /usr/local/src/
wget https://github.com/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC19/rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz
mkdir /opt/rabbitmq_exporter
tar xf rabbitmq_exporter_1.0.0-RC19_linux_amd64.tar.gz -C /opt/rabbitmq_exporter
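The systemd unit later in this post assumes the binary ends up directly at /opt/rabbitmq_exporter/rabbitmq_exporter. Whether the archive unpacks flat or into a subdirectory is not verified here, so a quick check after extraction can save a failed start. A minimal sketch; the helper name find_exporter_binary is made up for this example:

```shell
# Locate the exporter binary under the install directory.
# Prints the first matching path and returns 0; returns 1 if nothing is found.
find_exporter_binary() {
    local dir="$1"
    local path
    path="$(find "$dir" -type f -name rabbitmq_exporter -perm -u+x | head -n 1)"
    [ -n "$path" ] && printf '%s\n' "$path"
}

find_exporter_binary /opt/rabbitmq_exporter \
    || echo "binary not found; list the archive contents with: tar tf <archive>"
```

If the printed path is not /opt/rabbitmq_exporter/rabbitmq_exporter, either move the binary or adjust ExecStart accordingly.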

Edit the configuration file; see the project's documentation for the available options.

cat /opt/rabbitmq_exporter/config.json
{
    "rabbit_url": "http://10.66.3.247:15672",
    "rabbit_user": "admin",
    "rabbit_pass": "H23@20X4y8JD",
    "publish_port": "9419"
}

Manage the service with systemd

cat > /etc/systemd/system/rabbitmq_exporter.service << 'EOF'
[Unit]
Description=RabbitMQ Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=ops
Group=ops
Restart=on-failure
ExecStart=/opt/rabbitmq_exporter/rabbitmq_exporter -config-file=/opt/rabbitmq_exporter/config.json
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
EOF
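One detail worth checking when a unit file is written through a heredoc: with an unquoted delimiter the shell expands $MAINPID before the file is written, so the ExecReload line loses its argument. Quoting the delimiter passes the text through literally. A small sketch of the difference that writes nothing to disk:

```shell
# systemd sets MAINPID for the service, not for your shell.
# It is emptied here only to make the demonstration deterministic.
MAINPID=""

# Unquoted delimiter: the shell substitutes $MAINPID (empty) into the text.
unquoted="$(cat << EOF
ExecReload=/bin/kill -HUP $MAINPID
EOF
)"

# Quoted delimiter: the text is kept as-is, which is what systemd needs to see.
quoted="$(cat << 'EOF'
ExecReload=/bin/kill -HUP $MAINPID
EOF
)"

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

After writing the real unit, grep MAINPID /etc/systemd/system/rabbitmq_exporter.service should show the variable name intact.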

Create the ops user that runs the exporter, and give it ownership of the install directory.

useradd -M ops
chown -R ops:ops /opt/rabbitmq_exporter/

Start the service and enable it at boot.

systemctl daemon-reload
systemctl enable rabbitmq_exporter.service
systemctl start rabbitmq_exporter.service

Verify the service

netstat -lnpt | grep rabbitmq
tcp6       0      0 :::9419                 :::*                    LISTEN      697874/rabbitmq_exp
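A listening port only shows that the process started; it does not show that the exporter can log in to RabbitMQ. The rabbitmq_up metric (the same one used by the alert rules in this post) reflects that. A rough check, assuming the exporter runs on the local host with the port from the config above; the helper name rabbitmq_is_up is made up for this example:

```shell
# Read Prometheus text-format metrics on stdin. Succeed only if at least one
# rabbitmq_up sample is present and every such sample has the value 1.
rabbitmq_is_up() {
    awk '/^rabbitmq_up[{ ]/ { found = 1; if ($NF != 1) bad = 1 }
         END { exit (found && !bad) ? 0 : 1 }'
}

# Assumed address: exporter on localhost, port 9419.
curl -s http://localhost:9419/metrics | rabbitmq_is_up \
    && echo "exporter can reach RabbitMQ" \
    || echo "exporter is not reporting rabbitmq_up 1"
```

If the second message appears, journalctl -u rabbitmq_exporter.service is the next place to look, typically for a wrong URL or credentials.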

Prometheus configuration

Add the following to the Prometheus configuration file, under scrape_configs.

- job_name: rabbitmq_exporter
  honor_timestamps: true
  scrape_interval: 10s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  static_configs:
  - targets:
    - 10.66.3.128:9419

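Reload the configuration. This endpoint is only available when Prometheus was started with --web.enable-lifecycle.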

curl -XPOST http://localhost:9090/-/reload
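A slightly more cautious variant of the reload validates the file first with promtool, which ships with Prometheus. This is a sketch under assumptions: the config path /etc/prometheus/prometheus.yml and the function name validate_and_reload are placeholders, not something taken from this setup.

```shell
# Assumed path and address; adjust both to your Prometheus installation.
PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/prometheus.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Validate the file with promtool, and only trigger the reload if it passes.
validate_and_reload() {
    promtool check config "$1" || return 1
    # -f makes curl return non-zero on an HTTP error response.
    curl -fsS -XPOST "$PROM_URL/-/reload"
}

validate_and_reload "$PROM_CONFIG" \
    || echo "validation or reload failed; check the messages above"
```

Once the reload succeeds, the new job should appear on the Targets page of the Prometheus UI.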

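Alert rule configuration

Add the alert rules. The rules below are written in Helm template syntax: {{ template "rabbitmq.fullname" . }} and {{ .Values.replicaCount }} are filled in by Helm at render time, and {{ "{{ $labels.instance }}" }} renders to {{ $labels.instance }}. In a plain Prometheus rules file, substitute concrete values and write the $labels and $value expressions directly.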

  - alert: RabbitmqDown
    expr: rabbitmq_up{service="{{ template "rabbitmq.fullname" . }}"} == 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: Rabbitmq down (instance {{ "{{ $labels.instance }}" }})
      description: RabbitMQ node down
  - alert: ClusterDown
    expr: |
      sum(rabbitmq_running{service="{{ template "rabbitmq.fullname" . }}"})
      < {{ .Values.replicaCount }}
    for: 5m
    labels:
      severity: error
    annotations:
      summary: Cluster down (instance {{ "{{ $labels.instance }}" }})
      description: |
          Less than {{ .Values.replicaCount }} nodes running in RabbitMQ cluster
          VALUE = {{ "{{ $value }}" }}
  - alert: ClusterPartition
    expr: rabbitmq_partitions{service="{{ template "rabbitmq.fullname" . }}"} > 0
    for: 5m
    labels:
      severity: error
    annotations:
      summary: Cluster partition (instance {{ "{{ $labels.instance }}" }})
      description: |
          Cluster partition
          VALUE = {{ "{{ $value }}" }}
  - alert: OutOfMemory
    expr: |
      rabbitmq_node_mem_used{service="{{ template "rabbitmq.fullname" . }}"}
      / rabbitmq_node_mem_limit{service="{{ template "rabbitmq.fullname" . }}"}
      * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Out of memory (instance {{ "{{ $labels.instance }}" }})
      description: |
          Memory available for RabbitMQ is low (< 10%)
          VALUE = {{ "{{ $value }}" }}
          LABELS: {{ "{{ $labels }}" }}
  - alert: TooManyConnections
    expr: rabbitmq_connectionsTotal{service="{{ template "rabbitmq.fullname" . }}"} > 1000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Too many connections (instance {{ "{{ $labels.instance }}" }})
      description: |
          RabbitMQ instance has too many connections (> 1000)
          VALUE = {{ "{{ $value }}" }}
          LABELS: {{ "{{ $labels }}" }}
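For a Prometheus that is not deployed through Helm, the first rule might look like the sketch below once the template parts are removed. Several things here are assumptions: the output path is hypothetical (the file must also be listed under rule_files in the Prometheus config), the group name is arbitrary, and the service selector is dropped because targets defined through static_configs carry no service label unless one is added.

```shell
# Hypothetical output path; point rule_files in prometheus.yml at it.
RULES_FILE="${RULES_FILE:-/tmp/rabbitmq_rules.yml}"

# The quoted 'EOF' keeps $labels literal instead of letting the shell expand it.
cat > "$RULES_FILE" << 'EOF'
groups:
  - name: rabbitmq
    rules:
      - alert: RabbitmqDown
        expr: rabbitmq_up == 0
        for: 5m
        labels:
          severity: error
        annotations:
          summary: Rabbitmq down (instance {{ $labels.instance }})
          description: RabbitMQ node down
EOF

# promtool ships with Prometheus; the check is skipped if it is not installed.
if command -v promtool >/dev/null 2>&1; then
    promtool check rules "$RULES_FILE"
fi
```

The remaining rules can be converted the same way, replacing {{ .Values.replicaCount }} with the actual number of nodes in the cluster.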
