5. Prometheus alerting plugin - Alertmanager, and writing a custom webhook example
5. Prometheus alerting plugin - Alertmanager
Reference articles:
https://www.bookstack.cn/read/prometheus-book/alert-install-alert-manager.md
https://blog.csdn.net/aixiaoyang168/article/details/98474494
https://www.cnblogs.com/xiaobaozi-95/p/10740511.html (primary)
Prometheus itself does not ship alerting notifications; alerting is handled by the Alertmanager component. Alertmanager receives the alerts fired by Prometheus, runs them through a processing pipeline, and delivers notifications to the configured recipients.
How a Prometheus alert is triggered and delivered:
Prometheus —> threshold exceeded —> condition holds for the configured duration —> Alertmanager —> grouping | inhibition | silencing —> notification channel —> email | DingTalk | WeChat, etc.
5.1. Prometheus + Alertmanager + webhook: building a custom alerting system

The following is mainly based on:
https://www.cnblogs.com/leoyang63/articles/13973749.html
https://www.cnblogs.com/caizhenghui/p/9144805.html
The article "prometheus+grafana+mtail+node_exporter實現機器負載及業務監控" (https://blog.csdn.net/bluuusea/article/details/104341054) describes a Prometheus monitoring system for machine load and business metrics built with mtail and node_exporter, without instrumenting the application itself. This article builds on it and adds custom alerting.
Alerting with Prometheus + Alertmanager is split into two parts:
- Prometheus holds and evaluates the alerting rules and sends firing alerts to Alertmanager.
- Alertmanager manages those alerts: silencing, inhibition, grouping, and sending the notifications.
Alertmanager can deliver notifications in several ways: it has built-in integrations such as email, Slack, and WeChat Work, and it also offers a webhook mechanism for adding your own notification channels; there are plenty of examples online of integrating third-party tools such as DingTalk this way. This article covers email notifications and a custom webhook notification service written in Java.
This article is organized into four parts:
- configuring alerting rules in Prometheus
- configuring and deploying Alertmanager
- connecting Prometheus to Alertmanager
- configuring the notification channels
5.1.1. Configuring alerting rules in Prometheus

Key prometheus.yml options:
| Option | Description |
| --- | --- |
| scrape_interval | Sample scrape interval; by default samples are scraped once per minute. |
| evaluation_interval | Rule evaluation interval; by default alerting rules are evaluated once per minute. |
| rule_files | Files containing the alerting rules. |
| scrape_configs | Job definitions; several jobs can be configured here. |
| job_name | Name of a job; must be unique. |
| static_configs | Static target list for a job; for hot reloading, file_sd_configs is normally used instead. |
| file_sd_configs | File-based service discovery for a job; with it, target files can be hot-reloaded. |
| files | List of discovery file paths under file_sd_configs; .json, .yml and .yaml are supported, and the last path segment may contain the * wildcard. |
| refresh_interval | How often the files listed under file_sd_configs are re-read; the default is 5 minutes. |
Here we use the rule_files option to register the alert-rule files; the relevant prometheus.yml is shown below:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["172.17.0.2:9093"]

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# Several rule files can be listed here, and the * wildcard is supported.
rule_files:
  - "rules/host_rules.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["172.17.0.2:9090"]
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['172.17.0.2:8080']
  - job_name: 'push-metrics'
    static_configs:
      - targets: ['172.17.0.2:9091']
        labels:
          instance: pushgateway

The alerting rules themselves are defined in rules/host_rules.yml:
groups:
  # alert group name
  - name: hostStatsAlert
    # rules belonging to this group
    rules:
      # alert name, must be unique
      - alert: hostCpuUsageAlert
        # PromQL expression
        expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
        # the expression must keep evaluating to true for this duration before the alert fires
        for: 1m
        labels:
          # severity level
          severity: page
        annotations:
          # title of the notification
          summary: "Instance {{ $labels.instance }} CPU usage is too high"
          # body of the notification
          description: "Instance {{ $labels.instance }} CPU usage is above 85% (current value: {{ $value }})"
      - alert: hostMemUsageAlert
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
        for: 1m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} memory usage is too high"
          description: "Instance {{ $labels.instance }} memory usage is above 85% (current value: {{ $value }})"

After the rules are configured, open http://localhost:19090/alerts and the rules appear on the Alerts page.
5.1.2. Downloading, installing, and starting Alertmanager
tar -zxvf alertmanager-0.22.2.linux-amd64.tar.gz -C /root/installed/
cd /root/installed/alertmanager
nohup ./alertmanager --config.file=alertmanager.yml > alertmanager.file 2>&1 &

On the server, Alertmanager is reachable at:
http://localhost:9093/

From the local machine (through the mapped port):
http://localhost:19093/#/alerts

5.1.3. Creating the Alertmanager configuration file
The extracted Alertmanager package ships with a default alertmanager.yml, whose content is shown below:
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

The Alertmanager configuration has two main parts: routing (route) and receivers. Every alert enters the routing tree at the top-level route and is dispatched to the matching receiver according to the routing rules.
5.1.4. Connecting Prometheus to Alertmanager

Add the Alertmanager address under the alerting section of prometheus.yml. This was already done above; the snippet below is repeated only as a deployment reference:
alerting:
  alertmanagers:              # Alertmanager endpoints
    - static_configs:
        - targets:
            - 172.17.0.2:9093 # Alertmanager host:port
rule_files:
  - "rules/*.yml"

5.1.5. Configuring notification channels
5.1.5.1. Alertmanager email notification demo

The following configuration goes into alertmanager.yml:
global:
  # resolve timeout
  resolve_timeout: 5m
  # SMTP server address; the port must be included
  smtp_smarthost: 'smtp.126.com:25'
  smtp_from: 'xxx@126.com'
  # sender mailbox account
  smtp_auth_username: 'xxx@126.com'
  # the account's authorization code (not the login password); Aliyun personal mailboxes do not
  # seem to offer one at the moment, while the 126 mailbox authorization code can be found under
  # "Settings"
  smtp_auth_password: '1qaz2wsx'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 4h
  receiver: 'mail'
receivers:
  - name: 'mail'
    email_configs:
      - to: 'xxx@aliyun.com'

With this in place, alert notifications are delivered by email.
5.1.5.2. Alertmanager webhook (Java) notification demo

For the webhook, change alertmanager.yml to:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'webhook'
  routes:
    - receiver: webhook
      group_wait: 10s
receivers:
  - name: 'webhook'
    webhook_configs:
      # URL of the endpoint exposed by the custom Spring Boot application
      - url: 'http://172.17.0.2:8060/demo'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

With a webhook receiver, Alertmanager sends an HTTP POST request to the configured URL; the request body is a JSON string, as shown below (pretty-printed here):
{"receiver":"webhook","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"hostCpuUsageAlert","instance":"192.168.199.24:9100","severity":"page"},"annotations":{"description":"192.168.199.24:9100 CPU 使用率超過 85% (當前值為: 0.9973333333333395)","summary":"機器 192.168.199.24:9100 CPU 使用率過高"},"startsAt":"2020-02-29T19:45:21.799548092+08:00","endsAt":"2020-02-29T19:49:21.799548092+08:00","generatorURL":"http://localhost.localdomain:9090/graph?g0.expr=sum+by%28instance%29+%28avg+without%28cpu%29+%28irate%28node_cpu_seconds_total%7Bmode%21%3D%22idle%22%7D%5B5m%5D%29%29%29+%3E+0.85&g0.tab=1","fingerprint":"368e9616d542ab48"}],"groupLabels":{"alertname":"hostCpuUsageAlert"},"commonLabels":{"alertname":"hostCpuUsageAlert","instance":"192.168.199.24:9100","severity":"page"},"commonAnnotations":{"description":"192.168.199.24:9100 CPU 使用率超過 85% (當前值為: 0.9973333333333395)","summary":"機器 192.168.199.24:9100 CPU 使用率過高"},"externalURL":"http://localhost.localdomain:9093","version":"4","groupKey":"{}:{alertname="hostCpuUsageAlert"}" }此時需要使用java(其他任何語言都可以,反正只要能處理http的請求就行)搭建個http的請求處理器來處理報警通知,如下(以下代碼示例展示了接收host_rules.yml規則告警得到的數據的方式):
package com.demo.demo1.controller;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Slf4j
@Controller
@RequestMapping("/")
public class AlertController {

    @RequestMapping(value = "/demo", produces = "application/json;charset=UTF-8")
    @ResponseBody
    public String pstn(@RequestBody String json) {
        log.debug("alert notify params: {}", json);
        Map<String, Object> result = new HashMap<>();
        result.put("msg", "alert notification failed");
        result.put("code", 0);
        if (StringUtils.isBlank(json)) {
            return JSON.toJSONString(result);
        }
        JSONObject jo = JSON.parseObject(json);

        JSONObject commonAnnotations = jo.getJSONObject("commonAnnotations");
        String status = jo.getString("status");
        if (commonAnnotations == null) {
            return JSON.toJSONString(result);
        }

        String subject = commonAnnotations.getString("summary");
        String content = commonAnnotations.getString("description");

        List<String> emailusers = new ArrayList<>();
        emailusers.add("xxx@aliyun.com");
        List<String> users = new ArrayList<>();
        users.add("158*****5043");
        try {
            boolean success = Util.email(subject, content, emailusers);
            if (success) {
                result.put("msg", "alert notification sent");
                result.put("code", 1);
            }
        } catch (Exception e) {
            log.error("=alert email notify error. json={}", json, e);
        }
        try {
            boolean success = Util.sms(subject, content, users);
            if (success) {
                result.put("msg", "alert notification sent");
                result.put("code", 1);
            }
        } catch (Exception e) {
            log.error("=alert sms notify error. json={}", json, e);
        }
        return JSON.toJSONString(result);
    }
}
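The controller above calls Util.email(...) and Util.sms(...), but the Util helper itself is not shown in the referenced article. Purely as an illustration, the following is a minimal sketch of what the email half could look like; it assumes a JavaMail (javax.mail) dependency is added to the project, reuses the 126 SMTP host, account, and authorization-code placeholders from the email demo above, and leaves the SMS half as a stub because that part depends entirely on the chosen SMS provider's SDK:

package com.demo.demo1.controller;

import java.util.List;
import java.util.Properties;

import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

// Hypothetical helper: the original article references Util.email/Util.sms without showing them.
public class Util {

    // Sends a plain-text alert mail to every address in `to`; returns true on success.
    public static boolean email(String subject, String content, List<String> to) {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.126.com"); // assumed: same SMTP server as in alertmanager.yml
        props.put("mail.smtp.port", "25");
        props.put("mail.smtp.auth", "true");
        Session session = Session.getInstance(props, new javax.mail.Authenticator() {
            @Override
            protected javax.mail.PasswordAuthentication getPasswordAuthentication() {
                // account + authorization code, as in the email demo above
                return new javax.mail.PasswordAuthentication("xxx@126.com", "1qaz2wsx");
            }
        });
        try {
            MimeMessage msg = new MimeMessage(session);
            msg.setFrom(new InternetAddress("xxx@126.com"));
            for (String addr : to) {
                msg.addRecipient(Message.RecipientType.TO, new InternetAddress(addr));
            }
            msg.setSubject(subject, "UTF-8");
            msg.setText(content, "UTF-8");
            Transport.send(msg);
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    // SMS delivery depends on the chosen provider's SDK, so only a stub is given here.
    public static boolean sms(String subject, String content, List<String> phones) {
        // TODO: call your SMS provider's API here
        return false;
    }
}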
5.1.5.3. A complete, minimal Spring Boot project example

5.1.5.3.1. Project structure

The project consists of pom.xml, AlertController.java, DemoApplication.java, application.properties, and logback.xml; each file is listed in the following subsections.
5.1.5.3.2. pom.xml
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><parent><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-parent</artifactId><version>2.3.5.RELEASE</version><relativePath/> <!-- lookup parent from repository --></parent><groupId>com.example</groupId><artifactId>demo</artifactId><version>0.0.1-SNAPSHOT</version><name>demo</name><description>Demo project for Spring Boot</description><properties><java.version>1.8</java.version></properties><dependencies><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-web</artifactId></dependency><dependency><groupId>com.h2database</groupId><artifactId>h2</artifactId><scope>runtime</scope></dependency><dependency><groupId>org.projectlombok</groupId><artifactId>lombok</artifactId><optional>true</optional></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-test</artifactId><scope>test</scope></dependency><!-- JSON Configuration --><dependency><groupId>com.alibaba</groupId><artifactId>fastjson</artifactId><version>1.2.6</version></dependency><!--<dependency><groupId>org.apache.commons</groupId><artifactId>commons-lang3</artifactId><version>3.11</version></dependency>--></dependencies><build><plugins><plugin><groupId>org.springframework.boot</groupId><artifactId>spring-boot-maven-plugin</artifactId><configuration><excludes><exclude><groupId>org.projectlombok</groupId><artifactId>lombok</artifactId></exclude></excludes></configuration></plugin></plugins></build><repositories><repository><id>spring-milestones</id><name>Spring Milestones</name><url>https://repo.spring.io/milestone</url><snapshots><enabled>false</enabled></snapshots></repository><repository><id>spring-snapshots</id><name>Spring Snapshots</name><url>https://repo.spring.io/snapshot</url><releases><enabled>false</enabled></releases></repository></repositories><pluginRepositories><pluginRepository><id>spring-milestones</id><name>Spring Milestones</name><url>https://repo.spring.io/milestone</url><snapshots><enabled>false</enabled></snapshots></pluginRepository><pluginRepository><id>spring-snapshots</id><name>Spring Snapshots</name><url>https://repo.spring.io/snapshot</url><releases><enabled>false</enabled></releases></pluginRepository></pluginRepositories></project>5.1.5.3.3.AlertController
package com.example.demo;

import com.alibaba.fastjson.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.HashMap;
import java.util.Map;

@Controller
@RequestMapping("/")
public class AlertController {

    private final static Logger logger = LoggerFactory.getLogger(AlertController.class);

    @RequestMapping(value = "/demo", produces = "application/json;charset=UTF-8")
    @ResponseBody
    public String pstn(@RequestBody String json) {
        // simply write the raw notification into the log file
        logger.error(json);

        Map<String, Object> result = new HashMap<>();
        result.put("msg", "alert notification failed");
        result.put("code", 0);

//        if (StringUtils.isBlank(json)) {
//            return JSON.toJSONString(result);
//        }
//        JSONObject jo = JSON.parseObject(json);
//
//        JSONObject commonAnnotations = jo.getJSONObject("commonAnnotations");
//        String status = jo.getString("status");
//        if (commonAnnotations == null) {
//            return JSON.toJSONString(result);
//        }
//
//        String subject = commonAnnotations.getString("summary");
//        String content = commonAnnotations.getString("description");
//
//        result.put("subject", subject);
//        result.put("content", content);

        return JSON.toJSONString(result);
    }
}
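The commented-out block above shows how the payload can be taken apart with fastjson's generic JSONObject. If typed access is preferred, the notification can also be bound to small POJOs; the classes below are only a sketch whose names and fields mirror the JSON keys shown in section 5.1.5.2 (they are not part of the original project) and rely on the lombok dependency that is already in the pom:

package com.example.demo;

import lombok.Data;

import java.util.List;
import java.util.Map;

// Hypothetical typed model of the Alertmanager webhook payload; field names mirror the JSON keys.
@Data
public class AlertNotification {
    private String receiver;
    private String status;
    private String externalURL;
    private String version;
    private String groupKey;
    private List<Alert> alerts;
    private Map<String, String> groupLabels;
    private Map<String, String> commonLabels;
    private Map<String, String> commonAnnotations;

    @Data
    public static class Alert {
        private String status;
        private Map<String, String> labels;
        private Map<String, String> annotations;
        private String startsAt;
        private String endsAt;
        private String generatorURL;
        private String fingerprint;
    }
}

Inside the controller the body could then be parsed with AlertNotification n = JSON.parseObject(json, AlertNotification.class), after which the alert title and body are available from n.getCommonAnnotations().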
5.1.5.3.4. DemoApplication

package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

5.1.5.3.5. application.properties
server.port=8060
server.tomcat.uri-encoding=utf-8

5.1.5.3.6. logback.xml
<?xml version="1.0" encoding="UTF-8"?><configuration><appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender"><encoder><pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern></encoder></appender><!-- 日志記錄器,日期滾動記錄 --><appender name="fileInfoApp" class="ch.qos.logback.core.rolling.RollingFileAppender"><!-- 正在記錄的日志文件的路徑及文件名 --><!-- <file>${LOG_PATH}/warn/log_warn.log</file> --><!-- 日志記錄器的滾動策略,按日期,按大小記錄 --><rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy"><!-- 歸檔的日志文件的路徑,例如今天是2013-12-21日志,當前寫的日志文件路徑為file節點指定,可以將此文件與file指定文件路徑設置為不同路徑,從而將當前日志文件或歸檔日志文件置不同的目錄。而2013-12-21的日志文件在由fileNamePattern指定。%d{yyyy-MM-dd}指定日期格式,%i指定索引 --><fileNamePattern>log/log-error-%d{yyyy-MM-dd}.%i.log</fileNamePattern><!-- 表示只保留最近30天的日志,以防止日志填滿整個磁盤空間。--><maxHistory>30</maxHistory><!--用來指定日志文件的上限大小,例如設置為1GB的話,那么到了這個值,就會刪除舊的日志。--><totalSizeCap>1GB</totalSizeCap><!-- 除按日志記錄之外,還配置了日志文件不能超過2M,若超過2M,日志文件會以索引0開始,命名日志文件,例如log-error-2013-12-21.0.log --><timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP"><maxFileSize>2MB</maxFileSize></timeBasedFileNamingAndTriggeringPolicy></rollingPolicy><!-- 追加方式記錄日志 --><append>true</append><!-- 日志文件的格式 --><encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder"><pattern>===%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level %logger Line:%-3L - %msg%n</pattern><charset>utf-8</charset></encoder><!-- 此日志文件只記錄war級別的 --><filter class="ch.qos.logback.classic.filter.LevelFilter"><!-- 只保留error日志 --><!-- level:debug,info,warn,error --><level>ERROR</level><onMatch>ACCEPT</onMatch><onMismatch>DENY</onMismatch></filter></appender><!-- root節點要放到appender之后 --><root level="INFO"><appender-ref ref="STDOUT" /><appender-ref ref="fileInfoApp" /></root> </configuration>5.1.5.3.7.打包、運行、查看日志
Build the jar from the terminal in IDEA:
mvn clean install

Copy the resulting demo-0.0.1-SNAPSHOT.jar to /root/workspace.
The start.sh script in that directory contains:
[root@node1 workspace]# cat start.sh
cd /root/workspace
nohup java -jar demo-0.0.1-SNAPSHOT.jar > demo.log 2>&1 &

Checking demo.log shows the notifications received by the webhook: each entry is the HTTP POST body that Alertmanager sends to the configured URL, i.e. a JSON string with the same structure as the pretty-printed example in section 5.1.5.2.
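To exercise the chain end to end without waiting for a real alert to fire, a sample payload can also be POSTed directly at the endpoint. The class below is a small, self-contained smoke test using only the JDK's HttpURLConnection; the URL matches the webhook_configs entry above, while the class name and the trimmed-down payload are illustrative and only keep the fields that AlertController actually reads:

package com.example.demo;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Manual test: POST a trimmed-down Alertmanager-style payload to the webhook endpoint.
public class WebhookSmokeTest {

    public static void main(String[] args) throws Exception {
        String payload = "{\"receiver\":\"webhook\",\"status\":\"firing\","
                + "\"commonAnnotations\":{\"summary\":\"test alert\",\"description\":\"just a smoke test\"},"
                + "\"alerts\":[]}";

        URL url = new URL("http://172.17.0.2:8060/demo"); // same URL as in alertmanager.yml
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json;charset=UTF-8");

        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload.getBytes(StandardCharsets.UTF_8));
        }

        // Print the HTTP status and the JSON body returned by AlertController.
        System.out.println("HTTP " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}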
5.1.6. Further alerting rule examples for Prometheus

The following examples are taken from https://www.cnblogs.com/caizhenghui/p/9144805.html.

Alerting when a node goes down: node_down.yml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: caizh
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

Node memory usage alerting (memory_over.yml):
groups:
  - name: example
    rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 80
        for: 1m
        labels:
          user: caizh
        annotations:
          summary: "{{$labels.instance}}: High Memory usage detected"
          description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"

Of course, node_exporter must already be deployed on the nodes before these memory metrics are available.
Then modify the Prometheus configuration file prometheus.yml to enable alerting and register the rule files:
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
        # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node_down.yml"
  - "memory_over.yml"