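Background:

In a distributed (master/slave) setup, when the smokeping master triggers an alert and calls a script to send the alert email, the mtr report in that email is always traced from the master's location. In practice some alerts are raised by a slave, so for those alerts the mtr trace should originate from the slave's location.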

Test environment:

System            : CentOS 7.1
kernel            : 3.10.0-229.el7.x86_64
smokeping version : 2.006011
smokeping path    : /usr/local/smokeping/

Base installation:

https://blog.newtouch.com/setup-config-smokeping/

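Solution (only the parts that need to be added or changed are listed):

Split the script that the master calls when an alert fires into two parts. One part is the script that sends the alert email; it is deployed on every smokeping server. The other part remains the script the master calls when an alert fires; it determines the source of the alert and, if the source is a slave, sends a command over ssh so that the slave runs the mail-sending script locally.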

1. Modify the global configuration file

# vim /usr/local/smokeping/etc/config

*** Alerts ***
to = |/usr/local/smokeping/bin/detemine_mail.sh

2. Use ssh-copy-id to authorize the master to log in to the slave without a password, so that it can send commands over ssh.

# ssh-copy-id -i ~/.ssh/id_rsa.pub -p {SLAVE_PORT} root@{SLAVE_IP}
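
Before relying on the key, it can help to confirm that the login really works without a prompt. The following is a minimal sketch rather than part of the original setup: `check_slave_login` is a hypothetical helper, and `SSH_BIN` is an illustrative override that lets the check be dry-run with a stub command. It assumes the login user is root, as in the ssh call used by the alert script.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if key-based login works without a prompt.
# SSH_BIN is an illustrative override so the check can be dry-run with a stub.
check_slave_login() {
    local host="$1" port="$2"
    # BatchMode=yes makes ssh fail instead of asking for a password
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 \
        -p "$port" "root@${host}" true
}

# Example call; replace the placeholders with real values:
# check_slave_login {SLAVE_IP} {SLAVE_PORT} && echo "key login OK"
```

If the function returns non-zero, the master would be stuck at a password prompt when an alert fires, so the key setup should be fixed first.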

3. Add the scripts

1) Script called by the master when an alert fires

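This script receives 5 or 6 arguments: alertname, target, loss-pattern, rtt-pattern, hostname, [raise]. If the alert was raised by a slave, the target argument contains [from {SLAVE_NAME}], so the script can use that value to determine the alert source and then send a command over ssh to have that slave run the mail-sending script locally.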
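
The source check relies on bash parameter expansion: `${target#* }` strips everything up to and including the first space, which leaves the `[from ...]` suffix when there is one. A minimal sketch of that idea, assuming targets of the form `name [from slave]`; the function name `alert_source`, the slave name `slave1` and the shortened target name are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print the slave name embedded in a smokeping target,
# or "master" when the target carries no "[from ...]" suffix.
alert_source() {
    local target="$1"
    # Drop the shortest prefix ending in a space: "a.b [from s1]" -> "[from s1]"
    local suffix=${target#* }
    case "$suffix" in
        "[from "*"]")
            suffix=${suffix#"[from "}   # strip the leading "[from "
            suffix=${suffix%"]"}        # strip the trailing "]"
            echo "$suffix"
            ;;
        *)
            echo "master"
            ;;
    esac
}

alert_source "Internet-Gateway.IDC-gy [from slave1]"
alert_source "Internet-Gateway.IDC-gy"
```

A target without a space is left unchanged by the expansion, so it falls through to the default branch and is treated as a master alert.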

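Testing showed that the first version of the script waited until the script on the slave had finished before exiting. This interfered with the master's normal operation; the most visible symptom was gaps in the graphs. The revised script therefore runs the ssh command in the background and kills itself after a fixed delay, which keeps the master running normally without disturbing the ssh command.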
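
The "do not wait for the slow step" idea can be shown in isolation: start the command in the background with its standard streams redirected, so the caller returns immediately. This is a minimal sketch, not the script used here. `run_detached` is a hypothetical helper name, `sleep 5` stands in for the ssh call, and whether detaching alone is enough to keep smokeping from waiting is an assumption that would need testing.

```shell
#!/bin/bash
# Hypothetical helper: start a command in the background, detached from the
# caller's stdin/stdout/stderr, so the caller can return right away.
run_detached() {
    nohup "$@" </dev/null >/dev/null 2>&1 &
}

start=$(date +%s)
run_detached sleep 5          # stand-in for the slow ssh/mtr step
end=$(date +%s)
echo "returned after $((end - start))s"
```

The elapsed time printed at the end should be far below the 5 seconds the background command takes, which is the property the master needs.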

# vim /usr/local/smokeping/bin/detemine_mail.sh

#!/bin/bash
#########################################################
# Script to determine which smokeping will send a mail #
#########################################################

# Append all received arguments to an invocation log for statistics and troubleshooting
echo "$(date +%F-%T)" >> /tmp/invoke.log
echo "$@" >> /tmp/invoke.log

# Custom variables
smokename="SMOKEPING"
 
# Decide whether the network has recovered
if [ "$3" = "loss: 0%" ];
then
    subject="Clear-${smokename}-ALERT: $2 - $5"
else
    subject="${smokename}-ALERT: $2 - $5"
fi

kill_script() {
	# $$ is this script's own PID; kill it after 3 seconds
	sleep 3 && kill -9 $$
}

judgement=$2
case ${judgement#*\ } in 
"[from {SLAVE_NAME}]")
		command="ssh -f -n root@{SLAVE_IP} -p {SLAVE_PORT} 'export alertname=\"$1\" target=\"$2\" loss=\"$3\" rtt=\"$4\" hostname=\"$5\" SUBJECT=\"$subject\" MAILFROM=\"slave_name@your_domain.com\" ; sh /usr/local/smokeping/bin/send_mail.sh &'"
		eval "$command"
		kill_script
		;;
*)
		export alertname="$1" target="$2" loss="$3" rtt="$4" hostname="$5" SUBJECT="$subject" MAILFROM="master_name@your_domain.com" ; sh /usr/local/smokeping/bin/send_mail.sh &
		kill_script
		;;
esac

2) Script that sends the alert email

# vim /usr/local/smokeping/bin/send_mail.sh

#!/bin/bash
###################################################
# Script to send a mail with ping and mtr report #
###################################################

MAILTO="{EMAIL_ADDRESS},{EMAIL_ADDRESS}"
smokeping_mail_content=/tmp/smokeping_mail_content.$$

# generate mail content
echo "Alert name: " $alertname > ${smokeping_mail_content}
echo "Target: " $target >> ${smokeping_mail_content}
echo "loss Pattern: " $loss >> ${smokeping_mail_content}
echo "rtt Pattern: " $rtt >> ${smokeping_mail_content}
echo "Host name: " $hostname >> ${smokeping_mail_content}
echo "" >> ${smokeping_mail_content}
echo "PING Report: " >> ${smokeping_mail_content}
ping ${hostname} -c 4 -i 0.5 >> ${smokeping_mail_content}
echo "" >> ${smokeping_mail_content}
echo "MTR Report: " >> ${smokeping_mail_content}
mtr -n -c 60 -r ${hostname} >> ${smokeping_mail_content}

# send mail
if [ -s ${smokeping_mail_content} ];then
sendmail -F $MAILFROM -f $MAILFROM -t <<EOF
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
To: $MAILTO
From: $MAILFROM
Subject: ${SUBJECT}

`sed -e 's/$/\r/' ${smokeping_mail_content}`

.
EOF
fi

# delete mail
rm -f ${smokeping_mail_content}

4. Sample email

From	:xn1@newtouch.cn
Subject:SMOKEPING-ALERT: Internet-Gateway.IDC-gyInternetGateway.GateWay-ltxn1-out - x.x.x.x

Alert name:  someloss
Target:  Internet-Gateway.IDC-gyInternetGateway.GateWay-ltxn1-out
loss Pattern:  loss: 0%, 0%, 0%, 35%, 30%, 40%
rtt Pattern:  rtt: 50ms, 50ms, 50ms, 68ms, 67ms, 67ms
Host name:  x.x.x.x

PING Report: 
PING x.x.x.x (x.x.x.x) 56(84) bytes of data.
64 bytes from x.x.x.x: icmp_seq=1 ttl=251 time=1.11 ms
64 bytes from x.x.x.x: icmp_seq=2 ttl=251 time=0.982 ms
64 bytes from x.x.x.x: icmp_seq=3 ttl=251 time=1.10 ms
64 bytes from x.x.x.x: icmp_seq=4 ttl=251 time=1.06 ms

--- x.x.x.x ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 1502ms
rtt min/avg/max/mdev = 0.982/1.066/1.113/0.052 ms

MTR Report: 
Start: Fri Jun  9 10:59:57 2017
HOST: xxxx                       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- x.x.x.x                  0.0%    60    2.5   4.6   2.5  22.2   2.6
  2.|-- x.x.x.x                  0.0%    60    3.1   3.5   2.4  14.2   1.4
  3.|-- ???                      100.0    60    0.0   0.0   0.0   0.0   0.0
  4.|-- x.x.x.x                  0.0%    60    1.1   1.1   1.0   1.3   0.0

5. Suggestions

1) Later, a case statement can be added to send_mail.sh to route the email to the person responsible for each monitored target.

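2) To use a third-party mail server, add the relevant variables to /etc/mail.rc (this file is read by mailx, i.e. the mail command, rather than by sendmail):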

set from=yourname@your_domain.com

set smtp=mail.your_domain.com

set smtp-auth-user=yourname

set smtp-auth-password=yourpasswd

set smtp-auth=login
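
For suggestion 1), the routing could look roughly like the sketch below. It is only an illustration: `pick_mailto` is a hypothetical function, and the target patterns and addresses are placeholders rather than real owners. It assumes the `target` variable exported by the calling script is available in send_mail.sh.

```shell
#!/bin/bash
# Hypothetical routing table: pick recipients from the smokeping target name.
# The patterns and addresses below are placeholders, not real owners.
pick_mailto() {
    local target="$1"
    case "$target" in
        Internet-Gateway.*) echo "network-team@your_domain.com" ;;
        Database.*)         echo "dba-team@your_domain.com" ;;
        *)                  echo "ops@your_domain.com" ;;
    esac
}

# In send_mail.sh this would replace the fixed MAILTO assignment
MAILTO=$(pick_mailto "${target:-}")
echo "To: $MAILTO"
```

Keeping a default branch means an alert for an unlisted target still reaches someone.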