Zabbix Senderを自前実装する場合にはZabbix Protocolのヘッダーが必要だった
はじめに
以前に下記の記事でZabbix Senderを自前実装していましたが、一つのコネクションで一度に大量のデータを送ろうとするとたびたびエラーになることがわかりました。
実装に不備があることもわかったので、正しいと思われる方法についてまとめておきます。
原因
私の誤った解釈により、Zabbix Senderは決められたJSONフォーマットのデータをTCPでZabbixサーバに送ればOKだと思っていましたが、実はJSONデータの前にZabbixプロトコルのヘッダーをつける必要がありました。
ヘッダーを含めたZabbix Senderの実装としては下記が参考になります。
AWS Lambda(Python)からZabbix Senderでメトリクス値を送るスクリプト | 外道父の匠
- Zabbix Sender Python version - ZABBIX Forums
- Docs/protocols/zabbix sender/3.4 - Zabbix.org
- python-zabbix/senderprotocol.py at master · jbfavre/python-zabbix · GitHub
ヘッダーについては下記が参考になります。
検知経緯
原因を突き止めるまでに、下記のような事象を確認していました。
Zabbix Senderのレスポンス
まず、Zabbix Senderでのデータ送信に成功した場合には、下記のようなレスポンスが返されました。
{"response":"success","info":"processed: 39; failed: 0; total: 39; seconds spent: 0.093660"} {"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 0.916001"} {"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.131982"} {"response":"success","info":"processed: 56; failed: 0; total: 56; seconds spent: 0.690923"}
次に、失敗した場合のレスポンスが下記の通りです。
[1回目] {"response":"failed","info":"cannot parse as a valid JSON object: invalid object name/value separator at: ''"} {"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"} {"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.093064"} {"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"} [2回目] {"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"} {"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 1.029911"} {"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.086790"} {"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"} [3回目] {"response":"success","info":"processed: 39; failed: 0; total: 39; seconds spent: 0.074564"} {"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"} {"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.085845"} {"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}
実行する度に成功したり失敗したりと不安定です。
Zabbixサーバのログ
今後は、Zabbixサーバのログを確認してみました。DebugLevel=4に変更しています。
10279:20180108:010832.540 __zbx_zbx_setproctitle() title:'trapper #2 [processing data]' 10279:20180108:010832.540 trapper got '{"data": [{"host": "laptop03", "value": 6.983, "key": "zzz_float_value[dstat.total_cpu_usage.usr]", "clock": "1515341312"}, {"host": "laptop03", "value": 1.496, "key": "zzz_float_value[dstat.total_cpu_usage.sys]", "clock": "1515341312"}, {"host": "laptop03", "value": 91.521, "key": "zzz_float_value[dstat.total_cpu_usage.idl]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.total_cpu_usage.wai]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.total_cpu_usage.stl]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.dsk_nvme0n1.read]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.dsk_nvme0n1.writ]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.paging.in]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.paging.out]", "clock": "1515341312"}, {"host": "laptop03", "value": 1.09, "key": "zzz_float_value[dstat.load_avg.1m]", "clock": "1515341312"}, {"host": "laptop03", "value": 2.84, "key": "zzz_float_value[dstat.load_avg.5m]", "clock": "1515341312"}, {"host": "laptop03", "value": 2.89, "key": "zzz_float_value[dstat.load_avg.15m]", "clock": "1515341312"}, {"host": "laptop03", "value": 12326502400.0, "key": "zzz_float_value[dstat.memory_usage.used]", "clock": "1515341312"}, {"host": "laptop03", "value": 1197580288.0, "key": "zzz_float_value[dstat.memory_usage.free]", "clock": "1515341312"}, {"host": "laptop03", "value": 384909312.0, "key": "zzz_float_value[dstat.memory_usage.buff]", "clock": "1515341312"}, {"host": "laptop03", "value": 3268669440.0, "key": "zzz_float_value[dstat.memory_usage.cach]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.net_wlp4s0.recv]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.net_wlp4s0.send]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.procs.run]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.procs.blk]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.procs.new]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.io_nvme0n1.read]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.io_nvme0n1.writ]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.swap.used]", "clock": "1515341312"}, {"host": "laptop03", "value": 8359243776.0, "key": "zzz_float_value[dstat.swap.free]", "clock": "1515341312"}, {"host": "laptop03", "value": 963.0, "key": "zzz_float_value[dstat.system.int]", "clock": "1515341312"}, {"host": "laptop03", "value": 2470.0, "key": "zzz_float_value[dstat.system.csw]", "clock": "1515341312"}, {"host": "laptop03", "value": 21.0, "key": "zzz_float_value[dstat.tcp_sockets.lis]", "clock": "1515341312"}, {"host": "laptop03", "value": 50.0, "key": "zzz_float_value[dstat.tcp_sockets.act]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.tcp_sockets.syn]", "clock": "1515341312"}, {"host": "laptop03", "value": 4.0, "key": "zzz_float_value[dstat.tcp_sockets.tim]", "clock": "1515341312"}, {"host": "laptop03", "value": 4.0, "key": "zzz_float_value[dstat.tcp_sockets.clo]", "clock": "1515341312"}, {"host": "laptop03", "value": 12.0, "key": "zzz_float_value[dstat.udp.lis]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.udp.act]", "clock": "1515341312"}, {"host": "laptop03", "value": 62.0, "key": "zzz_float_value[dstat.unix_sockets.dgm]", "clock": "1515341312"}, {"host": "laptop03", "value": 1724.0, "key": "zzz_float_value[dstat.unix_sockets.str]", "clock": "1515341312"}, {"host": "laptop03", "value": 638.0, "key": "zzz_float_value[dstat.unix_sockets.lis]", "clock": "1515341312"}, {"host": "laptop03", "value": 1086.0, "key": "zzz_float_value[dstat.unix_sockets.act]", "clock": "1515341312"}, {"host": "laptop03", "value": "{\"data\": [{\"{#KEY_NAME}\": \"dstat.total_cpu_usage.usr\"}, {\"{#KEY_NAME}\": \"dstat.total_cpu_usage.sys\"}, {\"{#KEY_NAME}\": \"dstat.total_cpu_usage.idl\"}, {\"{#KEY_' 10279:20180108:010832.540 In zbx_send_response() 10279:20180108:010832.540 zbx_send_response() '{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}' 10279:20180108:010832.540 End of zbx_send_response():SUCCEED 10279:20180108:010832.540 received invalid JSON object from 10.0.0.245: cannot parse as a valid JSON object: unexpected end of string data 10279:20180108:010832.541 __zbx_zbx_setproctitle() title:'trapper #2 [processed data in 0.000432 sec, waiting for connection]'
2行目(trapper got)に送信したJSONデータが記録されていましたが、途中で途切れていました。
後続のログでもJSONをパースできない旨のエラーメッセージが記録されていますので、Zabbixサーバが送信したデータを期待通り受け取れていないようです。
tcpdump
最後にtcpdumpで通信の様子を確認してみました。
クライアント側で下記コマンドでtcpdumpします。
$ sudo tcpdump -n -i any port 10051
下記が失敗した場合の結果です。10.0.0.1がZabbixサーバ、10.0.0.245がクライアントです。最後にZabbixサーバからクライアントにリセットパケットが送られているようです。
01:43:13.795474 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [S], seq 2393260655, win 29200, options [mss 1460,sackOK,TS val 4115495880 ecr 0,nop,wscale 7], length 0 01:43:13.798023 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [S.], seq 1719041873, ack 2393260656, win 28960, options [mss 1460,sackOK,TS val 1378069742 ecr 4115495880,nop,wscale 7], length 0 01:43:13.798092 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 0 01:43:13.798490 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1:1449, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448 01:43:13.798513 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1449:2897, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448 01:43:13.798519 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 2897:4345, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448 01:43:13.798523 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 4345:5793, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448 01:43:13.798527 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [P.], seq 5793:5974, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 181 01:43:13.799338 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 1449, win 249, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0 01:43:13.799373 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 2897, win 272, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0 01:43:13.799386 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 4345, win 295, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0 01:43:13.799683 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [P.], seq 1:142, ack 4345, win 295, options [nop,nop,TS val 1378069745 ecr 4115495883], length 141 01:43:13.799710 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], ack 142, win 237, options [nop,nop,TS val 4115495885 ecr 1378069745], length 0 01:43:13.799732 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 5974, win 340, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0 01:43:13.799747 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [F.], seq 142, ack 5974, win 340, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0 01:43:13.799760 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [R.], seq 143, ack 5974, win 340, options [nop,nop,TS val 0 ecr 4115495883], length 0 01:43:13.800474 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [R], seq 1719042015, win 0, length 0
ちなみに、成功した場合は下記のような結果になりました。
01:47:03.611559 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [S], seq 587356983, win 29200, options [mss 1460,sackOK,TS val 4115725695 ecr 0,nop,wscale 7], length 0 01:47:03.616337 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [S.], seq 1228978652, ack 587356984, win 28960, options [mss 1460,sackOK,TS val 1378299561 ecr 4115725695,nop,wscale 7], length 0 01:47:03.616385 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 0 01:47:03.616660 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1:1449, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448 01:47:03.616674 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1449:2897, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448 01:47:03.616676 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 2897:4345, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448 01:47:03.616678 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 4345:5793, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448 01:47:03.616685 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [P.], seq 5793:5967, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 174 01:47:03.619438 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 1449, win 249, options [nop,nop,TS val 1378299565 ecr 4115725700], length 0 01:47:03.620498 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 2897, win 272, options [nop,nop,TS val 1378299565 ecr 4115725700], length 0 01:47:03.620749 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 4345, win 295, options [nop,nop,TS val 1378299566 ecr 4115725700], length 0 01:47:03.620785 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 5793, win 317, options [nop,nop,TS val 1378299566 ecr 4115725700], length 0 01:47:03.624183 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 5967, win 340, options [nop,nop,TS val 1378299566 ecr 4115725700], length 0 01:47:03.850201 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [P.], seq 1:106, ack 5967, win 340, options [nop,nop,TS val 1378299795 ecr 4115725700], length 105 01:47:03.850298 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], ack 106, win 229, options [nop,nop,TS val 4115725934 ecr 1378299795], length 0 01:47:03.850357 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [F.], seq 106, ack 5967, win 340, options [nop,nop,TS val 1378299795 ecr 4115725700], length 0 01:47:03.850643 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [F.], seq 5967, ack 107, win 229, options [nop,nop,TS val 4115725934 ecr 1378299795], length 0 01:47:03.851771 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 5968, win 340, options [nop,nop,TS val 1378299797 ecr 4115725934], length 0