aoishiの備忘録

備忘録

Zabbix Senderを自前実装する場合にはZabbix Protocolのヘッダーが必要だった

はじめに

以前に下記の記事でZabbix Senderを自前実装していましたが、一つのコネクションで一度に大量のデータを送ろうとするとたびたびエラーになることがわかりました。

実装に不備があることもわかったので、正しいと思われる方法についてまとめておきます。

aoishi.hateblo.jp

原因

私の誤った解釈により、Zabbix Senderは決められたJSONフォーマットのデータをTCPでZabbixサーバに送ればOKだと思っていましたが、実はJSONデータの前にZabbixプロトコルのヘッダーをつける必要がありました。

ヘッダーを含めたZabbix Senderの実装としては下記が参考になります。

AWS Lambda(Python)からZabbix Senderでメトリクス値を送るスクリプト | 外道父の匠

ヘッダーについては下記が参考になります。

検知経緯

原因を突き止めるまでに、下記のような事象を確認していました。

Zabbix Senderのレスポンス

まず、Zabbix Senderでのデータ送信に成功した場合には、下記のようなレスポンスが返されました。

{"response":"success","info":"processed: 39; failed: 0; total: 39; seconds spent: 0.093660"}                                                                   
{"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 0.916001"}                                                                   
{"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.131982"}                                                                     
{"response":"success","info":"processed: 56; failed: 0; total: 56; seconds spent: 0.690923"}

次に、失敗した場合のレスポンスが下記の通りです。

[1回目]
{"response":"failed","info":"cannot parse as a valid JSON object: invalid object name/value separator at: ''"}                                                 
{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}                                                              
{"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.093064"}                                                                     
{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}

[2回目]
{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}                                                              
{"response":"success","info":"processed: 33; failed: 0; total: 33; seconds spent: 1.029911"}                                                                   
{"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.086790"}                                                                     
{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}  


[3回目]
{"response":"success","info":"processed: 39; failed: 0; total: 39; seconds spent: 0.074564"}                                                                   
{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}                                                              
{"response":"success","info":"processed: 3; failed: 0; total: 3; seconds spent: 0.085845"}                                                                     
{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"} 

実行する度に成功したり失敗したりと不安定です。

Zabbixサーバのログ

今後は、Zabbixサーバのログを確認してみました。DebugLevel=4に変更しています。

 10279:20180108:010832.540 __zbx_zbx_setproctitle() title:'trapper #2 [processing data]'                                                                                                                                                      
 10279:20180108:010832.540 trapper got '{"data": [{"host": "laptop03", "value": 6.983, "key": "zzz_float_value[dstat.total_cpu_usage.usr]", "clock": "1515341312"}, {"host": "laptop03", "value": 1.496, "key": "zzz_float_value[dstat.total_cpu_usage.sys]", "clock": "1515341312"}, {"host": "laptop03", "value": 91.521, "key": "zzz_float_value[dstat.total_cpu_usage.idl]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.total_cpu_usage.wai]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.total_cpu_usage.stl]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.dsk_nvme0n1.read]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.dsk_nvme0n1.writ]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.paging.in]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.paging.out]", "clock": "1515341312"}, {"host": "laptop03", "value": 1.09, "key": "zzz_float_value[dstat.load_avg.1m]", "clock": "1515341312"}, {"host": "laptop03", "value": 2.84, "key": "zzz_float_value[dstat.load_avg.5m]", "clock": "1515341312"}, {"host": "laptop03", "value": 2.89, "key": "zzz_float_value[dstat.load_avg.15m]", "clock": "1515341312"}, {"host": "laptop03", "value": 12326502400.0, "key": "zzz_float_value[dstat.memory_usage.used]", "clock": "1515341312"}, {"host": "laptop03", "value": 1197580288.0, "key": "zzz_float_value[dstat.memory_usage.free]", "clock": "1515341312"}, {"host": "laptop03", "value": 384909312.0, "key": "zzz_float_value[dstat.memory_usage.buff]", "clock": "1515341312"}, {"host": "laptop03", "value": 3268669440.0, "key": "zzz_float_value[dstat.memory_usage.cach]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.net_wlp4s0.recv]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.net_wlp4s0.send]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.procs.run]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.procs.blk]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.procs.new]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.io_nvme0n1.read]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.io_nvme0n1.writ]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.swap.used]", "clock": "1515341312"}, {"host": "laptop03", "value": 8359243776.0, "key": "zzz_float_value[dstat.swap.free]", "clock": "1515341312"}, {"host": "laptop03", "value": 963.0, "key": "zzz_float_value[dstat.system.int]", "clock": "1515341312"}, {"host": "laptop03", "value": 2470.0, "key": "zzz_float_value[dstat.system.csw]", "clock": "1515341312"}, {"host": "laptop03", "value": 21.0, "key": "zzz_float_value[dstat.tcp_sockets.lis]", "clock": "1515341312"}, {"host": "laptop03", "value": 50.0, "key": "zzz_float_value[dstat.tcp_sockets.act]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.tcp_sockets.syn]", "clock": "1515341312"}, {"host": "laptop03", "value": 4.0, "key": "zzz_float_value[dstat.tcp_sockets.tim]", "clock": "1515341312"}, {"host": "laptop03", "value": 4.0, "key": "zzz_float_value[dstat.tcp_sockets.clo]", "clock": "1515341312"}, {"host": "laptop03", "value": 12.0, "key": "zzz_float_value[dstat.udp.lis]", "clock": "1515341312"}, {"host": "laptop03", "value": 0.0, "key": "zzz_float_value[dstat.udp.act]", "clock": "1515341312"}, {"host": "laptop03", "value": 62.0, "key": "zzz_float_value[dstat.unix_sockets.dgm]", "clock": "1515341312"}, {"host": "laptop03", "value": 1724.0, "key": "zzz_float_value[dstat.unix_sockets.str]", "clock": "1515341312"}, {"host": "laptop03", "value": 638.0, "key": "zzz_float_value[dstat.unix_sockets.lis]", "clock": "1515341312"}, {"host": "laptop03", "value": 1086.0, "key": "zzz_float_value[dstat.unix_sockets.act]", "clock": "1515341312"}, {"host": "laptop03", "value": "{\"data\": [{\"{#KEY_NAME}\": \"dstat.total_cpu_usage.usr\"}, {\"{#KEY_NAME}\": \"dstat.total_cpu_usage.sys\"}, {\"{#KEY_NAME}\": \"dstat.total_cpu_usage.idl\"}, {\"{#KEY_'                  
 10279:20180108:010832.540 In zbx_send_response()          
 10279:20180108:010832.540 zbx_send_response() '{"response":"failed","info":"cannot parse as a valid JSON object: unexpected end of string data"}'                                                                                            
 10279:20180108:010832.540 End of zbx_send_response():SUCCEED                                                          
 10279:20180108:010832.540 received invalid JSON object from 10.0.0.245: cannot parse as a valid JSON object: unexpected end of string data
 10279:20180108:010832.541 __zbx_zbx_setproctitle() title:'trapper #2 [processed data in 0.000432 sec, waiting for connection]'

2行目(trapper got)に送信したJSONデータが記録されていましたが、途中で途切れていました。

後続のログでもJSONをパースできない旨のエラーメッセージが記録されていますので、Zabbixサーバが送信したデータを期待通り受け取れていないようです。

tcpdump

最後にtcpdumpで通信の様子を確認してみました。

クライアント側で下記コマンドでtcpdumpします。

$ sudo tcpdump -n -i any port 10051

下記が失敗した場合の結果です。10.0.0.1がZabbixサーバ、10.0.0.245がクライアントです。最後にZabbixサーバからクライアントにリセットパケットが送られているようです。

01:43:13.795474 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [S], seq 2393260655, win 29200, options [mss 1460,sackOK,TS val 4115495880 ecr 0,nop,wscale 7], length 0                                                                 
01:43:13.798023 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [S.], seq 1719041873, ack 2393260656, win 28960, options [mss 1460,sackOK,TS val 1378069742 ecr 4115495880,nop,wscale 7], length 0                                       
01:43:13.798092 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 0                                                                                        
01:43:13.798490 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1:1449, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448                                                                         
01:43:13.798513 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1449:2897, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448                                                                      
01:43:13.798519 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 2897:4345, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448                                                                      
01:43:13.798523 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], seq 4345:5793, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 1448                                                                      
01:43:13.798527 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [P.], seq 5793:5974, ack 1, win 229, options [nop,nop,TS val 4115495883 ecr 1378069742], length 181                                                                      
01:43:13.799338 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 1449, win 249, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0                                                                                     
01:43:13.799373 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 2897, win 272, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0                                                                                     
01:43:13.799386 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 4345, win 295, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0                                                                                     
01:43:13.799683 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [P.], seq 1:142, ack 4345, win 295, options [nop,nop,TS val 1378069745 ecr 4115495883], length 141                                                                       
01:43:13.799710 IP 10.0.0.245.47534 > 10.0.0.1.zabbix-trapper: Flags [.], ack 142, win 237, options [nop,nop,TS val 4115495885 ecr 1378069745], length 0                                                                                      
01:43:13.799732 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [.], ack 5974, win 340, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0                                                                                     
01:43:13.799747 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [F.], seq 142, ack 5974, win 340, options [nop,nop,TS val 1378069745 ecr 4115495883], length 0                                                                           
01:43:13.799760 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [R.], seq 143, ack 5974, win 340, options [nop,nop,TS val 0 ecr 4115495883], length 0                                                                                    
01:43:13.800474 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.47534: Flags [R], seq 1719042015, win 0, length 0

ちなみに、成功した場合は下記のような結果になりました。

01:47:03.611559 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [S], seq 587356983, win 29200, options [mss 1460,sackOK,TS val 4115725695 ecr 0,nop,wscale 7], length 0                                                                  
01:47:03.616337 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [S.], seq 1228978652, ack 587356984, win 28960, options [mss 1460,sackOK,TS val 1378299561 ecr 4115725695,nop,wscale 7], length 0                                        
01:47:03.616385 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 0                                                                                        
01:47:03.616660 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1:1449, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448                                                                         
01:47:03.616674 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 1449:2897, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448                                                                      
01:47:03.616676 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 2897:4345, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448                                                                      
01:47:03.616678 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], seq 4345:5793, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 1448                                                                      
01:47:03.616685 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [P.], seq 5793:5967, ack 1, win 229, options [nop,nop,TS val 4115725700 ecr 1378299561], length 174                                                                      
01:47:03.619438 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 1449, win 249, options [nop,nop,TS val 1378299565 ecr 4115725700], length 0                                                                                     
01:47:03.620498 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 2897, win 272, options [nop,nop,TS val 1378299565 ecr 4115725700], length 0                                                                                     
01:47:03.620749 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 4345, win 295, options [nop,nop,TS val 1378299566 ecr 4115725700], length 0                                                                                     
01:47:03.620785 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 5793, win 317, options [nop,nop,TS val 1378299566 ecr 4115725700], length 0                                                                                     
01:47:03.624183 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 5967, win 340, options [nop,nop,TS val 1378299566 ecr 4115725700], length 0                                                                                     
01:47:03.850201 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [P.], seq 1:106, ack 5967, win 340, options [nop,nop,TS val 1378299795 ecr 4115725700], length 105                                                                       
01:47:03.850298 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [.], ack 106, win 229, options [nop,nop,TS val 4115725934 ecr 1378299795], length 0                                                                                      
01:47:03.850357 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [F.], seq 106, ack 5967, win 340, options [nop,nop,TS val 1378299795 ecr 4115725700], length 0                                                                           
01:47:03.850643 IP 10.0.0.245.48794 > 10.0.0.1.zabbix-trapper: Flags [F.], seq 5967, ack 107, win 229, options [nop,nop,TS val 4115725934 ecr 1378299795], length 0                                                                           
01:47:03.851771 IP 10.0.0.1.zabbix-trapper > 10.0.0.245.48794: Flags [.], ack 5968, win 340, options [nop,nop,TS val 1378299797 ecr 4115725934], length 0