VyOS High CPU Utilization with SNMP checks via Zabbix

I was working with VyOS 1.2 and 1.3 routers in a lab test. I wanted to monitor the underlying Linux/Debian objects and various routing process statuses using Zabbix: basic items like BGP peers, OSPF neighborships, and some counters. Nothing excessive like dumping full peer tables every 30 seconds. About 24 checks in total, each running every 60-120 seconds.

After about a week of monitoring, I discovered the router would become unresponsive, lock up, and reboot. Digging into a router during an event revealed the SNMP process was utilizing 100% of the CPU. This utilization had starved the underlying routing daemons (zebra, bgpd, ospfd), so they didn’t have the resources to process route updates.

Luckily, this problem has an easy-to-implement workaround found in the VyOS forums. It also appears this bug was resolved in later versions of VyOS.

Edit /etc/default/snmpd and replace

SNMPDOPTS='-LSed -u snmp -g snmp -p /run/snmpd.pid'

with (or whatever options you prefer)

SNMPDOPTS='-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid'

Then restart snmpd:

/etc/init.d/snmpd restart

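The critical change is the addition of this text to the startup options. The -I option with a leading minus sign tells snmpd which modules not to initialize, so the two route table modules are never loaded.

-I -ipCidrRouteTable,inetCidrRouteTable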
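The manual edit above can also be scripted. What follows is a minimal sketch, not something shipped with VyOS or net-snmp: the function name patch_snmpd_opts is made up for illustration, and it assumes the SNMPDOPTS line contains a " -p " option as in the stock file shown above. If that option is missing, the file is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper, not part of VyOS or net-snmp.
# Adds the module exclusion to the SNMPDOPTS line of the given file,
# and does nothing if the exclusion is already there.
patch_snmpd_opts() {
    file="${1:-/etc/default/snmpd}"
    exclude='-I -ipCidrRouteTable,inetCidrRouteTable'

    # Already patched: leave the file untouched so reruns are harmless.
    if grep -q -- "^SNMPDOPTS=.*${exclude}" "$file"; then
        return 0
    fi

    # Insert the exclusion just before the -p (pid file) option.
    # Writing through a temp file and cat keeps the original file's
    # ownership and permissions.
    tmp="$(mktemp)" || return 1
    sed "/^SNMPDOPTS=/ s| -p | ${exclude} -p |" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (run as root on the router, then restart the daemon):
#   patch_snmpd_opts /etc/default/snmpd && /etc/init.d/snmpd restart
```

Because the function checks for the exclusion first, running it a second time leaves the file as it is.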

Caution:
If you change the SNMP settings via the CLI, you will need to update the /etc/default/snmpd file again, because VyOS regenerates the file with each edit. To make the change permanent, you will need to edit the script that generates the configuration: /usr/libexec/vyos/conf_mode/snmp.py
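Since the file can be silently regenerated, it may help to check whether the workaround is still in place after an SNMP configuration change. This is a small sketch under that assumption; snmpd_workaround_present is a hypothetical name, not a VyOS command.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VyOS): succeeds only if the
# module exclusion is still present on the SNMPDOPTS line of the file.
snmpd_workaround_present() {
    file="${1:-/etc/default/snmpd}"
    grep -q -- '^SNMPDOPTS=.*-I -ipCidrRouteTable,inetCidrRouteTable' "$file"
}

# Example:
#   snmpd_workaround_present || echo "snmpd workaround missing, re-apply it"
```

The function only reads the file, so it is safe to run from a login script or a periodic job.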

I don’t believe Zabbix had anything to do with this other than tickling an existing bug within VyOS. Maybe my next project should be to update these lab routers.

This post is licensed under CC BY 4.0 by the author.

© Kevin Schwickrath. Some rights reserved.
