2960 Memory Leaks

Cisco 2960’s and their memory leaks have on more than one occasional left me without SSH/Telnet access, but still responding to SNMP requests. No matter which code base we have tried, the same pattern of memory usage over time can be seen. To be fair we ask a lot of these switches – we’re heavy uses of 802.1x, insist on using SSH rather than telnet, and have loads of smaller services such as DHCP snooping, RSTP, aaa accounting, ntp etc  – each one uses a little memory.

Cisco 2960 memory usage over time - leaking slowly we lose SSH access at around about 90%

Cisco 2960 memory usage over time – leaking slowly we lose SSH access at around about 90%

Time to Reload

A handy feature is a SNMP oid which, if you have the read/write community, allows you to reboot a switch remotely if it is still responding to SNMP (which they usually do – even when telnet/ssh/console access isn’t working).

Most Linux distributions will have snmpset installed, but there are Windows equivalents of course.

snmpset -v1 -c your-rw-community-her 1.1.1.1 .1.3.6.1.4.1.9.2.9.9.0 i 2

Change your community string, and the IP address and run this sucker. I suggest keeping a ping running to the device so you can see if this has had the desired result, and so you can see when it’s back up and running again.

Long term Solution

We are in the process of looking to replace these switches which are fully EOL July 2017. I’m tempted to script up something to reload these out of hours when the memory usage is approaching that lose-ssh-access threshold.

Microsoft have been adding networking appliances to their marketplace recently, I see firewall offerings from Checkpoint, Barracuda,  Fortinet, and Cisco to name a few. Given the laborious situation I’m in where all of our NSG’s need to be updated manually by CSV files, I thought I would take a closer look at the Cisco ASAv.

The ASAv is only supported on the ARM / Azure v2 deployment model and requires a D3 as a bare minimum. This will provide 4 interfaces, one for management and 3 for joining to your inside network or DMZ’s. The basic license (effectively perpetual, you will pay for just the compute time) provides 100 connections and throughput of just 100Kbps, this will be fine for the sake of just testing it. I won’t cover the setup as this is covered in the Cisco quick start document here.

Having played around a bit with this I feel it’s not really ready for enterprise use.

  • You are limited to one public address on the ASA, and even this is natted automatically to a private address before it hits the ASA. Although you can add several VIP addresses to the cloud service, the firewall isn’t aware of these. Natting between those other VIP’s and the private address on the ASA means by the time the traffic reaches the ASA you cannot distinguish what has been sent to say 200.200.200.200 and 200.200.200.201. This means that if you wanted to run two web servers behind the ASA, one would need to be on port 80, and one would need to be on port 81. Awful. Microsoft should allow these public addresses to allocated directly to the firewall.
  • Clustering is not supported. In order to get Azure SLA’s you need to have two devices in an availability set. if you can’t cluster the two devices you would need to make configuration changes on each device – it’s not sensible to do this and you are likely to end up with differing configs if you’re not careful. You could use a firewall manager like CSM but this would require another machine in Azure.
  • No console access. If you fat finger a config update and lock yourself out, how are you supposed to recover?
  • Traffic is routed from each subnet via a user defined route table. There is nothing stopping another admin simply changing the routing table on a machine to circumnavigate your firewall! This may be an old way of thinking as separation of duty has truly become rather blurred in ‘the cloud’. In the old world a VLAN assigned to a machine would mean this could never happen.

I look forward to seeing how these networking appliances evolve, but I won’t be suggesting we change from using the native NSG’s just yet.