TACACS+, packet loss and 'Authorization failed' errors

I've been hunting a TACACS+ issue reported by a customer: they could log into a network device, but at some point during the session they would try to run a command and get the dreaded '% Authorization failed' error.

After a couple of seconds they would be able to continue as if nothing was amiss.

The tac_plus (I am using the Event-Driven version) logs were not showing anything out of the ordinary so this was a rather perplexing issue. The error seemed to be popping up rather randomly (and always when I did not have a tcpdump running).

While observing several devices over a longer period I noticed that they were experiencing intermittent packet loss, and this got me wondering whether the packet loss events I was noticing were coinciding with the authorization failures.

Lucky for me tac_plus is running on a Linux host and so I thought I'd simulate packet loss between one of my own devices and the tac_plus service to see what that would result in. To implement the simulated packet loss I simply put the following rule in place:

iptables -A INPUT -m statistic --mode random --probability 0.5 -s /32 -j DROP

With a probability of 0.5 (this can be between 0 and 1) I was dropping around 40-50% of the packets.
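
To get a feel for what a per-packet drop probability means in practice, here is a minimal sketch that imitates the coin flip with awk. It does not touch iptables or real traffic; the function name simulate_drops and its arguments are my own invention for illustration, and it assumes that each packet is dropped independently of the others.

```shell
#!/bin/sh
# Simulate N independent packets, each dropped with probability P,
# and print how many were dropped. This only imitates the per-packet
# coin flip; it does not inspect or modify any firewall rules.
simulate_drops() {
    n="$1"   # number of simulated packets
    p="$2"   # drop probability, between 0 and 1
    awk -v n="$n" -v p="$p" 'BEGIN {
        srand()                        # seed from the current time
        dropped = 0
        for (i = 0; i < n; i++)
            if (rand() < p) dropped++  # rand() returns a value in [0, 1)
        print dropped
    }'
}

# Example: 1000 simulated packets at probability 0.5
simulate_drops 1000 0.5
```

Over a short session only a handful of packets are exchanged, so the observed drop rate can sit noticeably above or below the configured probability; over many packets it settles close to it.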

Lo and behold, '% Authorization failed', scientifically reproducible.
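
Once the failure is reproduced the rule should come out again, and iptables only deletes a rule when the specification matches the one that was added. The sketch below builds both commands from a single template so they cannot drift apart. The helper loss_rule and the variable DEVICE_IP are hypothetical names of mine, and 192.0.2.10 is a documentation-range placeholder, not the address I actually used. The script only prints the commands; running them needs root.

```shell
#!/bin/sh
# Hypothetical placeholder address (documentation range);
# replace it with the address of the device under test.
DEVICE_IP="${DEVICE_IP:-192.0.2.10}"

# Print the iptables command for the packet-loss rule.
# $1 is the iptables action: -A (append), -C (check) or -D (delete).
loss_rule() {
    echo "iptables $1 INPUT -m statistic --mode random --probability 0.5 -s ${DEVICE_IP}/32 -j DROP"
}

loss_rule -A   # command that starts dropping packets
loss_rule -D   # same specification, removes the rule again
```

While the rule is active, 'iptables -L INPUT -v -n' shows its packet counter, which is a quick way to confirm that packets from the device are really being dropped.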

