Forum Discussion

Dan_24306
Oct 30, 2014

Question regarding iHealth

Hello All,

I was involved in an issue with a pair of production LTMs that hit the wall, and hit it very hard, first thing Monday morning of this week. By the time I was contacted, the team had failed traffic over to the standby node, rebooted what had been the active node, and then failed traffic back. That restored service, and all was good until first thing Tuesday morning, when the same thing happened again. Once again applications were down, and a reboot was the only way to bring them back up. Long story short, we moved to 11.5.1HF5 (we were already on 11.5.1HF4 when the issue occurred) and have not had an issue since.

I am trying to determine the root cause. I suspect it was Shellshock, but I am not totally sure. One thing I did manage to collect was a QKView at nearly every state change of the appliances. I am reviewing them in iHealth and see a wealth of great information, all captured at the moment each QKView was run. I am rather new to iHealth and absolutely loving what I see there, but unfortunately time is not my friend, as we need to establish a root cause ASAP.

My question: is there a "quick hit" list of things to check in iHealth that could point us in the right direction for identifying the root cause of the outage? As stated, I suspect Shellshock, but I need concrete proof, and I need to verify whether anything else contributed. If I've learned anything in nearly two decades in this field, it's that it is generally a series of smaller events that bring the beast down. I don't have a problem if the root cause was a single smoking gun; I just want to confirm it was only Shellshock and nothing else along with it.
I advised our management about Shellshock, but their immediate questions were why it took so long for us to be infected and, if we were infected as early as last weekend, where the proof is in the logs (or the iHealth data). Everything is running as expected now, but the only true fix was upgrading to 11.5.1HF5. Please let me know if anyone has suggestions on what to check in iHealth for this outage, or whether this is something I should open a case for and run through Support. Any suggestions/input would be greatly appreciated. Thanks and have a great day all!!

-Dan

2 Replies

  • shaggy

    Assuming you have a support account, get F5 Support involved to look into the QKViews. I would start by looking at the iHealth diagnostics, and then peruse /var/log/ltm and the system resource graphs. You could also look at /var/log/audit and possibly /var/log/secure to see who was logged in at the time the issues happened. Shellshock affecting the F5 management plane could only be instigated by an already-authenticated administrator (CVE-2014-6271 Shellshocked), so verify that your F5 management interface and self-IP addresses are not accessible from the Internet.
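One quick way to follow up on the log review above, assuming you have extracted the QKView archive locally so that copies of /var/log/ltm, /var/log/audit and /var/log/secure are on disk, is to search them for the "() {" prefix that CVE-2014-6271 payloads carry. The script name scan_shellshock.py and the find_shellshock_attempts helper are hypothetical, and this is a sketch rather than an F5 tool: BIG-IP logs do not necessarily record the environment strings an attacker sent, so an empty result is not proof that no attack happened.

```python
import re
import sys
from typing import Iterable, List, Tuple

# A CVE-2014-6271 payload is a bash function definition smuggled into an
# environment variable, so it contains "() {". Attackers commonly put it in
# HTTP headers such as User-Agent, so it can show up in any log that records
# request headers or command arguments. \s* also catches the "(){" variant.
SHELLSHOCK_PATTERN = re.compile(r"\(\)\s*\{")


def find_shellshock_attempts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs whose text contains a "() {" payload."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SHELLSHOCK_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def main(paths: List[str]) -> None:
    for path in paths:
        # errors="replace" keeps stray binary bytes in a log from aborting the scan.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in find_shellshock_attempts(handle):
                print(f"{path}:{number}: {line}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with the log files as arguments (for example python3 scan_shellshock.py ltm audit secure, adjusting the paths to wherever those files land after extraction), then compare the timestamps of any hits against the Monday and Tuesday outage windows.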

  • Hamish

    Why would you think it was Shellshock? Do you allow non-admins to log in? Or run CGIs on publicly available web servers directly on the BigIP (i.e., the admin GUI)?

    Shellshock isn't a virus or a trojan. It's a bug in the bash shell where it interprets various things as code (e.g. ENV vars) and evaluates them, which can lead to unintended consequences. It only matters IF someone has access to a bash shell through the BigIP admin interfaces. I'm not sure whether any CGI in the GUI would be vulnerable; I haven't seen any indication of one. But an attacker needs access anyway, and normally you wouldn't allow anyone except trusted admins access to the admin interfaces in the first place...
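To make the "evaluates ENV vars as code" point concrete, here is a minimal Python sketch of the widely circulated CVE-2014-6271 check. It sets an environment variable that looks like a function definition with an extra command appended, then starts bash; a vulnerable bash runs the appended echo while importing the variable at startup. The bash_is_vulnerable name is hypothetical, and the check only tests whichever bash binary it is pointed at on the machine running Python; it says nothing about how an attacker could reach that shell in the first place.

```python
import os
import subprocess


def bash_is_vulnerable(bash_path: str = "bash") -> bool:
    """Run the classic CVE-2014-6271 test against a bash binary.

    The variable x looks like a function definition followed by an extra
    command. A vulnerable bash executes that trailing command while importing
    x at startup, before it runs the command passed with -c.
    """
    env = dict(os.environ)
    env["x"] = "() { :;}; echo vulnerable"
    result = subprocess.run(
        [bash_path, "-c", "echo this is a test"],
        env=env,
        capture_output=True,
        text=True,
    )
    # A patched bash prints only "this is a test" on stdout (older patches may
    # also print a warning on stderr); a vulnerable one prints "vulnerable" too.
    return "vulnerable" in result.stdout.splitlines()


if __name__ == "__main__":
    print("vulnerable" if bash_is_vulnerable() else "not vulnerable to CVE-2014-6271")
```

This reinforces the point above: the payload only does anything if an attacker can get a crafted environment variable into a bash process, which on the BigIP means already having access to the admin interfaces.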

    H