
daboochmeister
Jul 05, 2015

Downstream LTM error processing POST proxied from upstream LTM

Strange error encountered. We have a traffic flow that goes:

 

Browser -> LTM1 -> LTM2 -> pool of WebLogic servers

 

We sporadically encounter timeouts on POST requests. tcpdump shows that the POST request reaches LTM2, and that LTM2 initiates a connection to a selected real server, but the POST operation never completes. Eventually the WebLogic server times out the connection, per its "POST read timeout" setting, with an error saying it can't parse the POST content.

 

Detailed iRule logging shows that when this occurs, LTM2 is unable to read the POSTed content: when I do an HTTP::collect in an HTTP_REQUEST event, the HTTP_REQUEST_DATA event never fires. Everything appears correct. The Content-Length header is accurate, and the POSTed content (per tcpdump) appears to be correct and identical to what was received at LTM1. But LTM2 apparently never reads the content. There are no logged errors that I can find in the LTM log or anywhere else.
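
For anyone wanting to reproduce the check, the logging on LTM2 looks roughly like the sketch below. This is a minimal illustration rather than the exact rule in use: the log wording and the 1 MB collect cap are assumptions added here to keep the example bounded.

```tcl
# Diagnostic sketch for LTM2: log each POST, ask for its body,
# and log again if the body actually arrives.
when HTTP_REQUEST {
    if { [HTTP::method] eq "POST" } {
        set clen [HTTP::header "Content-Length"]
        log local0. "POST [HTTP::uri] from [IP::client_addr] Content-Length=$clen"

        # Only collect when Content-Length is a positive integer.
        if { [string is integer -strict $clen] && $clen > 0 } {
            # Assumed cap (1 MB) so a large upload is not buffered whole.
            if { $clen > 1048576 } { set clen 1048576 }
            HTTP::collect $clen
        }
    }
}

when HTTP_REQUEST_DATA {
    # In the failing case this event never fires, so this line is never logged.
    log local0. "Collected [HTTP::payload length] bytes for [HTTP::uri]"
    HTTP::release
}
```

In a failing request, the first log line appears in /var/log/ltm but the second never does.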

 

Through sheer luck, I stumbled across a workaround: if I do an HTTP::collect in HTTP_REQUEST on LTM1, followed by an HTTP::release in HTTP_REQUEST_DATA, it magically fixes LTM2's problem. This is completely repeatable. Take out the iRule on LTM1 and the problem begins occurring again; put it back in and the problem goes away, and LTM2 is able to do a successful HTTP::collect/HTTP_REQUEST_DATA sequence.
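
The workaround on LTM1 amounts to something like the following. Treat it as a sketch under assumptions: the restriction to POST requests, the Content-Length check, and the 1 MB cap are illustrative choices, not necessarily what the production rule does.

```tcl
# Workaround sketch for LTM1: buffer the POST body, then release it
# unchanged so it is forwarded to LTM2.
when HTTP_REQUEST {
    if { [HTTP::method] eq "POST" } {
        set clen [HTTP::header "Content-Length"]

        # Only collect when Content-Length is a positive integer.
        if { [string is integer -strict $clen] && $clen > 0 } {
            # Assumed cap (1 MB) to limit how much is buffered per request.
            if { $clen > 1048576 } { set clen 1048576 }
            HTTP::collect $clen
        }
    }
}

when HTTP_REQUEST_DATA {
    # No payload modification; just let the request continue.
    HTTP::release
}
```

The rule does not alter the request in any way. It only changes when LTM1 forwards the body, which suggests the problem may be related to how the POST header and body are delivered to LTM2, though that is a guess.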

 

I have a case open with support, but they haven't had any feedback on it so far.

 

Has anyone encountered a similar situation? We're OK with leaving this iRule-based fix in place, but would prefer not to depend on a workaround we don't understand.

 

Details on the environment:

 

  • LTM1 and LTM2 are both at 11.5.2, no hotfixes
  • Both VIPs are SSL ones (though I converted LTM2's VIP to non-SSL, and it didn't change anything)
  • LTM1 is using an Oracle OAM authentication integration via APM (though the OAM processing all occurs cleanly without error, per all logs on LTM1 and the OAM servers); LTM2 doesn't have APM
  • LTM2 does a straight HTTP, non-SSL, connection to the WebLogic servers
  • SNAT pools are in use on both LTM1 and LTM2
  • OneConnect is used throughout (though turning it off on either LTM1, LTM2 or both had no effect)
  • Caching is disabled on both LTM1 and LTM2
  • Compression is enabled on both LTM1 and LTM2 (though turning it off on either LTM1 or LTM2 had no effect)
  • Anecdotally, the problem may have gotten worse after we put a firewall (Cisco ASA 5585) between LTM2 and the WebLogic servers; but the firewall processing all looks completely clean, and it's not doing any HTTP inspection, just simple IP-based ACLs
  • When the F5s are removed from the dataflow, and the browser goes directly to the WebLogic servers, the error does not occur

Any thoughts?

 
