Forum Discussion

EricD_186141's avatar
EricD_186141
Icon for Altocumulus rankAltocumulus
Apr 06, 2015

LTM iControlREST API request fails with HTTP 502 Bad Gateway

Hi,

We are using LTM v11.5.2 (on a 5250v's) and successfully using the REST API for all of our F5 Automation needs.

We have been using this REST call to gather info about every LTM Pool Member. GET https://{{F5-BIGIP-LTM}}/mgmt/tm/ltm/pool?expandSubcollections=true

This has been working fine until recently when the of pools configured on one LTM went from 2200 to 3335 (as we migrated one of Cisco ACE environments over to F5). After that, the above REST call fails with this HTTP response:

    Bad Gateway! 

    The proxy server received an invalid response from an upstream server. 

    The proxy server could not handle the request GET /mgmt/tm/ltm/pool. 

    Reason: Error reading from remote server 


    Error 502 

    myltm.example.com 
    Thu Apr 2 09:26:15 2015 

After some experimentation, we were able to workaround the error by breaking the request down a smaller query size using this query param: $top=. We found that setting that to 3300 was successful. Larger than that produced the error above. So it appears that we have hit some REST capacity boundary. We think it is particular to REST because when we do the equivalent tmsh command (tmsh show ltm recursive pool members), we have no problem.

We have a few questions:

  • Is the "Bad Gateway" error, in fact, indicative of a REST API capacity limit being hit?
  • Is there any way for us to pre-determine where these REST API capacity boundaries are, short of hard experience like this?
  • The "Bad Gateway" error that the LTM turns back is less than helpful. Should we expect the LTM REST server to recognize its limitations and give back a more reasonable error if a response capacity limit is reached?

FYI...A couple of other pieces of info about the problem.

When I do this curl command:

curl -kX GET -H "Content-Type: application/json" -H "Authorization: Basic –omitted--" -H "Cache-Control: no-cache" https://myltm.example.com/mgmt/tm/ltm/pool?expandSubcollections=true

I see that this message gets put to the restjavad-audit.0.log log within about 2-3 seconds:

[I][9479][03 Apr 2015 12:46:12 UTC][ForwarderPassThroughWorker][run] {"user":"automation","method":"GET","uri":"","status":200,"from":"x.y.z.a"}

The curl command continues to wait for results for about 65-70 seconds then ends with the Bad Gateway result.

Looking a bit more I found errors being logged here: /var/log/httpd/httpd_errors (EST times)

Apr 3 08:47:14 myltm-p err httpd[16202]: [error] [client 169.184.47.60] (20014)Internal error: proxy: error reading status line from remote server localhost Apr 3 08:47:14 myltm-p err httpd[16202]: [error] [client 169.184.47.60] proxy: Error reading from remote server returned by /mgmt/tm/ltm/pool

Here are some log messages from the same time (UTC times!) in /var/log/icrd

Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: Begin start process. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: Begin listening on socket. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: Begin scheduled task thread. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: Begin socket thread. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: Begin process servicing threads. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestQueue, INFO,Start: Begin process servicing threads. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: start workers. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestServer, INFO,Start: start process done. Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestRequestSender, INFO,RestRequestSender starting Apr 3 12:46:08 myltm-p notice icrd_child: 16837,16837, RestQueue, INFO,Start: Begin process servicing threads. Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestRequestSender, INFO,RestRequestSender stopping Apr 3 12:46:46 myltm-p err icrd: 5557,13989, RestServer, SEVERE,Unknown socket for write event fd:10 Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestServer, INFO,Shutdown: Begin shutdown process. Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestServer, INFO,Shutdown: stop listening socket. Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestServer, INFO,Shutdown: stop scheduled task thread. Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestServer, INFO,Shutdown: stop process servicing threads. Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestServer, INFO,Shutdown: shutdown workers. Apr 3 12:46:46 myltm-p notice icrd_child[16837]: 16837,16837, RestServer, INFO,Shutdown: Starting shutdown process done.

Thanks, Eric

10 Replies

  • Fred_Wittenberg's avatar
    Fred_Wittenberg
    Historic F5 Account

    Just curious are both of you running 11.5.2? I just tested a query to /tm/ltm/pool in 11.6.0 and it works fine. I'll try this against 11.5.2 as well if this appears to be a version specific problem.

     

    • Fred_Wittenberg's avatar
      Fred_Wittenberg
      Historic F5 Account
      Also, if you are sending each REST call as an individual call there is a default limit of 10 HTTP connections to the platform. To adjust this you will need to navigate to system-->Preferences-->Advanced-->Maximum HTTP Connections To Configuration Utility. Not knowing your platform its hard to recommend a number to use, but this is potentially the cause of the 502 error.
    • EricD_186141's avatar
      EricD_186141
      Icon for Altocumulus rankAltocumulus
      UPDATE: we added 1055 iApps to our LTM (bringing the total to 3960) and consequently our script that uses REST to get the list of iApps is failing with the same Bad Gateway error. ONE time when trying the query with Postman this morning the actual Java timeout error made it back to me. It was this: https://{{F5-BIGIP-LTM}}/mgmt/tm/sys/application/service { "code": 400, "message": "Expiration was Wed Apr 15 09:06:18 EDT 2015", "restOperationId": 2760519, "errorStack": [ "java.util.concurrent.TimeoutException: Expiration was Wed Apr 15 09:06:18 EDT 2015", "at com.f5.rest.common.RestHelper$1.run(RestHelper.java:1825)", "at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)", "at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)", "at java.util.concurrent.FutureTask.run(FutureTask.java:166)", "at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)", "at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)", "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)", "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)", "at java.lang.Thread.run(Thread.java:722)\n" ] }
  • Fred, we are at 11.5.2. I am sure we are not hitting any HTTP connection limit. The REST calls that I am doing are individual calls. Tried this from Python, curl and Postman all with the same result.

     

    Thanks, Eric

     

  • While my BigIP is running on 11.6.0 , and we also not hitting HTTP connection limits.

     

    • EricD_186141's avatar
      EricD_186141
      Icon for Altocumulus rankAltocumulus
      Kevin, approximately how many pools do you have configured on your LTM?
    • Kevin_Manavadar's avatar
      Kevin_Manavadar
      Icon for Nimbostratus rankNimbostratus
      It's roughly 800 pools. and Maximum HTTP Connections To Configuration Utility is 60.
  • I'm running a viprion with multiple guests, all on 11.6.0 HF4. We see the same problem, where REST starts returning the 502 error OR "No JSON object could be decoded".

     

    In our case, it seems the restjavad* logs will have a lck but no update for a period of time. Doing a grep on restjava, there will be a process for /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64/jre/bin/java (with args), with a process start time equal to or near the log last entries. I've found that HUP'ping that process allows REST API to again be successful.

     

    I've only done this on the standby unit - not the active. I do not know if this would negatively impact any operations, as I don't know what other TMM processes or OS services have a dependency. BUT.. maybe this will trigger a thought in some really smart people..

     

    If I have time I'll scrape the logs for the same period as the HUP to see if anything reports a failure. My fear is that the customer/client connections would suffer silently.

     

  • Thanks to Chad for the lead. I started experiencing the same issue, first on one LTM VE, then on another, and then on a physical 5050s device, all running 11.6 (1 are HF4, one is HF5).

     

    I dug through the restjavad logs and found that all threw 'java.lang.OutOfMemoryError: Java heap space' and didn't do much after that. So if the restjavad process had stopped working, it stands to reason that my iControlREST API wouldn't work.

     

    Per this article, this is a known issue, resolved in 11.6 HF6 and 12.0 / 12.1. The recommended fix is to restart the restjavad process, which I did, fixing the issue. The article makes is sound like a restart is necessary after 10k authentication tokens have been requested.