Forum Discussion

Gary_T_31565
Sep 23, 2007

Inefficient MSMQ load balancing

 

New to BIG-IP and MSMQ.

How can I get messages load balanced 1:1 across the queues?

We have a web service (round-robin load balanced across n IIS servers) that suppliers post to. This works well. The web service then places a message, via a TCP connection, into a queue. The MSMQ 3.0 queue is replicated on n Windows Server 2003 servers and we load balance it. The profile is tcp, the pool uses round robin, and we only monitor that port 1801 is up.

Problem: While traffic is slow, we see messages distribute well. But SupplierA posts all of its business at 4pm. The BIG-IP load balances the posts to the web services equally, but all the messages go to one queue server!

Thoughts: According to Microsoft, the MSMQ service will keep a "session" (I assume they mean a TCP session) open for 5 minutes with the "client". Is the BIG-IP (as the client?) letting this session stay open and reusing it for any web service client connection?

How can I force the queue virtual server to pass each message and/or web service client connection to the next member in the queue pool?

9 Replies

  • I also want to sanity-check what my developers are telling me: the post comes to the web service, and that service creates a message that it sends to the virtual IP for the queue pool. This would prove that one web service is not sending all the messages and that persistence is not happening.

    I have used plenty of iRules for HTTP traffic, but what about TCP?

    Could I log the client and server IPs on the TCP connection?

    If so, how? SERVER_CONNECTED? CLIENT_ACCEPTED?
  • If the client opens one TCP connection and sends 100 "messages" down that one connection, the BIG-IP LTM only knows about the one connection, not the messages inside it, because the BIG-IP LTM is not an MSMQ proxy. When load balancing MSMQ, it can only see the TCP side of the house. You can use iRules to inspect this traffic, but as for load balancing separate messages inside one TCP connection, I'm not sure how that could be done.

    For iRules, I would probably use the SERVER_CONNECTED event like you mentioned: http://devcentral.f5.com/wiki/default.aspx/iRules/SERVER_CONNECTED.html
  • Deb_Allen_18
    Historic F5 Account
    Here's an iRule to log connection details for both client & server:
    
    when CLIENT_ACCEPTED {
      # Record the client IP:port as soon as the client-side connection is accepted
      set client "[IP::client_addr]:[TCP::client_port]"
      log local0.info "Client connection accepted for client $client"
    }
    when SERVER_CONNECTED {
      # Log which pool member was chosen; $client persists from CLIENT_ACCEPTED
      set server "[IP::server_addr]:[TCP::server_port]"
      log local0.info "Server connection to server $server established for client $client"
    }

    But I think the easiest way to see whether it's one connection or several would actually be to examine a packet trace and look for multiple transactions on the same client IP:port combo.

    HTH

    /deb

  • Deb_Allen_18
    Historic F5 Account
    Meant also to say that LB'ing multiple requests in a TCP stream to multiple servers is possible, but it works well in only a very few cases, since LTM can't pipeline requests within such an iRule -- LTM must wait for the response to request 1 before sending request 2 to a different server, or response 1 will be lost.

    /deb
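    To make that caveat concrete, here is a hypothetical sketch of per-message load balancing with an iRule. It assumes each MSMQ message arrives in its own TCP payload chunk (real MSMQ framing is more involved) and uses LB::detach to force a fresh pick from the pool, so treat it as an illustration, not a tested configuration:

    when CLIENT_ACCEPTED {
      # Buffer client data so we can act per payload
      TCP::collect
    }
    when CLIENT_DATA {
      # Forward the buffered payload to the current pool member...
      TCP::release
      # ...then drop the server-side flow so the next payload is
      # load balanced to the next pool member
      LB::detach
      # Resume collecting for the next payload
      TCP::collect
    }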

     

     

  • Thanks

    Spent some time monitoring traffic; LTM does a pretty good job, to be fair. This is an extract from my response today to the devs complaining about spikes. Funnily enough, while they were composing a response, the business at the end of the message chain went down for a couple of hours and the LTM balanced the stuck messages perfectly across four servers.
    For future readers, the part about MSMQ will be useful as it's not well documented.

    "Note: Our load balancing is by connection (not by message, request, or size).

    Incoming connections to the Web services layer (b2b.xxxx.com) are balanced efficiently, which results in equal load on the Web Services layer queues.

    Why are we seeing spikes on the queue-layer queues?

    Firstly, load (connection and message count) over time is reasonably balanced, and small 50-message spikes should not be a concern. Secondly, MSMQ by design opens a client/server TCP session and holds it open for 5 minutes by default (the CleanupInterval registry key). The load balancer also, by default, holds a TCP session open for 5 minutes unless told otherwise via the protocol. The reason for this is sensible: it is inefficient to start a TCP connection for each message. Thirdly, with all load balancing, while traffic is low it will be uneven. E.g. 1st connection = 10 messages in 5 minutes and 2nd connection = 30 messages in 5 minutes."
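    For reference, the interval mentioned above lives in the registry. This fragment is an illustration from memory of the MSMQ parameters (verify the path and default on your own servers before relying on it), showing the 5-minute value in milliseconds:

    Windows Registry Editor Version 5.00

    ; CleanupInterval is in milliseconds; 0x000493e0 = 300000 = 5 minutes
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSMQ\Parameters]
    "CleanupInterval"=dword:000493e0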

     

     

     

  • Deb_Allen_18
    Historic F5 Account
    Thanks for posting back, I'm sure the extra info will come in handy for someone out there.

    :-)

    /deb
  • Interesting! In a competitor's SLB solution, we had to compensate for what they called their TCP Flow Timeout, which was significantly lower than the 5 minutes you discovered here.

    Thanks,

    CarlB
  • Hi Deb,

    You mentioned above that it is possible to LB multiple requests in a TCP stream? Do you have any best-practice solution for this?

    Thanks,

    "Meant also to say that LB'ing multiple requests in a TCP stream to multiple servers is possible, but it works well in only a very few cases, since LTM can't pipeline requests within such an iRule -- LTM must wait for the response to request 1 before sending request 2 to a different server, or response 1 will be lost.

    /deb"
  • Just FYI, the idle timeout is fully configurable, both client- and server-side.

    Also, in the scenario described above, there is an affinity between the web and queue servers, created by persistence (if used) and the long session timeout (I assume the connection stays open longer while in use). Four new web connections would create four single connections, one from each web server to one queue server. The load on a queue server is determined by how many transactions are posted through its single related web server. I doubt a lower timeout on the F5 would help much here.
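    For anyone looking for the knobs: the idle timeout is a property of the TCP profile attached to the virtual server. The names below are placeholders and the syntax is the later tmsh form (older units would use bigpipe or the GUI), so treat this as a sketch:

    create ltm profile tcp msmq-tcp { defaults-from tcp idle-timeout 60 }
    modify ltm virtual msmq_vs { profiles replace-all-with { msmq-tcp } }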