Forum Discussion

swo0sh_gt_13163
Dec 24, 2014

Uneven load balancing - Least connection (member)

Hello Folks,

I hope you are enjoying Christmas Eve, and sorry to post a technical question on the biggest holiday of the year.

I need your input on a case I've got.

Scenario: one HTTP virtual server (VS) using cookie persistence, with two pool members load balanced by the Least Connections (member) method. The two members are not processing traffic equally, and the difference in their statistics is quite large.

The customer is running version 11.2.1 HF12. Following is a screenshot of the uneven load balancing as it appears in the statistics of both pool members.

Has anyone experienced something like this? Any comments or suggestions?

Thank you, and Merry Christmas!

Darshan

 

10 Replies

  • Hello Darshan,

    The statistics you see here are not one connection per client IP. They count HTTP connections, which are torn down and reopened even during an ongoing session, so depending on user activity you can see more sessions on one member. If you are using a OneConnect profile, you can also see some discrepancies. However, the bits and packets look properly distributed, so I don't think you need to worry.

     

  • Hey Pratik,

    Thanks for the reply, mate. Isn't there a vast difference in the bits transferred between the two hosts?

    The .117 member shows 2.8 Gig of received traffic.

    The .125 member shows 67.8 Meg of received traffic.

    The application team also observed that the .117 server is receiving most of the traffic compared to the .125 server.

    Cheers! Darshan

     

  • That's my bad, I didn't notice that. Do you have a OneConnect profile configured on the virtual server?

     

  • To achieve truly even balancing, you'll need to get rid of persistence; talk to your developers. To mitigate the uneven balancing, perhaps the "Least Sessions" load balancing method would work better for you? With "Least Sessions", expect roughly a 60/40 balancing ratio during peak hours. When there is very little activity, you can expect about 80/20, and that's normal.

    Another mitigation tip: if your persistence profile gives the persistence cookie an indefinite timeout, consider reducing it to 2 hours. End users quite commonly don't close their web browser when they go home from work, and this can hurt the balancing ratios of internal-use applications.
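To illustrate why counting sessions can spread clients better than counting connections under light load, here is a toy Python model. It is not BIG-IP's actual algorithm: it assumes that a member's session count is simply the number of clients currently pinned to it by persistence, that least connections breaks ties in favor of the first member listed, and that traffic is so light that each connection closes before the next new client arrives. The member names are borrowed from the thread; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    active_connections: int = 0  # connections open right now
    sessions: int = 0            # clients pinned to this member by persistence


def pick_least_connections(members):
    # min() returns the first member among ties (assumed tie-break rule)
    return min(members, key=lambda m: m.active_connections)


def pick_least_sessions(members):
    return min(members, key=lambda m: m.sessions)


def simulate(picker, new_clients):
    """Place new clients one at a time under light traffic (hypothetical model)."""
    members = [Member(".117"), Member(".125")]
    for _ in range(new_clients):
        m = picker(members)
        m.active_connections += 1  # client opens a connection
        m.sessions += 1            # persistence now pins the client to m
        m.active_connections -= 1  # request ends before the next client arrives
    return {m.name: m.sessions for m in members}


if __name__ == "__main__":
    print(simulate(pick_least_connections, 10))  # {'.117': 10, '.125': 0}
    print(simulate(pick_least_sessions, 10))     # {'.117': 5, '.125': 5}
```

Under these assumptions, least connections pins every new client to the first member because both members always show zero active connections, while counting sessions keeps alternating. Real traffic overlaps, so real-world ratios will be far less extreme than this toy output.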

     

  • I have four members in a pool, with 20-minute source address persistence and Least Connections (member). Out of the four, only one member has all the connections. I checked the persistence records and the connection table for my IP, and there are none for that VIP.

    When I send a request to the VIP, it should be load balanced to the member with the fewest connections. But my request is still sent to the member that has the most connections.

    Is this some kind of software bug? We are running 11.4.1 HF8.

     

  • Hello Folks,

    Sorry I couldn't update the thread sooner. I escalated the case to F5 Support; it ran for almost three months and was escalated to PD, and in the end it turned out to be a bug in the existing software version.

    Following is the final update from Support.

    PD has found the problem.  
    They confirmed that it's indeed caused by OneConnect and we have a potential fix under review.  
    Disabling OneConnect or switching to round-robin are effective workarounds.   
    The customer can also upgrade to 11.5.0+ where this is no longer an issue.
    
    Hotfix-BIGIP-11.2.1-HF13-1306.15-ENG.iso has been uploaded to the dropbox, it contains the fix for **BZ504538**.  
    I tested it in the lab, and LC+OneConnect is working as expected.
    

    So basically, OneConnect and Least Connections (LC) do not work well together. Following are the low-level technical details.

    In regards to oneconnect+least-connections:
    
    1. The customer's issue is that they have uneven distribution of load balancing.  This is exactly what sol2055 describes. 
    
    2. Least-connections and OneConnect are a problematic combination because least-connections looks at the currently used, open TCP sockets to the backend servers, while OneConnect tries to minimize the number of open TCP sockets to the backend servers by reusing them for multiple HTTP requests.
    
    3. [https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm-concepts-11-2-1/ltm_pools.html?sr=42995518](https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ltm-concepts-11-2-1/ltm_pools.html?sr=42995518)
    
    The documentation on least-connections says:
    
    "Note: If the OneConnect feature is enabled, the Least Connections methods do not include idle connections in the calculations when selecting a pool member or node. The Least Connections methods use only active connections in their calculations."
    
    That means the idle TCP sockets held open by OneConnect are not part of the least-connections calculation. This is one reason for the discrepancy described in sol2055. Idle OneConnect connections remain open and count as open connections depending on which statistics you look at (and they are certainly open connections from the web servers' point of view), but tmm does not use them in the least-connections calculation.
    
    4.  While looking at their connection table, the number of non-idle concurrent connections in their environment is not sufficient for least connections to even begin working properly.  Analyzing their connection table shows *one* connection open:
    
    Based on the above, I would suggest the customer utilize round robin instead of least-connections, especially if they want to keep using OneConnect.
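The support notes above can be illustrated with a small Python model. It is a sketch, not tmm's real logic, and it does not reproduce BZ504538 itself. It assumes that requests never overlap (concurrency of one, echoing the single open connection Support found), that a finished request leaves its server-side connection idle for OneConnect reuse instead of closing it, that only active connections count toward least connections (as the quoted 11.2.1 documentation states), and that ties go to the first member listed. All names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    active: int = 0  # server-side connections currently carrying a request
    idle: int = 0    # connections parked for reuse by OneConnect

    @property
    def open_connections(self) -> int:
        # What the web server sees as open: active plus idle
        return self.active + self.idle


def pick_least_connections(members, request_no):
    # Only active connections are counted; min() keeps the first member on ties
    return min(members, key=lambda m: m.active)


def pick_round_robin(members, request_no):
    return members[request_no % len(members)]


def run(picker, requests):
    members = [Member(".117"), Member(".125")]
    served = {m.name: 0 for m in members}
    for i in range(requests):
        m = picker(members, i)
        if m.idle:      # reuse a pooled connection when one is available
            m.idle -= 1
        m.active += 1
        served[m.name] += 1
        m.active -= 1   # request done: the connection goes back to the pool
        m.idle += 1
    return served, members


if __name__ == "__main__":
    served, members = run(pick_least_connections, 100)
    print(served)  # {'.117': 100, '.125': 0}
    print([(m.name, m.open_connections) for m in members])  # [('.117', 1), ('.125', 0)]
    served, _ = run(pick_round_robin, 100)
    print(served)  # {'.117': 50, '.125': 50}
```

In this model the skew comes from two things together: the idle OneConnect connection is invisible to the least-connections count, and with so little concurrency both members always tie at zero active connections. Round robin ignores connection counts entirely, so it spreads requests evenly here.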
    
    

    I hope this is useful. Apologies for taking so long to update the thread.

    Cheers! Darshan

  • Your number of connections is too small to tell whether there is an issue; you won't be able to judge whether load balancing is working from only 15 connections. The number of connections sent to each server looks OK.

    You should not use Bits In/Out as a measure of whether load is being distributed evenly.

     

    Main thing though, you need to run the stats for longer.
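To put "15 connections is too few" into numbers, here is a short Python sketch. It assumes, purely for illustration, that each connection independently goes to one of two members with probability 1/2; real least-connections decisions are not random, so treat the result as a rough sense of scale only. The function name is hypothetical.

```python
from math import comb


def prob_split_at_least(n: int, k: int) -> float:
    """Chance that either of two members gets at least k of n connections
    when every connection picks a member independently with probability 1/2."""
    if 2 * k <= n:
        return 1.0  # the busier member always gets at least n/2 >= k
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 2 * one_tail  # the two tails cannot overlap when k > n/2


if __name__ == "__main__":
    print(round(prob_split_at_least(15, 11), 3))  # 0.118
    print(prob_split_at_least(150, 110) < 1e-6)   # True
```

So an 11/4 split (or worse) over 15 connections happens about 12% of the time even with a perfectly fair coin, while the same 11:4 ratio over 150 connections would be vanishingly unlikely by chance. That is why statistics gathered over a longer period are more telling.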

     

  • Surgeon
    Ret. Employee

    Your load balancing looks good. There is no issue with the round-robin (RR) method. As mentioned before, the load balancing decision is based on the number of connections.