Forum Discussion

Abed_AL-R
Apr 01, 2019

GTM - iRule for persistence

Hi all,

I'm facing a problem in GTM where we configured a wide-IP with two pool members.

Those pool members are separate APM modules in separate data centers.

End users connect to one APM module and start working. Inside the APM webtop there are some links that force the end user to query the same wide-IP again, and this is where the problem starts.

We've configured persistence in GTM with a CIDR mask of /32. But most end users rely on public DNS services that sit in front of many LDNS servers, such as Google's 8.8.8.8 listener, which has LDNS servers all over the world. So the 8.8.8.8 listener is not the one querying the GTM; the LDNS servers behind it are:

172.217.40.8    wideip:A:/Common/mydomain.com-> pool-member:/Common/DC1:SSL_Portal     04-01 11:30:33
173.194.98.10   wideip:A:/Common/mydomain.com-> pool-member:/Common/DC2:SSL_Portal     04-01 11:33:09
173.194.98.9    wideip:A:/Common/mydomain.com-> pool-member:/Common/DC1:SSL_Portal     04-01 11:33:13
74.125.47.15    wideip:A:/Common/mydomain.com-> pool-member:/Common/DC2:SSL_Portal     04-01 11:35:50
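To make the CIDR point concrete, here is a small Python sketch (not BIG-IP code) that takes the four LDNS addresses from the log above and shows which of them would share a persistence record at /32, /24, /16 and /8. It assumes persistence is keyed only on the LDNS source address masked to the configured prefix length; the data-center labels are copied from the log.

```python
import ipaddress
from collections import defaultdict

# LDNS source addresses and the data center each was sent to, copied from the log above.
LOG = [
    ("172.217.40.8", "DC1"),
    ("173.194.98.10", "DC2"),
    ("173.194.98.9", "DC1"),
    ("74.125.47.15", "DC2"),
]


def persistence_key(ldns_ip: str, prefix_len: int) -> ipaddress.IPv4Network:
    """Mask the LDNS address to the persistence CIDR (strict=False zeroes the host bits)."""
    return ipaddress.ip_network(f"{ldns_ip}/{prefix_len}", strict=False)


def group_by_key(prefix_len: int) -> dict:
    """Map each persistence key to the set of data centers seen for it in the log."""
    groups = defaultdict(set)
    for ip, dc in LOG:
        groups[persistence_key(ip, prefix_len)].add(dc)
    return dict(groups)


if __name__ == "__main__":
    for prefix in (32, 24, 16, 8):
        groups = group_by_key(prefix)
        print(f"/{prefix}: {len(groups)} key(s)")
        for net, dcs in sorted(groups.items()):
            print(f"  {net} -> {sorted(dcs)}")
```

Run as-is, it reports 4 keys at /32 and 3 keys at each of /24, /16 and /8. At /24 the two 173.194.98.x lookups (answered from DC2 and then DC1 four seconds apart) would have shared one record, but even at /8 the 172.x, 173.x and 74.x blocks stay separate, so a user whose lookups rotate through them can still be answered from different data centers.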

According to the following article, the Google Public DNS addresses 8.8.8.8 and 8.8.4.4 are mapped to the nearest operational server by anycast routing.

https://developers.google.com/speed/public-dns/faq

When clients send queries to Google Public DNS, they are routed to the nearest location advertising the anycast address used (8.8.8.8, 8.8.4.4, or one of the IPv6 addresses in 2001:4860:4860::).
The specific locations advertising these anycast addresses change due to network conditions and traffic load, and include nearly all of the Core data centers and Edge Points of Presence (PoPs) in the Google Edge Network.

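And this is a big problem for us: each new lookup from an end user querying Google DNS at 8.8.8.8 may arrive at GTM from a different Google LDNS address, and with /32 persistence a different LDNS address means a new load-balancing decision, so he or she might get moved between data centers.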

There were some suggestions to solve this, but none of them fully resolves the issue:

  • Widen the persistence CIDR to /8. But again, as the example above shows, Google's LDNS servers do not reside in one specific CIDR block.

  • Upgrade to v14 and use the ECS feature. But then we need to make sure that every LDNS (DNS provider) forwarding queries to the BIG-IP DNS system supports ECS.
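The ECS idea can be illustrated with a small Python sketch: if the resolver includes an EDNS Client Subnet option (RFC 7871), persistence can be keyed on the end user's subnet instead of on whichever LDNS happened to forward the query. This is an assumption-laden illustration, not how BIG-IP DNS v14 actually implements ECS; the function `persistence_key` and the client subnet 198.51.100.0/24 (a documentation range) are hypothetical, while the two LDNS addresses come from the log above.

```python
import ipaddress
from typing import Optional


def persistence_key(ldns_ip: str, ecs_subnet: Optional[str], prefix_len: int = 32):
    """Hypothetical persistence key: prefer the ECS client subnet, else fall back to the LDNS address.

    ecs_subnet is the client subnet a resolver passes in the EDNS Client Subnet
    option, e.g. "198.51.100.0/24"; None means the resolver sent no ECS option.
    """
    if ecs_subnet is not None:
        return ipaddress.ip_network(ecs_subnet, strict=False)
    return ipaddress.ip_network(f"{ldns_ip}/{prefix_len}", strict=False)


# Two lookups from the same (hypothetical) client network arriving via
# two different Google LDNS addresses taken from the log.
with_ecs = {
    persistence_key("173.194.98.10", "198.51.100.0/24"),
    persistence_key("74.125.47.15", "198.51.100.0/24"),
}
without_ecs = {
    persistence_key("173.194.98.10", None),
    persistence_key("74.125.47.15", None),
}

print(len(with_ecs), len(without_ecs))  # 1 2
```

With ECS both lookups collapse to one key; without it they produce two. The fallback branch is exactly the dependency noted in the bullet above: any LDNS that does not send the option puts you back on LDNS-address persistence.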

I opened a case with F5, but it seems to me that nothing except an iRule could resolve the issue, and I'm not sure which iRule...

Has anyone come across this issue and solved it permanently?

Thanks!

4 Replies

  • Hi,

    I would say that no iRule would solve your issue, because it's the cached DNS entry on the client machine that expires after the response TTL (30s by default), which makes the client send another DNS request to resolve the same name. ECS is, for me, the only solution to your problem.

    You can increase the TTL value of the DNS response and hope that the client makes its next request within that time. However, the more you increase that value, the less you benefit from intelligent DNS...

    Keep me posted if you find a better solution,

    Many thanks,

    Karim

  • Hi,

    The F5 team suggested using the "Topology" load balancing method.

    The Topology method will split traffic (by geolocation) between source DNS servers in our country and those in other countries.

    Maybe also widening the persistence CIDR to /16 (a shorter mask, so more LDNS addresses share one persistence record).

    I think this is the best we can do. It will not load balance equally between the two data centers, but hey, if it solves the problem :)

    Thanks!

  • Aha, I see. So this solution works as long as the location advertising the anycast address doesn't jump from country to country for a specific client... Many thanks for your reply.

    Karim

  • Haven't tested it yet. We will soon.

    It is not a 100% solution, and we're aware of that.

    But this is the only suggestion we got from F5 support.

    Any other suggestion for a 100% solution to this problem is more than welcome
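The Topology approach suggested by F5 support can be sketched in Python to show both why it helps and where Karim's caveat bites. This loosely follows the idea of F5 topology records (an LDNS match, a destination and a weight, with the highest-weight matching record winning) but does not model BIG-IP's exact matching rules. The geolocation table, the placeholder country code "XX" and the addresses (RFC 5737 documentation ranges) are all hypothetical; a real deployment would use the BIG-IP geolocation database.

```python
import ipaddress
from typing import NamedTuple, Optional

# Hypothetical geolocation table using documentation address ranges.
GEO = {
    ipaddress.ip_network("192.0.2.0/24"): "XX",      # placeholder for "our" country code
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "DE",
}


class TopologyRecord(NamedTuple):
    ldns_country: Optional[str]  # None matches any LDNS
    destination: str             # data center / pool member
    weight: int


# Hypothetical records: LDNS in our country -> DC1, everyone else -> DC2.
RECORDS = [
    TopologyRecord("XX", "DC1", 100),
    TopologyRecord(None, "DC2", 10),
]


def ldns_country(ldns_ip: str) -> Optional[str]:
    """Look up the LDNS country in the hypothetical table; None if unknown."""
    addr = ipaddress.ip_address(ldns_ip)
    for net, country in GEO.items():
        if addr in net:
            return country
    return None


def choose_dc(ldns_ip: str) -> str:
    """Return the destination of the highest-weight record matching the LDNS."""
    country = ldns_country(ldns_ip)
    matches = [r for r in RECORDS if r.ldns_country in (None, country)]
    return max(matches, key=lambda r: r.weight).destination


if __name__ == "__main__":
    for ip in ("192.0.2.53", "198.51.100.53", "203.0.113.53"):
        print(ip, "->", choose_dc(ip))
```

The answer depends only on the country of the LDNS, not on which individual LDNS address forwarded the query, so a user whose lookups rotate among resolvers in the same country keeps landing in the same data center. If the anycast node serving that user moves to another country (for example from 192.0.2.53 to 198.51.100.53 here), the answer flips from DC1 to DC2, which is exactly the limitation raised in the replies.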