Mitigate Unplanned Scale Issues with an iRule Waiting Room
This solution originated on Tuncay Sahin’s website and is shared here, with permission and minor updates. F5er Tim Wagner reached out after hearing news of the flood of unemployment claims causing site crashes to see if we’d take a look at this iRule. The iRule as-is performs the following functions:
- Tracks connections by IP address using the table command (session table accessor)
- If the visitor count exceeds the max visitors, sends a service unavailable error (503) with a meta refresh at a 60-second interval
- Upon retry, forwards the request to the server if the visitor count has dropped below the max visitors; otherwise, sends the 503 again
- For explicit URIs, there is a stats or bypass mechanism
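The waiting room page itself is not reproduced in this article, so here is a minimal sketch of what the 503 with a meta refresh could look like. The inline page and the optional Retry-After header are assumptions for illustration; the iRule in this article serves waitingroom.html from an iFile instead. As written, this responds to every request, so in practice the respond line belongs only in the over-capacity branch.

```tcl
when HTTP_REQUEST {
    # Assumption: refresh interval matches the WaitingRoomTimeout used in the article's iRule
    set WaitingRoomTimeout 60

    # Hypothetical inline page; the meta refresh makes the browser retry on its own
    set page "<html><head><meta http-equiv=\"refresh\" content=\"$WaitingRoomTimeout\"><title>Waiting room</title></head><body><p>The site is busy. This page retries every $WaitingRoomTimeout seconds.</p></body></html>"

    # 503 signals a temporary condition; Retry-After is an optional hint for non-browser clients
    HTTP::respond 503 content $page "Content-Type" "text/html" "Retry-After" $WaitingRoomTimeout
}
```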
Note that it works as is as long as you add the appropriate data-group. Before getting into the small updates I chose to make, consider this solution to be a fancy sorry page. You can find several solutions on that front in the codeshare. The updates I made are simple:
- Eliminate the management tool table and configuration to reduce complexity. I keep the status page details for troubleshooting (/getcount) however.
- Rename all the VIP stuff to BYPASS to improve clarity. VIP has a special meaning for BIG-IP, so keeping it would be confusing.
- Remove the embedded images and html pages and use iFiles instead to increase readability. I kept most of the variables for this reason as well, though in a highly performant situation I’d provide a clarity map in comments and seek to optimize for performance instead.
- Add a js canvas drawing to reduce boredom in the waiting room. This is also served as an iFile.
The Solution
I left some optimizations on the floor as an exercise for the reader. Let me know how you would further tweak this iRule!
when HTTP_REQUEST {
    ## Check if the host name matches, otherwise exit.
    ## This is needed if you have multiple websites running on the same Virtual Server
    if {[HTTP::host] eq "test.test.local"} {
        # Waiting room js file, only necessary if you want a canvas animation
        if { [HTTP::uri] eq "/bb.js"} {
            HTTP::respond 200 content [ifile get bb.js]
            TCP::close
            return
        }

        ## Set variables
        # Your website's (unique) shortcode, needed to separate multiple online waiting room iRules on the same Virtual Server.
        # In this example the shortcode is SITE1
        set OWR SITE1

        # Max visitor count
        # How many concurrent visitors can you serve
        set max_visitors 2

        # Timeout in seconds
        # IdleTimeout must be at least equal to your cart's idle timeout value.
        set IdleTimeout 60
        set WaitingRoomTimeout 60

        # Determine the visitor's IP address. Visitors behind a proxy are seen as one visitor.
        if { ([HTTP::header exists "True-Client-IP"]) and ([HTTP::header "True-Client-IP"] != "") } {
            set Client_IP [HTTP::header "True-Client-IP"]
        } else {
            set Client_IP [IP::client_addr]
        }

        # Defining Tables
        set VisitorsTable VisitorsTable-$OWR-$max_visitors
        set WaitingRoomTable WaitingRoom-$OWR-$max_visitors

        # Generic
        set unique_id [format "%08d" [expr { int(100000000000 * rand()) }]]
        set request_uri [HTTP::host][HTTP::uri]
        set BYPASS $OWR-bypass-url-list

        # Counters
        set VisitorCount [table keys -subtable $VisitorsTable -count]
        set WaitingRoomCount [table keys -subtable $WaitingRoomTable -count]
        set TotalVisitors [expr {$VisitorCount + $WaitingRoomCount}]
        ## End Variables

        ## Monitoring
        # Allow monitoring from internal IP's or subnets.
        if { ($Client_IP starts_with "192.168.102") && ([HTTP::uri] equals "/getcount") } {
            HTTP::respond 200 content "Total Visitors: \[$TotalVisitors\] Max Visitors: \[$max_visitors\] Waiting Room Count: \[$WaitingRoomCount\]"
            TCP::close
            return
        }

        ## Start WaitingRoom
        # Check if the visitor session still exists
        set VisitorSession [table lookup -subtable $VisitorsTable $Client_IP]
        if { $VisitorSession != "" } {
            # We have a valid session... The lookup has reset the timer on it so just finish processing
        } else {
            # No valid session...
            # Check if BYPASS URL
            set bypass_url [class match -value [HTTP::uri] contains $BYPASS]
            if { not ($bypass_url == "") } {
                # BYPASS, do nothing
            } else {
                # NOT BYPASS, check connection count for displaying WR page
                # So do we have a free 'slot'?
                if {$VisitorCount < $max_visitors} {
                    # Yes we have a free slot... Allocate it..
                    # Register visitor
                    table add -subtable $VisitorsTable $Client_IP $unique_id $IdleTimeout
                } else {
                    # Max visitors limit reached, show WaitingRoom
                    # Insert visitor into WaitingRoomTable
                    table add -subtable $WaitingRoomTable $Client_IP $unique_id $WaitingRoomTimeout
                    # Show waiting room HTML
                    HTTP::respond 503 content [ifile get waitingroom.html]
                }
                TCP::close
            }
        }
    }
}
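One of the optimizations left for the reader could look something like the sketch below: constants move into RULE_INIT as static variables so they are not re-set on every request, and the visitor count is only read when a new visitor actually needs a slot. The `static::owr_*` names are hypothetical, the values are the test settings from above, and the bypass, /getcount, True-Client-IP, and waiting room handling are deliberately left out, so treat this as a partial illustration rather than a drop-in replacement.

```tcl
when RULE_INIT {
    # Hypothetical static names; values are the article's test settings
    set static::owr_host "test.test.local"
    set static::owr_max_visitors 2
    set static::owr_idle_timeout 60
    set static::owr_visitors_table "VisitorsTable-SITE1-2"
}

when HTTP_REQUEST {
    # Other sites on the same virtual server are not touched
    if { [HTTP::host] ne $static::owr_host } { return }

    set client_ip [IP::client_addr]

    # Returning visitors: the lookup refreshes the idle timer, nothing else to compute
    if { [table lookup -subtable $static::owr_visitors_table $client_ip] ne "" } {
        return
    }

    # Count only when a new visitor needs a slot
    set visitor_count [table keys -subtable $static::owr_visitors_table -count]
    if { $visitor_count < $static::owr_max_visitors } {
        # The value 1 is a placeholder for the article's unique_id
        table add -subtable $static::owr_visitors_table $client_ip 1 $static::owr_idle_timeout
    }
    # Over-capacity handling (waiting room table and 503) omitted in this sketch
}
```

Static variables are shared by every iRule on the system, which is why the names carry a prefix here; pick one that will not collide with other rules.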
The Result
I set the max visitors and the idle timeout to ridiculously low values (2 and 60, respectively) to make it easy to test. I ran siege from two Linux virtual machines so I could test my desktop browser and sure enough, my third connection resulted in the waiting room:
I also tested the /getcount URI to see what my stats were:
Total Visitors: [3] Max Visitors: [2] Waiting Room Count: [1]
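The total reported here is the sum of the two subtable counts (2 active visitors plus 1 in the waiting room). A waiting room entry is only removed when its timeout expires, so a client that is admitted on retry may be counted in both tables for a short while. A possible tweak, assuming you want the stats to settle right away, is to delete the waiting room entry on admission. The sketch below reuses the variable names from the iRule above with the test values hard-coded.

```tcl
when HTTP_REQUEST {
    # Values and names below mirror the article's iRule
    set max_visitors 2
    set IdleTimeout 60
    set VisitorsTable "VisitorsTable-SITE1-$max_visitors"
    set WaitingRoomTable "WaitingRoom-SITE1-$max_visitors"
    set Client_IP [IP::client_addr]

    set VisitorCount [table keys -subtable $VisitorsTable -count]
    if { $VisitorCount < $max_visitors } {
        # Register the visitor (the value 1 is a placeholder for the article's unique_id)
        table add -subtable $VisitorsTable $Client_IP 1 $IdleTimeout
        # Assumed tweak: drop any waiting-room entry for this client on admission
        table delete -subtable $WaitingRoomTable $Client_IP
    }
}
```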
And finally, I tested the bypass, using a valid path from the data-group to bypass the waiting room. A query to /bypass_test resulted in a 404 (as that URI is a dead link currently) instead of the 503 I should get on all non-bypass URIs, so this was successful as well.
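One detail worth keeping in mind with the bypass check: the iRule uses `class match -value`, which returns the value stored with the matching record, so as far as I can tell a data-group record needs a non-empty value to count as a bypass. If you would rather list bare URIs, a variation along these lines should work, since `class match` without `-value` returns 1 or 0. The data-group name follows the `$OWR-bypass-url-list` pattern from the iRule, and the standalone event wrapper is only there to keep the sketch self-contained.

```tcl
when HTTP_REQUEST {
    # Assumption: SITE1-bypass-url-list is the string data-group the article's iRule expects
    set BYPASS "SITE1-bypass-url-list"

    # Without -value, class match returns 1 on a match and 0 otherwise,
    # so records do not need a value to be treated as bypass entries
    if { [class match [HTTP::uri] contains $BYPASS] } {
        # Bypass URI: skip the waiting-room checks
        return
    }
}
```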
What solutions are you looking at for handling temporary scale issues? Cloud bursting? Dynamic growth and shrinkage in Kubernetes deployments? Let me know how you are handling these situations in the comments below. And a hearty shout out again to Tuncay Sahin for the original work here!