One year ago, an online dating site called Lovely-Faces.com was launched with over 250,000 profiles. These profiles were scraped from Facebook without the users’ permission. This incident illustrates exactly what web scraping is all about.

Another good example of where a web scraping attack may occur is a web application that contains cataloged content, for example, electronic equipment with a catalog number and a price for each item. Let’s say that competitors would like to know the price of each item in the catalog in order to sell the same products for one dollar less (because customers buy the product with the lowest price).

Defending against a “Web scraping” attack (also known as “Web harvesting”) is very challenging. In most cases the motivation behind this attack is business driven, and the attacker tries to steal web application content that is publicly available without the approval of the content owner.

This attack is different from other well-known attacks since:

1. In this case the information that is stolen is not sensitive; it is presented to all web application users.

2. Access to this information from the same user more than once is permitted (because customers may want to browse the web application before choosing the item they want).

These characteristics make protecting your web application from this attack a non-deterministic kind of protection. Blocking suspicious users is not recommended, since losing business as a result of a false detection is not acceptable; therefore, a more sophisticated approach is needed. We want to delay the attacker to the point where harvesting becomes ineffective, while still keeping the web application available to all users. To do that, and to check whether the “attacker” is in fact a legitimate customer using a browser, we have to be intrusive and inject JavaScript into the web application response.
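To get a feel for why delaying works, here is a rough back-of-the-envelope sketch in Python. The function name `harvest_hours`, the catalog size, and the timing figures are all illustrative assumptions, not measurements from a real site. It assumes a harvester that fetches pages one after another.

```python
def harvest_hours(pages: int, delay_seconds: float, base_seconds: float = 0.2) -> float:
    """Hours needed to fetch `pages` sequentially.

    base_seconds is an assumed normal response time per page;
    delay_seconds is the extra delay added to each request.
    """
    return pages * (base_seconds + delay_seconds) / 3600


if __name__ == "__main__":
    # Hypothetical catalog of 100,000 items.
    for delay in (0.0, 1.0, 3.0):
        hours = harvest_hours(100_000, delay)
        print(f"added delay {delay:.1f}s per request: {hours:.1f} hours")
```

A one-second pause on a single page view is a minor annoyance for a shopper, but multiplied across an entire catalog it turns a harvest of a few hours into one of more than a day.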

F5 BIG-IP Application Security Manager has a unique protection for this type of attack. By injecting JavaScript into the responses sent to suspicious users, we can perform the following checks:

i. Is it a real user? Make sure JavaScript was executed on the customer’s browser.

ii. Is it a scraper? Delay the user’s next request by an amount that a genuine user will not notice, but that makes the attack slow and therefore ineffective for the harvester.
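The two checks above can be sketched on the server side roughly as follows. This is a toy illustration in Python and not how BIG-IP ASM is implemented. Every name here (`make_challenge`, `is_real_browser`, `next_delay`), the multiplication challenge, and the threshold, step, and cap values are assumptions made for the example. The idea is that the injected script receives two numbers and must send back their product, and that clients requesting pages faster than a normal shopper get a growing, capped delay.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret used to sign challenges (an assumption).
SECRET_KEY = secrets.token_bytes(32)


def _sign(client_id: str, a: int, b: int) -> str:
    """Sign a challenge so the server does not have to store it."""
    msg = f"{client_id}:{a}:{b}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_challenge(client_id: str) -> dict:
    """Values handed to the injected script, which must send back a * b."""
    a, b = secrets.randbelow(1000), secrets.randbelow(1000)
    return {"a": a, "b": b, "tag": _sign(client_id, a, b)}


def is_real_browser(client_id: str, a: int, b: int, tag: str, answer: int) -> bool:
    """Check i: the tag shows we issued the challenge; a correct answer
    suggests the script was actually executed by the client."""
    return hmac.compare_digest(tag, _sign(client_id, a, b)) and answer == a * b


def next_delay(requests_in_window: int, js_verified: bool,
               threshold: int = 30, step: float = 0.25, cap: float = 5.0) -> float:
    """Check ii: seconds to hold the client's next request.

    Clients that never ran the script get the capped delay; verified clients
    are delayed only once they exceed `threshold` requests in the window.
    """
    if not js_verified:
        return cap
    excess = max(0, requests_in_window - threshold)
    return min(cap, excess * step)
```

With these assumed values, a shopper who views ten pages in the window is not delayed at all, while a client that requests hundreds of pages is held at the cap on every request. Nobody is blocked outright, which matches the goal of avoiding lost business from false detections.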
