Stopping Auto Form Submissions
Having a site like ScriptSchool.com, I often receive questions asking how to do different things with code. Here is one I’ve seen asked a few times and found interesting:
“I’m getting hammered by auto submissions! Is there anything I can do to stop these auto submitters?”
Many webmasters run link lists, TGPs, etc. and do not want to get ransacked by auto submitting programs. They also have link trades with other webmasters, and those link trades go unnoticed when someone uses an auto submission bot to bypass their existing program.
The solution to any problem first starts with understanding how the problem is created. In this case, how the auto submission program functions.
In some cases you can get hold of auto submission programs and review the source code, and in others you might have to reverse engineer them somehow. I don’t suggest, recommend, endorse, nor promote decompiling copyrighted code, so don’t get the wrong idea here. However, I will say that in the case of a compiled program where the intention either directly or indirectly is to bypass form and/or other security on my website, I consider that a hostile action and feel completely within my right to take reasonable steps to protect my online business.
The majority of auto submission program developers will boast “auto submissions save webmasters time”, but consider who’s making money from selling the auto submission application, who’s shotgunning submissions “rules be damned”, and who is spending excess time having to weed through submissions that were mechanically generated. It’s been a hotly debated topic on various boards and my focus in this article isn’t on the ethics of using these programs but technically how to stop these programs. I should also be careful to point out that not all auto submission developers are misguided in my opinion. There are some very legitimate uses of auto submission programs but it is not only my opinion but a well documented fact that the majority of these programs are abused — and some horribly abused — by webmasters.
For those who actually want to receive auto submissions — or receive them from certain trusted webmasters, I personally think that’s a good idea to save both parties time 😉 Just create a secondary submit script and special bypass password and give that info to your webmaster friend who uses the auto submission program.
But for those who don’t want these mechanical submissions, read on…
There are three types of bots that I have seen and studied in various degrees of depth. I am not drawing any technical names for these bots, so the names that follow in this article are of my own creation (at least as far as I know). I also will not draw much distinction between agents and bots in this column, but there are certainly very technical distinctions between the two. I am also probably misusing in some technical way the term “bot” so the bot purists in the audience please do not send your bots to egg my door. I am a programmer and not a bot expert, nor do I want to be a bot expert, but I understand what these bots (or whatever you want to call them) are being programmed to do. And understanding what they are doing is the first step in beating them at their own game.
Bye Bye Simple Bots
The first type I classify as a Simple Bot. It is usually the design of a novice programmer and done in a higher level language like PHP, Perl, ASP, Cold Fusion, etc. It doesn’t use any low-level Winsock code or anything else to spoof the referring URL. It might even be built on library socket code like Perl’s LWP library or standard PHP functions. This type of bot can often be easily defeated by a simple check of the HTTP_REFERER environment variable. PHP code that looks something like this will dispatch this type of bot in most circumstances:
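<?php
// HTTP_REFERER is not set when the client sends no Referer header
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if ($referer != "http://www.mydomain.com/myformpage.html") {
    // redirect to thank you for submission page and ignore submission
    header("Location: http://www.mydomain.com/thankyou.html");
    exit; // stop here so the rest of the script never processes the submission
}
?>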
The Simple Bot shows up and submits the form as it always does, but the check on where the submission came from fails, so the bot is redirected to the thank you page and the auto submission is ignored.
Mid-Level and Low-Level Bots
The second type of bot is more sophisticated, and I call it the Mid-Level Bot. It is often created by a programmer familiar with spoofing techniques and will often spoof environment variables so that the submission appears to the CGI program to come from a human being using the form on the website, when in fact it doesn’t. Most of these bots do not include intelligent form spidering, which reports back to the mother ship when form input fields and/or hidden input values have changed, but some do. Some of these bots will actually behave like browsers and even allow cookies to be set! They can become a real nuisance to deal with. Personally, I prefer to deal with these bots the same way as low-level bots, but I have seen and employed techniques which included the following:
1) daily movement/renaming of the form submitting page and submission script
2) encryption schemes involving multiple checksums in the form (number of fields matches, random dummy input fields, etc.)
3) validation passwords to activate forms
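As a rough illustration of the checksum idea in technique 2, here is a minimal sketch, assuming PHP 7 or later. The function names, the FORM_SECRET constant and the one hour lifetime are all hypothetical and not taken from any particular script. It signs the issue time together with the list of field names the form is supposed to contain, so a replayed old form or a submission with a different set of fields fails the check.

```php
<?php
// Hypothetical secret key; in practice keep the real one outside the web root.
const FORM_SECRET = 'change-me';

// Build the value for a hidden form field: "<issue time>:<checksum>".
// The checksum covers the issue time and the sorted list of field names.
function issueFormToken(array $fieldNames, int $now): string
{
    sort($fieldNames);
    $payload = $now . '|' . implode(',', $fieldNames);
    return $now . ':' . hash_hmac('sha256', $payload, FORM_SECRET);
}

// Return true only if the token is well formed, not older than $maxAge
// seconds, and its checksum matches the field names actually submitted.
function verifyFormToken(string $token, array $submittedFieldNames, int $now, int $maxAge = 3600): bool
{
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || !preg_match('/^\d+$/', $parts[0])) {
        return false;
    }
    $issuedAt = (int) $parts[0];
    if ($issuedAt > $now || $now - $issuedAt > $maxAge) {
        return false;
    }
    sort($submittedFieldNames);
    $payload = $issuedAt . '|' . implode(',', $submittedFieldNames);
    // hash_equals compares the two checksums in constant time
    return hash_equals(hash_hmac('sha256', $payload, FORM_SECRET), $parts[1]);
}
```

When verifying, pass the names of the submitted fields with the token field itself left out. Keep in mind that a bot which fetches a fresh copy of the form before every submission will still pass this check, so it only raises the bar for bots working from a stored copy of the form.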
Still with me? Now is where it starts getting really interesting…
The Low-Level Bot is the most intricate bot you can and will encounter today. These bots are often programs running through proxy servers which mimic the behavior of real users. Some of these use macros to simulate pauses and “natural” keystrokes. Full spoofing is in force, so forget trying to ban them by IP or by a simplistic HTTP_REFERER check. Also, if you move the location of the form, many of these bots will spider links elsewhere on the site to seek out the new forms and intelligently report back the form input changes. Some of them do this so well and so fast that taking down and moving your forms is a hopeless cause. In a weird sort of way it can end up as a computer chess match. Your algorithm checks for certain conditions and actions, the auto submission bot checks and responds, and in the end somebody’s program will be checkmated. More often than not the form script you are using will be the loser, because it lacks the sophistication that can be built into a lower level program.
So how can you stop these Low-Level Bots?
If you have been to some larger mainstream sites you’ve probably already seen a recent trend of typing in a validation code. This code is not emailed to the user and is unique each time you visit the form. The validation code is system generated, but it is not returned as a hidden input, regular text or a normal input field; it is returned as a dynamically generated image.
The bot sees only that the page contains an HTML IMG tag, which it ignores. It cannot read the text inside the image unless it analyzes the image binary separately (an advanced endeavor), so it does not supply the validation code. You can therefore safely weed out submissions with a wrong and/or missing validation code and assume that they come from one of two sources:
1) user validation input error
2) auto submission bot
The first becomes likely if the validation code is too long and/or complicated, so my recommendation is to keep it to a combination of 4 letters and numbers randomly mixed. Care also has to be taken in designing a random enough algorithm so that the auto submission bot creator cannot figure out the pattern, while still keeping the code easy for the user to type in.
Computers do not generate true random numbers, but if you keep the seed long enough, it can be very, very difficult to figure out the pattern. You can make it additionally difficult to crack by changing the number of digits at random intervals. If you employ a strategy like this, the auto submission program developer will go in search of much easier form prey. There will always be some master hacker that delights in figuring out complicated algorithms, but by employing this dynamic image technique you’ll immediately cut 99.9% of auto submission programs on the market off at the knees.
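Here is a minimal sketch of generating such a code, assuming PHP 7 or later. The function name and the alphabet are hypothetical choices. Instead of a hand-seeded generator it uses random_int(), which draws from the operating system's secure random source, so there is no seed pattern for a bot author to work out. The alphabet leaves out look-alike characters (0 and O, 1, I and L) to keep the code easy to type, and the optional length range covers the idea of varying the number of characters.

```php
<?php
// Hypothetical alphabet: uppercase letters and digits minus look-alikes.
const CODE_ALPHABET = 'ABCDEFGHJKMNPQRSTUVWXYZ23456789';

// Return a random code whose length is picked between $minLength and
// $maxLength (inclusive). With the defaults it is always 4 characters.
function generateValidationCode(int $minLength = 4, int $maxLength = 4): string
{
    $length = random_int($minLength, $maxLength);
    $lastIndex = strlen(CODE_ALPHABET) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // pick one character from the alphabet at a secure random position
        $code .= CODE_ALPHABET[random_int(0, $lastIndex)];
    }
    return $code;
}
```

Calling generateValidationCode() gives a 4 character code, while generateValidationCode(4, 6) varies the length from request to request.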
The best defense is a good offense!
While I’m not going to show you every piece of code here, I am going to show you enough so that your in-house programmer will be able to run with the solution. If you don’t have an in-house programmer, or desire a custom solution, then you can contact me and for a reasonable fee, I’ll implement a solution to prevent auto submissions on your website(s).
Here are the server side ingredients you’ll need:
1) PHP version 4.0+
2) gd library version 1.8+
3) mcrypt (OPTIONAL)
Steps to take:
1) Write a script which will generate a random 4 character string of mixed letters and numbers.
2) Create a dynamic image with a random color/text scheme (to change the binary) and generate the image with the string inserted inside this image. Keep in mind that the text must be readable against the background color, so don’t go too crazy with the colors.
3) Write this validation code into a session variable. Do not store it in a cookie! Cookies have been proven to be exploitable in a recent version of Internet Explorer (get the patch, btw). For additional security you can encrypt the value using mcrypt with Blowfish or another good encryption algorithm (other algorithms include but aren’t limited to: CAST, RC2, RC4, RC6, RIJNDAEL, SAFERPLUS, SAFER, 3DES, 3WAY, SERPENT, TWOFISH, XTEA).
4) When the form is submitted, compare the validation value the user entered to the session variable (which matches the dynamically generated image). If they do not match then fail the form. I would recommend destroying the session and reissuing a new image after 2 failed attempts. You might also optionally log any mismatch to track suspect activity and to help deal with sophisticated (or not so sophisticated) brute force attacks.
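Steps 2 through 4 can be sketched roughly as follows, assuming PHP 7.1 or later with the gd extension. All function names, the session keys and the 120x40 image size are hypothetical. The $session parameter stands in for $_SESSION so the checking logic can be tried without a live session. On the second failure the sketch simply discards the stored code rather than calling session_destroy(), which is a simplification of the step described above.

```php
<?php
const MAX_FAILED_ATTEMPTS = 2;

// Step 3: remember the code server side and reset the failure counter.
function issueValidationCode(array &$session, string $code): void
{
    $session['validation_code'] = $code;
    $session['failed_attempts'] = 0;
}

// Step 4: returns 'ok', 'retry' (wrong, same image may be tried again)
// or 'expired' (no valid code is stored, so a new image must be issued).
function checkValidationCode(array &$session, string $submitted): string
{
    if (!isset($session['validation_code'])) {
        return 'expired';
    }
    $expected = strtoupper($session['validation_code']);
    // compare case-insensitively and ignore stray spaces around the input
    if (hash_equals($expected, strtoupper(trim($submitted)))) {
        unset($session['validation_code'], $session['failed_attempts']);
        return 'ok';
    }
    $session['failed_attempts']++;
    if ($session['failed_attempts'] >= MAX_FAILED_ATTEMPTS) {
        // too many misses: throw the code away so it cannot be brute forced
        unset($session['validation_code'], $session['failed_attempts']);
        return 'expired';
    }
    return 'retry';
}

// Step 2: draw the code as a PNG with a random light background, random
// dark text and a random text position, so the binary differs every time.
function renderCodeImage(string $code): void
{
    $img = imagecreatetruecolor(120, 40);
    $bg = imagecolorallocate($img, random_int(200, 255), random_int(200, 255), random_int(200, 255));
    $fg = imagecolorallocate($img, random_int(0, 80), random_int(0, 80), random_int(0, 80));
    imagefilledrectangle($img, 0, 0, 119, 39, $bg);
    imagestring($img, 5, random_int(10, 50), random_int(5, 20), $code, $fg);
    header('Content-Type: image/png');
    imagepng($img);
}
```

In use, the image script would call session_start(), then issueValidationCode($_SESSION, $code) and renderCodeImage($code) with a freshly generated code. The form handler would call session_start() and then checkValidationCode($_SESSION, $_POST['validation_code'] ?? ''), where validation_code is a made-up field name. Any result other than 'ok' is also a natural place to log the mismatch.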
I hope this gives you ideas in making your form input more secure, and ensuring that the input is coming from a live surfer or webmaster and not some bot running on the desktop.
TDavid is co-owner, programmer and webmaster for several sites devoted to programming including his own http://www.tdscripts.com/. He has done custom programming in various programming languages for companies all over the world. Every Friday at 2pm PST you can catch his weekly radio show dedicated to the technical side of webmastering and programming at http://www.scriptschool.com/radio.