BLOG

Incoherent ramblings, devlogs, edgy philosophy and other topics of interest

ENTRY #96

---



If you've ever managed any kind of online service you're well aware of how easy it is to abuse unprotected forms. A simple python script can wreak havoc by bombarding a server with randomized input that has to be carefully sanitized to prevent injections. Said processing, however, is typically the most computationally expensive request for a simple website to honor due to the complexity of the steps involved; sanitizing the user input, validating the cleansed data, writing it to a database, etc. Thus, unless we know for certain that the form we received was legitimately answered it is best left discarded, lest we neglect gate-keeping the database only to find it littered with nonsense or worse.

pocket_php captchas samples
The previously mentioned script that fills the form with randomized characters wouldn't be too difficult to detect and mitigate, but a slightly improved version that generates a well formatted email address (regardless of it's authenticity)? Not so much. Even potential solutions, complicated as they mey get, would most likely only work when validating the email field, not the rest of the form which would probably require their own specialized routines. The resulting increase in resource consumption doesn't even ensure the processed form was legit in the first place, only that it passed the aforementioned filters.

For the sake of brevity, now that the problem has been illustrated I'll jump to the point. This is no easy problem to solve, but there are simple and effective precautions that universally apply to internet forms that will at least help deter automated injections, namely captchas.
The idea behind them is quite clever, exploit the fact that there are easy to generate problems that a computer still can't crack but a human can solve in an instant. With character recognition tests being among the more popular.

As for the inevitable cost, captchas may be straightforward but are certainly not free, at least in contrast with leaving a form unprotected, and in cases where the server is working at or near capacity having to generate captchas would definitely worsen the service's responsiveness. On the opposite scenario where the server is practically idling it's quite likely that working the captchas becomes the most expensive step of the form's validation anyway. Alternatively, it's common practice to outsource captchas to reduce the local workload at the expense of the user's privacy.
whichever you choose, safeguarding forms automated attacks has a price, but it is practically ALWAYS WORTH PAYING.

The captcha functionality added to the pocket_php example login page is very simple and effective. It randomizes a set amount of characters from a given input string, draws randomly generated squares to the randomly colored background to obfuscate the foreground, renders the selected characters at a random angle, position and color (within reason, it'd be redundant to make this hard to answer for humans) and finally, the generated string is stored in a PHP session variable to subsequently validate the client's answer. Should the created captcha be too difficult to read for a human, all the client has to do is ask for a new one either by pressing the refresh captcha button or by reloading the form. Ezpz.

Note that the internal login captcha can be (de)activated by setting the ENFORCE_LOGIN_CAPTCHA in app/configure.php and utilizes the php-gd library.

Local, private, effective captcha

Before ending the post I'd like to clarify why captchas were added while other utilities get overlooked from the project. Pocket_php is currently powering fourteen web services with more planned, working with these individual projects I often get tempted to add a particularly handy snippet into the pocket_php codebase for future use, only to scrap the idea in favor of it's original vision; to provide the fastest, simplest template for webprojects to use as foundation. Abiding by this rule means that certain kinds of utilities are often not incorporated into source due to them being either too situational or just not worth the effort to implement on behalf of the programmer.

What separates captchas from other potential features is how necessary they've become and how widespread the use of (often subversive) third-party captcha services has grown in response, specially among sites that don't really benefit from such an approach at all. In the end, it's not about how easy or fast it is implement captchas, it's about having the option of privacy at the same reach as the alternatives.