5 web development techniques to prevent Google from crawling your HTML forms

Google has recently decided to let its Googlebot crawl through forms in an effort to index the “Deep Web”. There are numerous stories about wayward crawlers deleting and changing content by submitting forms, and it’s about to get worse: Googlebot is about to start submitting forms to reach your website’s deeper data. So what’s a web developer to do?

1. Use GET and POST requests correctly
Use GET requests in forms to look up information; use POST requests to make changes. Google says it will only crawl forms via GET requests, so following this “Best Practice” for forms is vital.
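As a rough sketch of the split, here is a minimal WSGI app (the endpoint names and in-memory store are hypothetical) where the GET route only reads data and the only state-changing action sits behind POST:

```python
# Hypothetical guestbook: GET /search is a safe, idempotent lookup
# a crawler can follow; POST /comment is the only way to change state.

comments = []  # in-memory store, just for the sketch


def app(environ, start_response):
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]

    if method == "GET" and path == "/search":
        # Lookup only -- no side effects, safe for Googlebot to hit.
        body = repr(comments).encode()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    if method == "POST" and path == "/comment":
        # State change lives behind POST only.
        size = int(environ.get("CONTENT_LENGTH") or 0)
        comments.append(environ["wsgi.input"].read(size).decode())
        # Redirect after POST so a reload doesn't resubmit the form.
        start_response("303 See Other", [("Location", "/search")])
        return [b""]

    start_response("405 Method Not Allowed", [("Allow", "GET, POST")])
    return [b""]
```

The redirect-after-POST at the end is a related habit worth keeping: it means the URL a browser (or bot) ends up on is always a harmless GET.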

2. Make sure your POST forms do not respond to GET requests
It sounds so simple, but many sites are exploitable via XSS (Cross-Site Scripting) because they respond (and return HTML) to both GET and POST requests. Check your form input carefully on the backend, and for heaven’s sake – do not use globals!
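A small sketch of both habits together (the endpoint name is hypothetical): the handler refuses anything but POST outright, and escapes user input before echoing it back into HTML:

```python
from html import escape


def handle_contact(environ, start_response):
    # POST-only endpoint: reject GET instead of rendering the same
    # HTML for both methods -- the common XSS / crawler-trigger mistake.
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b"POST only"]

    size = int(environ.get("CONTENT_LENGTH") or 0)
    name = environ["wsgi.input"].read(size).decode()

    # Escape form data before echoing it -- never trust user input.
    page = f"<p>Thanks, {escape(name)}!</p>".encode()
    start_response("200 OK", [("Content-Type", "text/html")])
    return [page]
```

Note the input is read explicitly from the request body, never pulled from some merged global bag of GET-plus-POST parameters.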

3. Use robots.txt to keep robots OUT
A robots.txt file keeps Googlebot out of where it doesn’t belong. Luckily, Googlebot will continue its excellent support of robots.txt directives when it goes crawling through forms. Be sure not to accidentally restrict your website too much, however. Keep the directives simple, excluding by directory if possible. And test, test, test in Google’s Webmaster Tools!
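For example, a minimal robots.txt (the directory names here are hypothetical – substitute your own form and search paths) that blocks form handlers by directory rather than one URL at a time:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /search/
```

Placing this at the root of your site (e.g. example.com/robots.txt) keeps compliant crawlers like Googlebot out of those directories entirely.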

4. Use robots metatag directives
Use robots meta tag directives for more refined control. We recommend “nofollow” and “noindex” directives for both the form submission page and any search results pages you want Google to stay out of, even though Google says disallowing the form submission page is enough. Consider offering tag and category pages that are Google-friendly instead.
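Concretely, a page-level directive looks like this – one tag in the `<head>` of each page you want kept out of the index:

```html
<head>
  <!-- Don't index this page, and don't follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Unlike robots.txt, this travels with the page itself, which is handy for search-results pages whose URLs don’t fall neatly under one directory.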

5. Use a CAPTCHA where possible
Googlebot isn’t going to fill out a CAPTCHA, so it’s an easy way to make sure a bot isn’t submitting your form.
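This is not a real CAPTCHA implementation, but a minimal sketch of the server-side flow any challenge shares (the secret and the arithmetic question are stand-ins): issue a challenge plus a signed token with the form, then verify the answer against the token on submission, so the server stays stateless:

```python
import hashlib
import hmac
import random

SECRET = b"change-me"  # hypothetical server-side secret


def new_challenge():
    """Return a (question, token) pair to embed in the form."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    question = f"What is {a} + {b}?"
    # Sign the expected answer so we needn't store it server-side.
    token = hmac.new(SECRET, str(a + b).encode(), hashlib.sha256).hexdigest()
    return question, token


def verify(answer, token):
    """Check the submitted answer against the signed token."""
    expected = hmac.new(SECRET, answer.strip().encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing leaks.
    return hmac.compare_digest(expected, token)
```

A trivial arithmetic question like this stops dumb crawlers; a production site would swap in a distorted-image or similar challenge, but the issue-then-verify shape stays the same.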

Googlebot is, of course, the nicest bot you can hope to have visit your website. Its arrival is a chance to secure your forms and take the necessary precautions before other – not so polite – bots come calling.