Search Engine Optimization

Unlike most search engine optimization, my goal here is the opposite: I want to dissuade search engines from indexing the staging sites hosted on WP Stagecoach. Until today, I simply enabled WordPress’s “discourage search engines from indexing this site” setting on every staging site, but many search bots, especially the less reputable ones, don’t listen very well. So today I implemented a change that makes Apache return a 403 (Forbidden) error to any client whose User-Agent contains:

  • bot
  • crawler
  • spider

(as well as a couple others).

I did this using Apache’s mod_setenvif module, which can set an environment variable based on request headers, combined with the “Deny” directive.
I put the following in Apache’s main config, httpd.conf:
LoadModule setenvif_module modules/mod_setenvif.so
SetEnvIfNoCase User-Agent "bot" search_bots
SetEnvIfNoCase User-Agent "crawler" search_bots
SetEnvIfNoCase User-Agent "spider" search_bots
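Since SetEnvIfNoCase matches a regular expression, the three rules above could equally be collapsed into one (a sketch; it sets the same search_bots variable):

```apache
# One rule instead of three: match any of the substrings, case-insensitively
SetEnvIfNoCase User-Agent "(bot|crawler|spider)" search_bots
```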

and then in each of Apache’s vhosts, where I already have a <Directory> block set up, I just added the deny at the end:

Order allow,deny
Allow from all
Deny from env=search_bots
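Worth noting: Order/Allow/Deny is the Apache 2.2 syntax (kept alive in 2.4 only via mod_access_compat). On Apache 2.4, the equivalent using mod_authz_core’s Require directives would be (a sketch, assuming the same search_bots variable from above):

```apache
<RequireAll>
    # Let everyone in except requests flagged as search bots
    Require all granted
    Require not env=search_bots
</RequireAll>
```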

That should keep the staging sites out of the major search engines’ indexes, without stopping anyone from actually visiting and using their staging site!
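To sanity-check the matching logic, you can mimic Apache’s case-insensitive substring test from the shell (the User-Agent string below is just an example; a real end-to-end check would be a curl request with a bot User-Agent against the staging URL, confirming a 403):

```shell
#!/bin/sh
# Mimic SetEnvIfNoCase: does the User-Agent contain "bot", "crawler",
# or "spider", case-insensitively? If so, Apache would answer 403.
ua="Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
if printf '%s' "$ua" | grep -qiE 'bot|crawler|spider'; then
    echo "blocked: 403"
else
    echo "allowed: 200"
fi
```

A regular browser User-Agent (e.g. plain Firefox or Chrome) contains none of those substrings, so it would fall through to the allowed branch.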