The Hacking Shtick
 
Spam Wars: Attack of the Clones
Spambots
     One of the most sinister ways spammers collect email addresses is with Spambots. A Spambot is a kind of bot, or spider: a program that crawls the web searching for information. Search engines use bots to build large databases of websites; spammers build Spambots to harvest email addresses from legitimate websites.

Spam Traps
     The following webpages host little programs we at The Shtick like to call Spam Traps. They catch Spambots in an enormous loop which is designed to feed the Spambot as many fake email addresses as it can stomach. It's a beautiful thing. Create a link just like one of the ones below in order to make spamming a teensy bit more difficult.
A Gift for Our Robot Friends
<A HREF="http://www.deadlybrain.org/addresses.php">A Gift for Our Robot Friends</A>
Spam Wars: Attack of the Clones
<A HREF="http://www.smcox.com/programming/spam/clones.html">Spam Wars: Attack of the Clones</A>
     Download a free copy of Spam Wars as a .ZIP archive or as a self-extracting archive. (Updated 7/Feb/2007)
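
     For the curious, the loop a Spam Trap relies on can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind either of the pages linked above. The function names, the /trap.py URL and the "page" parameter are all made up for the example, and the addresses use the reserved ".invalid" domain so that they can never reach a real mailbox.

```python
import random
import string

# Assumption: ".invalid" is a reserved top-level domain, so none of these
# addresses can ever belong to a real person.
FAKE_DOMAIN_SUFFIX = ".invalid"


def fake_address(rng):
    """Build one random, undeliverable email address."""
    user = "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(5, 10)))
    host = "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(5, 10)))
    return f"{user}@{host}{FAKE_DOMAIN_SUFFIX}"


def trap_page(seed, trap_url="/trap.py", count=20):
    """Return one HTML page of fake addresses plus a link to the next page.

    trap_url and the "page" query parameter are hypothetical; every page
    links back into the trap, which is what keeps a Spambot going in circles.
    """
    rng = random.Random(seed)
    addresses = [fake_address(rng) for _ in range(count)]
    items = "\n".join(f'<LI><A HREF="mailto:{a}">{a}</A></LI>' for a in addresses)
    next_seed = rng.randrange(10**9)
    return (
        "<HTML><HEAD>\n"
        # Tells well-behaved bots to stay out (see Some Precautions).
        '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">\n'
        "<TITLE>Addresses</TITLE></HEAD><BODY>\n"
        f"<UL>\n{items}\n</UL>\n"
        f'<A HREF="{trap_url}?page={next_seed}">More addresses</A>\n'
        "</BODY></HTML>"
    )
```

     Calling trap_page(1) returns a page of twenty fake addresses and a "More addresses" link, and a script serving /trap.py would pass the "page" value back in as the seed for the next page.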

Some Precautions
     Of course, anything that traps a Spambot will also trap Googlebot, or any other bot that might be indexing your site for a search engine. This problem is easy to eliminate, though. Legitimate robots respect a site owner's wishes: they will not index a page or follow its links where the owner has told them not to.

     There are two ways to tell a bot what it is and is not allowed to index. First, there's the "ROBOTS" meta tag. Second, there's the robots.txt file.

     The robots meta tag is placed in the "HEAD" portion of your document and it tells a bot what it is and is not allowed to do with a page. The different ways that you can write a "ROBOTS" meta tag are as follows.
Code                                               Behavior
<META NAME="ROBOTS" CONTENT="INDEX, FOLLOW">       Bot will index the page and will follow the links to find more pages to index.
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">     Bot will index the page and will not follow any of the links.
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">     Bot will not index the page and will follow the links to find more pages to index.
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">   Bot will not index the page and will not follow any of the links.
     The first form of the tag is not needed, as it is the default behavior for a bot. The Spam Traps themselves should use the fourth version, CONTENT="NOINDEX, NOFOLLOW". For contact pages we like to use the third version, because any Spambot smart enough to skip the Spam Trap in deference to this meta tag will also skip the contact information. Of course, a legitimate bot will skip the contact information too, so this extra step isn't always practical. An online resume, for example, is something one would most certainly want indexed.
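
     To make the four combinations concrete, here is a rough Python sketch of how a well-behaved bot might read the tag. It is a simplified illustration using the standard html.parser module, not the logic of any particular search engine; the class and function names are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the index/follow policy from a page's ROBOTS meta tag."""

    def __init__(self):
        super().__init__()
        # INDEX, FOLLOW is the default when no tag is present.
        self.index = True
        self.follow = True

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        directives = {d.strip().lower() for d in attributes.get("content", "").split(",")}
        if "noindex" in directives:
            self.index = False
        if "nofollow" in directives:
            self.follow = False


def robots_policy(html):
    """Return (may_index, may_follow) for the given HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.index, parser.follow
```

     A polite bot would call robots_policy() on each page it fetches and honor both answers. A Spambot, of course, never bothers to ask, which is exactly why the trap works.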

     The robots.txt file is a document, placed at the root of a site, which legitimate bots consult to find out what they are and are not allowed to index. Go to Search Engine World to learn how to write a robots.txt file.

     As with the "ROBOTS" meta tag, we have our robots.txt file set to disallow indexing of contact pages and Spam Traps.
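
     As a rough illustration, a robots.txt along these lines can be checked with Python's standard urllib.robotparser module. The paths below (/contact.html and /traps/) and the bot name are placeholders invented for the example, not the actual layout of this site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are placeholders for wherever your
# contact page and Spam Traps actually live.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.html
Disallow: /traps/
"""


def allowed(path, user_agent="ExampleBot"):
    """Report whether a bot that obeys ROBOTS_TXT may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

     With this file, allowed("/index.html") is permitted while allowed("/contact.html") and allowed("/traps/clones.html") are refused. Remember that only legitimate bots perform this check at all.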

     Now go out there and protect yer selves.