Spam

Firstly, I must acknowledge and thank my friend Russell for putting this page together. This page (his, really; I just borrowed it) documents some of the things I have been doing to combat spam in my inbox. Hopefully some of them will be useful to other people.

My email signature

If you're visiting this page for the first time, you probably got an email from me and wondered what that cryptic line at the bottom of it was. Allow me to explain... (if you're looking for cunning anti-spam measures, scroll down a bit)

echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,;P;s,.*//,,;s,\.,@,;'

This line of near-gibberish is a bash command (bash is a type of Unix command-line shell). It should also work in a host of other shells, including the most basic sh. First, a bit about Unix "pipes". The bar you see near the middle (just before the sed) is called a pipe, and it connects the two programmes together: the output of the first command becomes the input of the second. Let's break it down a bit.
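If you want to see a pipe in action on something simpler, here is a tiny experiment. tr isn't part of my signature at all; I'm just assuming a standard Unix tr as a convenient second programme to receive echo's output.

```shell
# echo writes "hello, pipes" to its standard output; the pipe (|) connects
# that to tr's standard input, and tr turns lower-case letters into upper-case.
echo 'hello, pipes' | tr '[:lower:]' '[:upper:]'
# prints: HELLO, PIPES
```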

echo http://matthew.jesuits.net/spam/

echo is described in its (GNU) manual page like so: echo - display a line of text (the FreeBSD manual page says echo -- write arguments to the standard output). Yup, that is all it does... it puts text onto the "standard output". Normally, a programme's standard output is the terminal you are typing the commands into. The "arguments" are the bits on the command line after the command name; here, "http://matthew.jesuits.net/spam/" is the only argument to the echo command. Executing the above command would produce the output:

http://matthew.jesuits.net/spam/
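To see what "arguments" means in practice, try giving echo more than one. The second command below is just a made-up illustration: the shell splits the line into separate arguments, and echo prints them joined by single spaces.

```shell
# One argument: echo prints it unchanged, followed by a newline.
echo http://matthew.jesuits.net/spam/
# Three arguments ("one", "two", "three"); the shell throws away the extra
# spaces, and echo joins the arguments with a single space each.
echo one   two three
# prints:
# http://matthew.jesuits.net/spam/
# one two three
```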

echo is useful for creating some basic input for other programmes. In this example, we create some input for the sed programme, described in the FreeBSD man page as sed -- stream editor (the GNU sed man page has a useless description). Because of the pipe we put in there, the shell connects echo's standard output to sed's standard input.
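A quick way to convince yourself that sed really is reading echo's output is to give it an empty script. With no instructions to carry out, sed simply copies each line from its standard input to its standard output.

```shell
# An empty sed script: sed reads the line coming through the pipe and,
# with nothing to change, prints it exactly as it arrived.
echo http://matthew.jesuits.net/spam/ | sed ''
# prints: http://matthew.jesuits.net/spam/
```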

OK, so, now comes the hard bit... what does that gibberish after the sed do?

sed 's,t/.*,t,;P;s,.*//,,;s,\.,@,;'

sed edits streams, and anything flowing through a pipe can be thought of as a stream. sed takes as its main argument a list of instructions (a "script"). Can you spot the semicolons in there? Those separate the instructions. OK, so let's break this down into a more readable form:

s,t/.*,t,;

P;

s,.*//,,;

s,\.,@,;
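The semicolons aren't the only way to separate instructions. As a sketch, here are the same four instructions passed to sed as separate -e expressions, which sed joins into one script; it should print exactly the same two lines as the one-liner in my signature.

```shell
# Each -e adds one instruction to the script, in order, so this is
# equivalent to 's,t/.*,t,;P;s,.*//,,;s,\.,@,;'.
echo http://matthew.jesuits.net/spam/ |
  sed -e 's,t/.*,t,' -e 'P' -e 's,.*//,,' -e 's,\.,@,'
```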

The first letter of each of those lines signifies the sed instruction we are going to use. All of them are the "s" ("substitute") instruction except one, the "P" (print) instruction. The manual has this to say about the format of the s instruction:

s/regular expression/replacement/flags

A regular expression is just a pattern of characters that "matches" other text. I'll get to that just now. As you can see, I'm using a comma instead of a slash after the "s" (and at all the other points where a slash is needed). Almost any character can be used (anything except a backslash or a newline)... I chose a comma because it does not appear anywhere in the regular expressions or the replacements, whereas a slash does, and every one of those slashes would otherwise have to be escaped.
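Here is why the choice of delimiter matters, using a made-up path rather than anything from my signature. Both commands replace /usr/local with /opt, but with the usual slash delimiter every slash inside the pattern and replacement has to be escaped with a backslash.

```shell
# Slash as delimiter: the slashes in /usr/local and /opt must be written \/ .
echo /usr/local/bin | sed 's/\/usr\/local/\/opt/'
# Comma as delimiter: the slashes can be written as-is.
echo /usr/local/bin | sed 's,/usr/local,/opt,'
# both lines print: /opt/bin
```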

So, let's examine the three regular expressions, and their replacements, one by one...

s,t/.*,t,;

Here, the regular expression is t/.*. A dot in a regular expression has a special meaning: it means "match any character". The asterisk after it means "zero or more of the preceding thing". So this regular expression means "a t, followed by a slash, followed by zero or more of any character". This matches the t/spam/ at the end of the URL (neither t in "http" is followed by a slash, so the first "t/" sed finds is the one in "net/"). The replacement (that's the part between the second and third commas) is "t". So t/spam/ is replaced with t, leaving us with http://matthew.jesuits.net
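You can check this step on its own by giving sed just the first instruction:

```shell
# Only the first substitution: "t/spam/" becomes "t".
echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,'
# prints: http://matthew.jesuits.net
```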

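At this point the P instruction is used, which prints out what is currently in the "pattern space" (strictly, P prints the pattern space up to its first newline, but there is only one line here, so that is all of it). In other words, it prints out my web site address, http://matthew.jesuits.net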
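If you stop the script right after P, you can see two kinds of printing at work: P's explicit print, and the automatic print sed does at the end of the script. The -n option (standard in sed) switches the automatic one off.

```shell
# P prints the pattern space once, then sed prints it again automatically
# when it reaches the end of the script, so the address appears twice.
echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,;P'
# prints:
# http://matthew.jesuits.net
# http://matthew.jesuits.net

# With -n, only P's output is left.
echo http://matthew.jesuits.net/spam/ | sed -n 's,t/.*,t,;P'
# prints: http://matthew.jesuits.net
```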

s,.*//,,;

Here, we replace zero or more of anything followed by two slashes (that is, the "http://") with nothing (there is nothing between the second and third commas). This leaves us with matthew.jesuits.net
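One detail worth knowing: .* is "greedy", meaning it matches as much as it possibly can. In the address there is only one "//", so it doesn't matter, but the second (made-up) example shows what happens when there are two.

```shell
# The third instruction applied to the address printed by P.
echo http://matthew.jesuits.net | sed 's,.*//,,'
# prints: matthew.jesuits.net

# Greedy matching: .* runs up to the LAST "//" on the line.
echo a//b//c | sed 's,.*//,,'
# prints: c
```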

s,\.,@,;

This last instruction is very simple. As I said just now, "." matches any character. We can "escape" it, so that it matches only a literal ".", by preceding it with a backslash (\). So the regular expression says "match a dot", and the instruction says "replace a dot with an at sign". Without the g (global) flag, sed replaces only the first match on each line, so only the first dot is replaced with an at sign, leaving my email address. This is what remains in the "pattern space", and sed prints it automatically at the end of the script. Voila!
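To see the difference the g flag makes without spelling out my real address (that would rather defeat the point of this page), here is the same instruction on a made-up hostname, www.example.com:

```shell
# No flag: only the first dot on the line is replaced.
echo www.example.com | sed 's,\.,@,'
# prints: www@example.com

# g flag: every dot on the line is replaced.
echo www.example.com | sed 's,\.,@,g'
# prints: www@example@com
```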

If you've followed this far and aren't confused, I get to be proud of myself :) And since you've just decoded my web site address and my email address (from the mail you got from me :P), drop me a line and let me know what you think of this explanation. Feedback good. (And I promise to pass it on to Russell, who deserves all the credit for this in the first place!)

Anti-Spam measures

This section is very much a work in progress... I'll get there eventually.

I employ a few anti-spam measures, both on my website and in my communications with people, and of course I have a spam filter protecting my inbox.

Never publishing my email address

Wherever possible, I avoid publishing my email addresses.

Making use of Google's spam filtering

Google offers great detection and filtering of spam, catching messages even before they reach my inbox.

SpamAssassin

SpamAssassin is a mail filter that helps to identify any spam that gets through to the mailbox on my computer.