Firstly, I must acknowledge and thank my friend Russell for putting this page together. This page (his really - I just borrowed it) documents some of the things that I have been doing to combat
spam in my inbox. Hopefully some of these things will be useful to other
people.
My email signature
You're probably visiting this page for the first time if you got an email
from me, and wondered what that cryptic thing at the bottom of it was. Allow
me to explain... (if you're looking for cunning anti-spam measures, scroll
down a bit)
echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,;P;s,.*//,,;s,\.,@,;'
This line of near-gibberish is a bash (a type of Unix command line shell)
command. It will probably work in a host of other shells, including the most
basic sh. First, a bit about Unix "pipes". That bar that you see near the
middle there (just before the sed) is called a pipe, and essentially connects
the two programmes together. The output of the first command becomes the
input of the second. Let's break it down a bit.
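To see a pipe on its own, here is a minimal sketch using a different pair of programmes. The choice of tr (which translates characters), the word "hello" and the variable name are assumptions made purely for illustration; none of them appear in the signature.

```shell
# echo writes "hello" to its standard output; the pipe feeds that text into
# tr, which upper-cases every letter it reads on its standard input.
shouted=$(echo hello | tr 'a-z' 'A-Z')
echo "$shouted"   # prints HELLO
```

The signature works the same way, only with sed on the right-hand side of the pipe instead of tr.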
echo http://matthew.jesuits.net/spam/
echo is described in its (GNU) manual page like so: "echo - display a line of
text" (the FreeBSD manual page says "echo -- write arguments to the standard
output"). Yup, that is all it does... puts text onto the "standard output"
(normally, the standard output for a programme is the terminal that you are
typing the commands into, and "arguments" are the bits on the command line
after the command name, in this case "http://matthew.jesuits.net/spam/" is the
only argument to the echo command). Executing the above command would produce
the output:
http://matthew.jesuits.net/spam/
echo is useful for creating some basic input for other programmes. In this
example, we create some input for the sed programme, described in the FreeBSD
man page as "sed -- stream editor" (the GNU sed man page has a useless
description). echo's standard output is connected with sed's standard input
by the shell, because of the pipe we put in there.
OK, so, now comes the hard bit... what does that gibberish after the sed do?
sed 's,t/.*,t,;P;s,.*//,,;s,\.,@,;'
sed edits streams. Anything flowing through pipes can be thought of as a
stream. sed takes as its main argument a list of instructions. Can you spot
the semi-colons in there? Those are the breaks between the instructions. OK,
so let's break this down into a more readable form:
s,t/.*,t,;
P;
s,.*//,,;
s,\.,@,;
The first letter of the line signifies the sed instruction that we are going
to use. In this case, they are all the "s" ("substitute") instruction, except
for one, the "P" (print) instruction. The manual has this to say about the
format of the s instruction:
s/regular expression/replacement/flags
A regular expression is just a pattern: a set of characters which "match"
another set of characters. I'll get to that just now. As you can see, I'm
using a comma instead of a slash after the "s" (and at all the other points
where a slash is needed). Almost any character can be used (anything other
than a backslash or a newline). I chose a comma because it was not used
anywhere else in the regular expression or the replacement.
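As a small illustration of why the delimiter matters, the sketch below runs one of the substitutions from the signature twice, once with commas and once with slashes. The variable names are made up for this example. Both forms should give the same result, but the slash version has to escape every literal slash in the regular expression with a backslash:

```shell
# Comma as the delimiter: the two slashes in the pattern need no escaping.
with_commas=$(echo http://matthew.jesuits.net | sed 's,.*//,,')

# Slash as the delimiter: each literal slash must be written as \/ instead.
with_slashes=$(echo http://matthew.jesuits.net | sed 's/.*\/\///')

echo "$with_commas"    # prints matthew.jesuits.net
echo "$with_slashes"   # prints matthew.jesuits.net
```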
So, let's examine the three regular expressions, and their replacements,
one by one...
s,t/.*,t,;
Here, the regular expression is "t/.*". A dot in a regular expression has a
special meaning. It means "match any character". The asterisk after it means
"zero or more of the preceding thing". So, what this regular expression means
is "a t, followed by a slash, followed by zero or more of any character".
This will match the t/spam/ at the end of the URL. The replacement (that's
the part between the second and the third commas) is "t". So, replace the
t/spam/ with t, leaving us with
http://matthew.jesuits.net
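This first step can be tried on its own. A minimal sketch, with a variable name chosen just for this example: the earlier t's in the URL (in "http", "matthew" and "jesuits") are skipped because none of them is immediately followed by a slash, so the match starts at the t of "net/".

```shell
# Run only the first instruction: "t/" plus everything after it becomes "t".
step1=$(echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,')
echo "$step1"   # prints http://matthew.jesuits.net
```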
At this point, the P instruction is used, which prints out what is remaining
in "pattern space", i.e. it prints out my web site address,
http://matthew.jesuits.net
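One way to see what P contributes is to stop the script right after it. The sketch below is an illustration rather than part of the signature, and the variable names are made up. Without any options, sed prints the pattern space once for P and once more automatically when it reaches the end of the script; the standard -n option switches off that automatic print, so only the output of P is left.

```shell
# P prints the pattern space, and then sed prints it again automatically.
printed_twice=$(echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,;P')
echo "$printed_twice"   # two identical lines

# With -n the automatic print is suppressed, leaving only the line from P.
printed_once=$(echo http://matthew.jesuits.net/spam/ | sed -n 's,t/.*,t,;P')
echo "$printed_once"    # a single line
```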
s,.*//,,;
Here, we replace zero or more of anything followed by two slashes with
nothing (there is nothing between the second and third commas). This leaves
us with matthew.jesuits.net
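Again, this step can be checked in isolation. A minimal sketch, feeding in the web site address from the previous step by hand (the variable name is only for illustration):

```shell
# Everything up to and including the "//" is replaced with nothing.
step2=$(echo http://matthew.jesuits.net | sed 's,.*//,,')
echo "$step2"   # prints matthew.jesuits.net
```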
s,\.,@,;
This last instruction is very simple. As I said just now, the "." matches
any character. We can "escape" this, to match only a "." by preceding it
with a backslash (\). So, the regular expression says "match a dot", and the
instruction says "replace a dot with an at sign". Without the "g" flag, sed
only replaces the first instance of the regular expression in the string. In
other words, only the first dot will be replaced with an at sign, leaving my
email address. This is what remains in "pattern space", and is automatically
printed out by sed. Voila!
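Putting it all together, the sketch below runs the complete signature command and captures its two lines of output. For contrast, it also runs the last substitution with the "g" (global) flag added, which is not what the signature uses and is shown only to illustrate the "first instance" point. The variable names are made up for this example.

```shell
# The complete signature: P prints the web address, then the remaining
# substitutions turn the pattern space into the email address.
signature=$(echo http://matthew.jesuits.net/spam/ | sed 's,t/.*,t,;P;s,.*//,,;s,\.,@,;')
echo "$signature"
# prints:
#   http://matthew.jesuits.net
#   matthew@jesuits.net

# With the g flag, every dot is replaced rather than just the first one.
every_dot=$(echo matthew.jesuits.net | sed 's,\.,@,g')
echo "$every_dot"   # prints matthew@jesuits@net
```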
If you've followed this far, and aren't confused, I get to be proud of
myself :) And, since you've just decoded my web site address and my email
address (from the mail that you got from me :P), drop me a line, and let me
know what you think of this explanation. Feedback good. (And I promise to pass it on to Russell, who deserves all the credit for this in the first place!)
Anti-Spam measures
This section is very much work-in-progress... I'll get there eventually.
I employ a few anti-spam measures, both on my website and in my
communications with people, and, of course, have a spam filter protecting my
inbox.
- Never publishing my email address: I avoid publishing my email addresses
  wherever possible.
- Google's spam functions: Google offers great detection and filtering of
  spam messages, even before they reach my inbox.
- SpamAssassin: SpamAssassin is a mail filter which helps to identify spam
  that gets into the mailbox on my computer.