Archive for December 2008
E-mail address obfuscation
If you’ve ever put your e-mail address on your site, you know how quickly spambots find it and start sending you great offers you can’t refuse. I had this problem and needed to come up with a good solution before beginning to develop client sites.
There are some libraries out there that provide e-mail address obfuscation, but the ones I found were not accessible: a human had no chance of reading the e-mail address until it had been decoded using JavaScript. Maybe a savant could decode it, but I don’t think that makes the site Section 508 compliant.
Instead, I needed something that would output a human-readable e-mail address that most spambots wouldn’t find. In this case, accessibility is most important, and obfuscation comes second. Because most spambots are lazy and only look for mailto: links or for e-mail addresses that match a regular expression, breaking the address apart would be enough. (I have no stats to back up the “most” qualifier in the previous sentence, but in practice I’ve found this to be true.)
A simple format that fits my two requirements is this: address [at] domain [dot] com. This is sufficiently human readable, and most spambots won’t notice it. For users with JavaScript enabled, it would be best to show a clickable link for address@domain.com.
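To illustrate why breaking the address apart helps, here is a minimal sketch of the kind of pattern matching a lazy harvester might do. The regular expression and the harvestAddresses name are my own assumptions for illustration, not taken from any real spambot.

```javascript
// A deliberately naive harvester pattern (an assumption, not taken from any real bot).
var NAIVE_EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns every substring of `text` that the naive pattern would harvest.
function harvestAddresses(text) {
  return text.match(NAIVE_EMAIL_PATTERN) || [];
}

var plain = 'Contact: <a href="mailto:address@domain.com">address@domain.com</a>';
var obfuscated = 'Contact: address [at] domain [dot] com';

console.log(harvestAddresses(plain));      // finds the address in the href and in the link text
console.log(harvestAddresses(obfuscated)); // finds nothing, since there is no '@' to match
```

A harvester that specifically looks for the [at] and [dot] convention would still find the address, so this only raises the bar for the lazy ones.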
To make the e-mail addresses easy for our code to find, each should be wrapped in a span tag marked with a special CSS class. For the sake of example, I used “email”, but you can use something else if you want to be more discreet. So, an e-mail address should appear in the code as <span class="email">address [at] domain [dot] com</span>.
As a bonus, I also wanted it to handle addresses with a name. For instance, “Clark Kent” <clark [at] dailyplanet [dot] com> should become a link whose text is Clark Kent. To use a name as the link text, just add it in quotes before the address, and surround the address in < and > (properly escaped as &lt; and &gt;), such as <span class="email">"Clark Kent" &lt;clark [at] dailyplanet [dot] com&gt;</span>.
Here’s what I did to make the transformation. (This code intentionally does not use any JavaScript libraries, because I often use it on temporary pages that are posted while I’m building a site, and it’s pointless to require users to download all that extra code just so I can save myself a few lines.)
function initEmailLinks() {
  var spans = document.getElementsByTagName('span');
  for (var i = 0; i < spans.length; i++) {
    var span = spans[i];
    // only process spans that carry the "email" class
    if (span.className.indexOf('email') != -1) {
      var name = null;
      var matches = span.innerHTML.match('"(.*?)"');
      if (matches != null && matches.length == 2) {
        name = matches[1];
      }
      var address = span.innerHTML // split to multiple lines for readability
        .replace(/.*&lt;(.*)&gt;\s*/, '$1') // keep only what is between the escaped '<' and '>'
        .replace(/\s*\[at\]\s*/g, '@') // replace ' [at] ' with '@'
        .replace(/\s*\[dot\]\s*/g, '.'); // replace every ' [dot] ' with '.'
      if (name == null || name == '') {
        name = address;
      }
      if (span.className.indexOf('no-link') != -1) {
        span.innerHTML = address; // render the address as plain text
      } else {
        // render the address as a mailto: link
        span.innerHTML = '<a href="mailto:' + address + '">' + name + '</a>';
      }
    }
  }
}
Then once the page loads, just call initEmailLinks() and all the address [at] domain [dot] com links change to address@domain.com. Since it’s sometimes useful to show an e-mail address without making it a link, adding the CSS class “no-link” to the span surrounding the e-mail address has this effect.
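For the “once the page loads” part, here is one way the call could be wired up without a library. This is only a sketch: runWhenLoaded is a hypothetical helper name of mine, and it takes the window object as a parameter so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: registers `callback` to run when `win` fires its load event.
function runWhenLoaded(win, callback) {
  if (win.addEventListener) {
    win.addEventListener('load', callback, false); // standards-based browsers
  } else if (win.attachEvent) {
    win.attachEvent('onload', callback); // older versions of IE
  } else {
    win.onload = callback; // last-resort fallback
  }
}

// In a browser, hook up initEmailLinks() if it has been defined on the page.
if (typeof window !== 'undefined' && typeof initEmailLinks === 'function') {
  runWhenLoaded(window, initEmailLinks);
}
```

Assigning window.onload directly also works on a simple page, but it silently replaces any handler another script has already set, which is why the sketch prefers the listener APIs.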
The code is much simpler when jQuery or Prototype is available, since we can use the library’s CSS selectors to get the elements we’re interested in. Since I’m switching things over to jQuery, I’ll post that version here:
$(function(){
  $('span.email').each(function(){
    var span = $(this);
    var name = null;
    var matches = span.html().match('"(.*?)"');
    if (matches && matches.length == 2) {
      name = matches[1];
    }
    var address = span.html() // split to multiple lines for readability
      .replace(/.*&lt;(.*)&gt;\s*/, '$1') // keep only what is between the escaped '<' and '>'
      .replace(/\s*\[at\]\s*/g, '@') // replace ' [at] ' with '@'
      .replace(/\s*\[dot\]\s*/g, '.'); // replace every ' [dot] ' with '.'
    if (name == null || name == '') {
      name = address;
    }
    if (span.hasClass('no-link')) {
      span.html(address); // render the address as plain text
    } else {
      // render the address as a mailto: link
      span.html('<a href="mailto:' + address + '">' + name + '</a>');
    }
  });
});
If you’re using jQuery and you don’t care about support for using custom link text (such as a person’s name) and rendering e-mail addresses as plain text, then the code can be simplified to this:
$(function(){
  $('span.email').each(function(){
    var span = $(this);
    // the g flag makes sure every ' [dot] ' is replaced, not just the first
    var address = span.html().replace(/\s*\[at\]\s*/g, '@').replace(/\s*\[dot\]\s*/g, '.');
    span.html('<a href="mailto:' + address + '">' + address + '</a>');
  });
});
For convenience, your server-side code should be able to obfuscate e-mail addresses for you on the fly. I use Rails for most of my projects, so here’s a simplified version of a helper method I use:
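def email(address, name = nil)
  # wrap the obfuscated address in the span that the client-side script looks for
  '<span class="email">' +
    (name ? %{"#{name}" &lt;} : '') + # optional name, followed by an escaped '<'
    address.gsub('@', ' [at] ').gsub('.', ' [dot] ') +
    (name ? '&gt;' : '') + # escaped '>' closes the address when a name is given
    '</span>'
end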
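Since the server-side output and the client-side script have to agree on the format, a round-trip check is a cheap way to catch mismatches. The sketch below is a hypothetical JavaScript port of the helper (obfuscateEmail) paired with a DOM-free version of the decoding step (decodeEmail). Both names are mine, and the sketch assumes the angle brackets are emitted as &lt; and &gt; entities.

```javascript
// Hypothetical JavaScript port of the Rails helper, for illustration only.
function obfuscateEmail(address, name) {
  var body = address.replace(/@/g, ' [at] ').replace(/\./g, ' [dot] ');
  if (name) {
    body = '"' + name + '" &lt;' + body + '&gt;';
  }
  return '<span class="email">' + body + '</span>';
}

// Reverses the obfuscation on the span's inner markup (what innerHTML would return).
function decodeEmail(inner) {
  var nameMatch = inner.match(/"(.*?)"/);
  var address = inner
    .replace(/.*&lt;(.*)&gt;\s*/, '$1') // keep only the part between the escaped brackets
    .replace(/\s*\[at\]\s*/g, '@')
    .replace(/\s*\[dot\]\s*/g, '.');
  return { name: nameMatch ? nameMatch[1] : null, address: address };
}

// Round trip: obfuscate, strip the wrapping span, then decode again.
var html = obfuscateEmail('clark@mail.dailyplanet.com', 'Clark Kent');
var inner = html.replace(/^<span class="email">|<\/span>$/g, '');
var decoded = decodeEmail(inner);

console.log(html);
console.log(decoded.name + ' / ' + decoded.address);
```

The example address has two dots on purpose: the decoding step needs the g flag on its replacements, otherwise only the first [dot] would be converted back.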
I highly recommend using something like this, especially when you’re putting a client’s e-mail address on their website. It’s fairly quick and easy to implement on both the server and client, and it will keep the address harvesting to a minimum while still being accessible to users without JavaScript.
Catching up
2009 is less than a month away, and I’m already (read: finally) writing my first blog post. I’ve written a blog engine (which runs ErikEbelingArt.com/blog) but never a simple post. Go figure.
As a web developer, I have the pleasure of solving some interesting problems, from simple CSS tricks to complex server-side image processing. Knowing the feeling of hitting a roadblock at 1:00am the night a client website is supposed to be launched and finding a solution on someone’s blog, I thought it was past time for me to contribute. Plus, I want to get published before the Internet is full.
My current development focus is on Rails, but I still do plenty of Java (which can be written as JAVA if you’re a recruiter). I do a fair amount of JavaScript using Prototype and Scriptaculous, and I’m getting into jQuery as of late. I also do a lot of HTML and CSS, striving for valid, semantically correct, accessible code. Although I have no formal web design or graphics training, I enjoy cutting up a design and converting it to a standards-compliant website. That is, until it comes time to “fix” it for IE.
I’ve got some topics I’ll be writing about soon, which will cover CSS, JavaScript, Rails, nginx, and more. I’ll also write about new problems and solutions as they come up.
If you’re wondering “why the strange Nick not found name”, my best answer is, “why not?” I’ve been thinking about registering it as a new HTTP status code, but I haven’t settled on a number yet (all the good ones are taken). Perhaps more importantly, I can’t find a site that lets me register a new one.