
Wednesday, June 28, 2006

gmail and spamcop vs bluebottle

I have a gmail address that I use exclusively for mailing list subscriptions, newsletters and the like. Because the mailing lists I subscribe to are archived and searchable through google, this address became a magnet for spammers who harvest email addresses from the archives. It reached the point where I was getting around 100 spam emails every day.

Normally, gmail does a good job of filtering these spam emails and puts them in a separate folder/label called spam. But there are a couple of problems:
  1. Once in a while, a few spam emails get past gmail's spam filters and land in the Inbox.
  2. There are some false positives, that is, good emails that gmail's spam filters mistook for spam and put in the spam folder.
The first issue is easy to solve: simply train the spam filter by reporting the missed email as spam. The second issue is the real show stopper. I have to wade through hundreds of spam emails to figure out if there are any false positives among them. Adding an email address to the 'contacts list' ensures that email from that sender will not end up in the spam folder again. However, it does nothing to prevent future false positives from other email addresses. This whole process of checking the spam folder for false positives and adding the email IDs to the contacts list is not only time consuming but also frustrating because it is so tedious.

I googled around and found spamcop ( http://www.spamcop.net/ ). After registering, users can report their spam emails to spamcop. Based on a complicated algorithm, spamcop then lists the IP addresses responsible for the spam. This list is called the SCBL (SpamCop Blocking List). If you want to know the details of this "complicated algorithm" or more about how spamcop works, you can read the documentation on spamcop's website.

The idea is that an email server administrator with a fairly recent copy of the SCBL can use it to identify spam, and then either reject or tag the spam email. Since I am using gmail, the SCBL is of no direct use to me. However, I started reporting spam through spamcop's web interface, hoping that it might help others who use the SCBL. Reporting each spam email takes roughly half a minute, so with around 100 spam emails a day I was spending close to an hour a day just reporting spam. The time would have been well spent if the spam had actually decreased. But even after two months of this, the spam has only increased.
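To make the administrator's side concrete, here is a minimal sketch of how a blocking list like the SCBL is typically consulted. It assumes the list is published as a DNS blocklist under the zone bl.spamcop.net and follows the usual DNSBL convention (reverse the octets of the IP address, append the zone, and treat a successful DNS lookup as "listed"). The function names are made up for illustration, and a real mail server would use its own built-in DNSBL support rather than a script like this.

```python
import ipaddress
import socket

# Assumption: the SCBL is served as a DNS blocklist under this zone.
SCBL_ZONE = "bl.spamcop.net"


def dnsbl_query_name(ip: str, zone: str = SCBL_ZONE) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = SCBL_ZONE) -> bool:
    """Return True if the query name resolves, i.e. the address is listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # name does not resolve: not listed (or lookup failed)
    return True


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only address, used here as a placeholder.
    print(dnsbl_query_name("192.0.2.1"))  # 1.2.0.192.bl.spamcop.net
```

Only `dnsbl_query_name` is pure; `is_listed` needs a working resolver, and it cannot tell "not listed" apart from a failed lookup, which a production setup would have to handle.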

Does that mean that spamcop does not work? No. Spamcop does work, at least to some extent. To understand what I mean, it is necessary to understand how spammers operate.

Spammers usually do not own the machines they send from. They break into insecurely configured machines on the net and send spam from them. If the ISP hosting these machines is a "good ISP", it cares about spam coming from its network. One such example is http://www.uk.easynet.net/ . I once received some spam emails from a machine whose IP address is 217.204.66.154 . This IP address belongs to easynet Ltd. I reported those spam emails to spamcop, which in turn forwarded the "spam reports" to the abuse team at uk.easynet.net . They immediately recognized the problem and took the necessary action so that no more spam would come from that machine. This is just one success story of spamcop in action. There could be many more.

But what if the ISP is not a "good ISP"? If an ISP does not care about spam coming from its network, the spamcop method does not work for gmail users. They will still receive spam from machines hosted by these "bad ISPs". So the "spamcop + gmail" approach has loopholes through which spam still manages to flow into the gmail account.

Googling again, I came across bluebottle ( http://www.bluebottle.com ), which is essentially a Challenge-Response (C-R) system. The idea is that if you send an email to a bluebottle address, you get a challenge asking you to verify yourself. Only if the sender verification succeeds do I receive the email in my Inbox. Pretty cool, huh? Initially one would think so. But the C-R system has its own drawbacks, which can easily be found by googling. The upside is that one does not see any spam whatsoever in the Inbox. So I need not wade through hundreds of spam emails looking for false positives.

In bluebottle, emails from unverified senders go into a 'pending' folder. They stay there for a week, after which they are deleted automatically. A message in the 'pending' folder moves to the Inbox either when the sender verifies it or when I manually approve it. The manual approval process eliminates the notorious 'C-R C-R deadlock', where two C-R systems keep challenging each other's challenges and neither message gets through.
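The pending-folder behaviour described above can be sketched as a small model. This is only an illustration of the rules as stated in this post, not bluebottle's actual implementation: the class and method names are invented, and adding a verified sender to the allowed list is my assumption about what verification does.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

PENDING_LIFETIME = timedelta(days=7)  # pending mail is deleted after a week


@dataclass
class Message:
    sender: str
    subject: str
    received: datetime


@dataclass
class Mailbox:
    allowed_senders: set = field(default_factory=set)
    inbox: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def deliver(self, msg):
        """Allowed senders reach the inbox; everyone else waits in pending."""
        if msg.sender in self.allowed_senders:
            self.inbox.append(msg)
            return "inbox"
        self.pending.append(msg)  # a real system would send a challenge here
        return "pending"

    def verify_sender(self, sender):
        """The sender answered the challenge: release their pending mail."""
        self.allowed_senders.add(sender)  # assumption, see the note above
        self._release(lambda m: m.sender == sender)

    def approve(self, msg):
        """Manual approval by the owner; no challenge answer is needed."""
        self._release(lambda m: m is msg)

    def expire(self, now):
        """Delete pending messages older than the pending lifetime."""
        self.pending = [m for m in self.pending
                        if now - m.received <= PENDING_LIFETIME]

    def _release(self, match):
        self.inbox.extend(m for m in self.pending if match(m))
        self.pending = [m for m in self.pending if not match(m)]


if __name__ == "__main__":
    box = Mailbox()
    note = Message("stranger@example.com", "hello", datetime(2006, 6, 28))
    print(box.deliver(note))  # pending
    box.approve(note)
    print(len(box.inbox), len(box.pending))  # 1 0
```

Because `approve` does not depend on the sender ever answering, a message from another C-R system can still be let through by hand, which is how manual approval avoids the deadlock.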

To quote some numbers, my gmail account typically receives 100 spam emails a day, whereas I receive 25 ham emails per day. So the ratio of spam/ham is 100/25 = 4. Yahoo's spam/ham ratio is even worse. Compared to this, the pending folder of my bluebottle account receives about 25 emails per day (waiting to be approved), of which around 5 turn out to be spam. The ratio of spam/ham there is 5/20 = 0.25 . Coupled with 'no false positives', this makes bluebottle very attractive.
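For anyone who wants to repeat the comparison with their own counts, the arithmetic is just a division. The helper name below is made up for illustration; the numbers are the daily figures quoted above.

```python
def spam_to_ham(spam: int, ham: int) -> float:
    """Return the spam/ham ratio; the ham count must be positive."""
    if ham <= 0:
        raise ValueError("ham count must be positive")
    return spam / ham


# gmail: about 100 spam and 25 ham emails per day.
gmail_ratio = spam_to_ham(spam=100, ham=25)
# bluebottle pending folder: about 25 emails, around 5 of them spam.
pending_ratio = spam_to_ham(spam=5, ham=25 - 5)

print(gmail_ratio)    # 4.0
print(pending_ratio)  # 0.25
```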

There are many drawbacks with bluebottle as well!
  • While gmail offers more than 2 GB and yahoo offers 1 GB, bluebottle's free account offers only 0.25 GB.
  • Only the Inbox folder can be POPped in bluebottle. In gmail, emails under any label can be POPped. Better yet, gmail offers IMAP, which is much more useful.
  • Ads are attached to every email sent through bluebottle's SMTP server. This is very annoying.
  • People sending email do not usually answer challenges, so most of the messages have to be approved manually.
  • Bluebottle's servers (both POP and SMTP) are not as reliable or as fast as gmail's servers.
  • Bluebottle's search facility covers the "To:" header but not the "Cc:" header, so emails cannot be searched by their CC fields.
One area where I had doubts about bluebottle's approach is mailing list subscriptions. Will I be able to subscribe to mailing lists like debian-user through the bluebottle address? Will there be any problems? So far the results are somewhat mixed. It works with some mailing lists (e.g. debian-user, debian-devel, debian-qa) and does not work with others (e.g. vim, texmacs-users, subversion-users).

Whitelisting all the emails from the debian-user mailing list to your bluebottle account is as simple as adding the following email address to your "Allowed Senders" list.
debian-user-request@lists.debian.org
However, adding vim@vim.org to the "Allowed Senders" list does not whitelist all the emails from the vim mailing list. Unverified senders who send email to the vim mailing list still receive challenges.

So far, in terms of spam inconvenience, false positives and so on, I can say that using bluebottle's free email account has yielded better results than using gmail+spamcop or gmail alone. But I feel that bluebottle, or any other C-R system, is a short-term solution to spam and does not scale well to a large number of users. Spamcop's approach, though interesting, needs some modifications to be effective. For now, I am adopting both approaches: use the bluebottle address for mailing lists, and report spam from gmail to spamcop as and when time permits. Only time will tell which one will succeed!

Comments, criticisms, typos, suggestions etc., can be sent to kamaraju at gmail dot com.

Last updated : Apr 10, 2008.

Wednesday, June 14, 2006

installing packages from marillat's repository

Marillat's repository contains useful debian packages such as mplayer and acrobat reader. To install a package from this repository, for example libdvdcss2, follow these instructions.

1) Go to the bottom of the page at http://debian.video.free.fr/

2) Copy the appropriate repository address. For example if you are running Debian Sid (unstable) on an i386 machine then you need to copy this line

deb http://www.debian-multimedia.org sid main

3) Paste this line into /etc/apt/sources.list . Note that you need root permissions to edit the sources.list file.

4) Update the package lists by running

wajig update

5) Now install the necessary package(s) by running

wajig install libdvdcss2

This method is much easier than downloading the package and installing it by hand using 'dpkg -i'.
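If you prefer not to edit the file by hand, step 3 above can be scripted. The following is a minimal sketch, not part of the original instructions: the function name is made up, it appends the repository line only if it is not already present, and it has to be run as root to touch /etc/apt/sources.list (try it on a copy of the file first).

```python
from pathlib import Path

# The repository line from step 2 (Debian Sid on i386).
REPO_LINE = "deb http://www.debian-multimedia.org sid main"


def add_repo_line(sources_path, line=REPO_LINE):
    """Append `line` to the file unless it is already there.

    Returns True if the line was added, False if it was already present.
    """
    path = Path(sources_path)
    text = path.read_text() if path.exists() else ""
    if any(entry.strip() == line for entry in text.splitlines()):
        return False
    # Start on a fresh line in case the file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{separator}{line}\n")
    return True


if __name__ == "__main__":
    # Needs root permissions, just like editing the file by hand.
    added = add_repo_line("/etc/apt/sources.list")
    print("added" if added else "already present")
```

After this, steps 4 and 5 (wajig update, wajig install) are run as before.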

Comments/corrections/criticisms/typos etc., can be directed to kamaraju at gmail dot com
