Antispam with DCC, qmail, and gnus

Intro

Spam traffic has gone up significantly in the last few months and I get a lot of it, even at home. I'm on dozens of mailing lists, so spammers can easily harvest my addresses. A meltdown in the traditional MAPS-like services and, IMHO, their general loss of effectiveness prompted me to look at newer methods for decreasing spam.

There are two techniques I've heard of which sounded much more promising than the rest: Tagged Message Delivery Agent (TMDA) and Distributed Checksum Clearinghouse (DCC). DCC looked like it would be easier to get going quickly, so that's what I tried.

DCC

For sendmail, DCC has a "milter" interface; for qmail and other uses, it has "dccproc", which reads a message on standard input and writes it to standard output with an added header line indicating the score DCC assigns to the message.

When a message is piped through dccproc, it computes a handful of checksums on various parts of the message (envelope from, header from, subject, message ID, body, and a "fuzzy" body). It sends these to a DCC server, which updates its counts for each of the reported checksums. (For privacy, DCC sends only the checksums to the servers, never any of your message content.) The server then returns these counts to the client, which formats a header line for inclusion in the message. For counts above a certain threshold, it returns "many".
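
To make the report-and-count idea concrete, here is a minimal toy sketch in Python. It is not DCC's protocol or its real checksum algorithms: the class ToyClearinghouse, the helper names, the use of SHA-256, the crude "fuzzy" normalisation and the threshold of 5 are all assumptions made up for illustration.

```python
import hashlib
import re
from collections import Counter

MANY_THRESHOLD = 5  # hypothetical cutoff; real DCC servers pick their own


def exact_sum(text: str) -> str:
    """Checksum of the text exactly as given."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fuzzy_sum(body: str) -> str:
    """Checksum after crude normalisation (NOT DCC's real fuzzy algorithm).

    Lower-cases the body and keeps only the letters a-z, so small
    per-recipient tweaks (digits, punctuation, spacing) hash the same.
    """
    normalised = re.sub(r"[^a-z]", "", body.lower())
    return exact_sum(normalised)


class ToyClearinghouse:
    """In-memory stand-in for a DCC server: it only ever sees checksums."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def report(self, sums: dict) -> dict:
        """Bump the count for each reported checksum and return the totals."""
        result = {}
        for name, digest in sums.items():
            self.counts[(name, digest)] += 1
            n = self.counts[(name, digest)]
            result[name] = "many" if n >= MANY_THRESHOLD else n
        return result


def check(server: ToyClearinghouse, sender: str, subject: str, body: str) -> dict:
    """Client side: checksum the parts locally, then report only the sums."""
    sums = {
        "From": exact_sum(sender),
        "Subject": exact_sum(subject),
        "Body": exact_sum(body),
        "Fuz1": fuzzy_sum(body),
    }
    return server.report(sums)
```

In this toy model, five messages from different senders with the same subject and bodies that differ only by a digit end up with From and Body counts of 1 but Subject and Fuz1 counts of "many" on the fifth report, which is roughly why the fuzzy body count is the interesting one.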

Some of these counts are not good indicators of spam. Lots of mail goes to "postmaster", and lots of mail comes from popular mailing lists, for example. So I've settled on the fuzzy body match, since many spammers now slightly modify their messages to evade detection.

I grabbed the entire tarball instead of just the clients, but for what's discussed here, the client distro would have been sufficient. Install and configure DCC per the instructions. Configuration is in /var/dcc by default. I left the dcc_conf file alone, copied the master "whitecommon" whitelist from DCC's web site, created and configured the memory-mapped file with "cdcc" ("new map"), and added the two public servers mentioned on DCC's web site ("add <servername>").

I then created my own client whitelist file in ~/.dcc/whiteclnt, based on the samples from the web, putting my network's addresses in as "ok" so local mail doesn't get reported to the DCC servers. I tested by running a spam mail through it manually like:

cat /tmp/spam | dccproc -w ~/.dcc/whiteclnt
and looked for the new X-DCC- header with counts. Very cool.

qmail

At home and work, I have stopped using sendmail, preferring qmail for speed, robustness, and security (please, no religious wars :-). I've been quite happy with it, but its setup, configuration, and modus operandi are totally different from what folks accustomed to sendmail would recognize. I wanted to quickly integrate DCC with qmail to test it out; if it seemed effective, I'd do a more serious implementation later.

qmail is unusual in that users can manage delivery to their own "extension addresses". So I can give out different addresses -- perhaps one for each email list -- and qmail will file them into different folders for me. For example, I might use chris@shenton.org, chris-gnus@shenton.org, and chris-freebsd@shenton.org. Local delivery to each of these extension addresses is controlled by a set of "dot-qmail" files, which can cause mail for the address to be delivered to a file, forwarded to another address, or piped to a program; each extension address has its own .qmail-<extensionname> file. I used a combination of these to quickly get DCC to add its score headers to my incoming mail.
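
As a rough illustration of how an extension address maps to a dot-qmail file, here is a small Python sketch. The function name dot_qmail_candidates is made up, and the details beyond what is described above (the .qmail-default fallbacks, lower-casing, and dots becoming colons) are my reading of the dot-qmail man page, so treat them as assumptions rather than a faithful copy of qmail-local.

```python
def dot_qmail_candidates(local_part: str, user: str) -> list:
    """Return the dot-qmail file names tried, in order, for user[-ext].

    Hypothetical helper: it models the lookup order only and never
    touches the filesystem.
    """
    if local_part == user:
        return [".qmail"]
    prefix = user + "-"
    if not local_part.startswith(prefix):
        raise ValueError("address does not belong to this user")
    # Assumed from the man page: ext is lower-cased and dots become colons.
    ext = local_part[len(prefix):].lower().replace(".", ":")
    candidates = [".qmail-" + ext]
    parts = ext.split("-")
    # For ext "a-b" the fallbacks are .qmail-a-default, then .qmail-default.
    for i in range(len(parts) - 1, -1, -1):
        head = "-".join(parts[:i])
        candidates.append(".qmail-" + (head + "-" if head else "") + "default")
    return candidates


if __name__ == "__main__":
    for address in ("chris", "chris-gnus", "chris-freebsd"):
        print(address, "->", dot_qmail_candidates(address, "chris"))
```

The point is only that each address variant selects its own file, which is what makes it possible to treat mail for one extension differently from mail for the bare address.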

People send mail to me at chris@shenton.org, so I use the no-extension "~/.qmail" file to run the message through DCC, with this content:

|preline /usr/local/bin/dccproc -w /home/chris/.dcc/whiteclnt | /var/qmail/bin/qmail-inject chris-dcc

It pipes the message through dccproc and tells DCC to use the specified "whitelist", which lets various local and other friendly mail through but can also negatively score mail which is known bad; for more info see the DCC site. DCC adds its score header to the message headers, then that output is piped to the "qmail-inject" program, which queues it for delivery to me. Now here is the hack^H^H^H^H trick, one you couldn't do so trivially with sendmail. Instead of delivering it to my same address -- and causing a mail loop! -- I deliver it to an extension address "chris-dcc" so it can be handled differently.

Since all I want to do at this point is to add the header -- not make any delivery/rejection decisions -- I use the file ".qmail-dcc" to instruct qmail to deliver mail for "chris-dcc" to a directory in qmail's "Maildir" format (I could deliver it to a standard mbox format file, but prefer Maildir for its robustness). The contents of "~/.qmail-dcc" are simply:

./Maildir/
So when my mail user agent picks up mail from my Maildir, it will get the message with the DCC score header in it, e.g.:
From: b23097@go.ru
To: pzqtb@forum.dk
Subject: Save thousands with this software product
Date: Mon, 14 Jan 2002 15:36:33 -0500
X-DCC-rhyolite-Metrics: thanatos.shenton.org 101; env_From=1 From=1
	Subject=many Message-ID=1 Received=1 Body=many Fuz1=many
and my MUA, Gnus, can do with it whatever it wants to.
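
For anyone whose mail reader is less accommodating than Gnus, the header is easy to pick apart by hand. The following Python sketch unfolds the two physical lines and turns the counts into a dictionary; the function name parse_dcc_header is hypothetical, and the layout it assumes (a server name and ID, a semicolon, then name=count pairs) is taken only from the sample header above.

```python
import re


def parse_dcc_header(raw: str) -> dict:
    """Unfold a raw X-DCC header and return {checksum_name: count}."""
    # Unfold: a line break followed by whitespace continues the header.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw.strip())
    # Drop the "Header-Name:" part, then the "server id;" prefix.
    _, _, value = unfolded.partition(":")
    _, _, counts_part = value.partition(";")
    counts = {}
    for name, count in re.findall(r"([\w-]+)=(\w+)", counts_part):
        # Numeric counts become ints; words such as "many" stay strings.
        counts[name] = int(count) if count.isdigit() else count
    return counts


if __name__ == "__main__":
    sample = (
        "X-DCC-rhyolite-Metrics: thanatos.shenton.org 101; env_From=1 From=1\n"
        "\tSubject=many Message-ID=1 Received=1 Body=many Fuz1=many"
    )
    print(parse_dcc_header(sample))
```

A filter could then test something like counts.get("Fuz1") == "many" instead of matching the raw text.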

gnus

I use Gnus under GNU Emacs. It's a USENET news reader, it's a mail reader (supporting multiple mail "groups"), it's a dessert topping, it's a floor wax. It's absurdly powerful and configurable. For now, I don't want to reject mail with bad DCC scores. Instead, I want to show the DCC header, just for my own comfort. I add the "X-DCC-" header to the gnus-visible headers with a config in "~/.gnus.el" like:

(setq gnus-visible-headers "^From:\\|^Newsgroups:\\|^Subject:\\|^Date:\\|^Followup-To:\\|^Reply-To:\\|^Organization:\\|^Summary:\\|^Keywords:\\|^To:\\|^[BGF]?Cc:\\|^Posted-To:\\|^Mail-Copies-To:\\|^Apparently-To:\\|^Gnus-Warning:\\|^Resent-From:\\|^X-Sent:\\|^X-DCC-")

That seems to work well enough, so now I want to have gnus file it in a folder when it sees this header, with a fuzzy match on the body having a high score. To make it simple, I'll just look for a score of "many". In "~/.gnus.el" I do this with:

(setq nnmail-split-methods
      '(
	("in.dcc-body-many"	"^X-DCC-.*Fuz1=many")
	; [other filing rules elided]
))

Note that Gnus conveniently gives me the unfolded X-DCC header, so I don't have to try to grab the two physical lines. This does the trick. All mail with a DCC Fuz1=many goes into a separate folder so I can quickly scan it and delete or catch up. Here's a snippet of the Gnus *Group* buffer showing some of my groups:
           4: nnml:in.chris
           1: nnml:in.root
*          0: nnml:in.misc
           4: nnml:in.dcc-body-many
and then the *Summary* of the new group for DCC-marked spam:
O  [ 208: Marco237@ladymail.cz] H e l p    P a r e n t s
R  [  43: yugo@ancestry.com   ] BOOST YOUR CELL PHONE RECEPTION-ONLY 4.95!!!
!  [  83: b23097@go.ru        ] Save thousands with this software product
O  [  19: wbwfwzrqxnkmrfsg@hot] Copy and make your own DVD movies
Seems to be working fine. Not all spam gets detected; if others don't send score updates to the DCC servers, there's no way for it to keep track. This is a resource that gets better with more users.

Update: simpler DCC/qmail integration

The re-injection has always bothered me, as have the delivery failures when something's wrong with DCC (e.g., I forgot to install it :-). The following mechanism is more fail-safe and seems cleaner, with less reliance on the MUA.

Chris Hardie's page points the way: use qmail's condredirect to redirect mail to a separate address based on dccproc's decision. For this, we don't need dccproc's headers, just its exit code, based on some threshold of our choice. condredirect runs a command and, depending on its exit status, either forwards the message to a different address or lets delivery continue. condredirect and dccproc have different ideas about which exit codes to use, so you have to invert the exit status of dccproc. We don't need dccproc's output, so send it to /dev/null to prevent it cluttering qmail's delivery logs. I use it like:

| /var/qmail/bin/condredirect chris-dccspam /bin/sh -c '! /usr/local/bin/dccproc -c CMN,10 -o /dev/null -w /home/chris/.dcc/whiteclnt'

./Maildir/

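Above, if dccproc notices common counts exceeding 10, it exits with EX_NOUSER (67), which the shell's "!" inverts to exit 0, so condredirect redirects the mail to chris-dccspam and it never gets to ./Maildir/. If dccproc runs and exits 0, the inverted status is non-zero, condredirect does nothing, and the mail goes into the usual Maildir. One caveat: "!" turns any non-zero status into 0, including the shell's 127 when dccproc cannot be found, so with dccproc missing the mail would land in chris-dccspam rather than ./Maildir/; either way it is delivered rather than bounced.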
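
The exit-status juggling is easier to follow as a small table in code. This Python sketch only models the logic and runs no mail software. The function names are made up. The condredirect rules (0 redirects, 111 defers, anything else falls through to the rest of the .qmail file) are my reading of its man page. EX_NOUSER is 67 in sysexits.h, and 127 is the status sh reports for a command it cannot find.

```python
EX_NOUSER = 67         # sysexits.h; dccproc's "over threshold" status per the text
SHELL_NOT_FOUND = 127  # status sh reports when the command does not exist


def shell_not(status: int) -> int:
    """Exit status of `! cmd` in sh: 0 if cmd failed, 1 if it succeeded."""
    return 0 if status != 0 else 1


def condredirect_action(program_status: int) -> str:
    """Model of condredirect's reaction to its program's exit status."""
    if program_status == 0:
        return "redirect"   # forward the message to the new address
    if program_status == 111:
        return "defer"      # temporary failure, delivery is retried later
    return "continue"       # carry on with the rest of the .qmail file


def delivery(dccproc_status: int) -> str:
    """Combine the two: what happens for a given dccproc exit status."""
    return condredirect_action(shell_not(dccproc_status))


if __name__ == "__main__":
    for label, status in (("clean", 0), ("spam", EX_NOUSER),
                          ("dccproc missing", SHELL_NOT_FOUND)):
        print(label, "->", delivery(status))
```

Note that under this model any non-zero status from dccproc, not just EX_NOUSER, is inverted to 0 and therefore redirects.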

You will have to create a ~/.qmail-dccspam to tell qmail what to do about such spammy mails. Mine looks like:

./Maildir-dccspam/

so it delivers into a separate Maildir. I can scan this looking for false positives to add to my ~/.dcc/whiteclnt list or have my MUA slurp it into a separate folder or anything else. Gnus can be configured to auto-expire mail after a certain time specific to each folder so this might be a nice lazy option.
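
If you'd rather eyeball the spam Maildir without a mail reader, something like the following Python sketch lists sender and subject for each message so false positives stand out. It uses the standard library's mailbox module; the function name summarize_maildir is hypothetical and the path in the example is just the one used in this article.

```python
import mailbox


def summarize_maildir(path: str) -> list:
    """Return a sorted list of (sender, subject) for the Maildir at path."""
    # create=False: fail loudly instead of silently making an empty Maildir.
    box = mailbox.Maildir(path, create=False)
    rows = []
    for message in box:
        rows.append((message.get("From", ""), message.get("Subject", "")))
    return sorted(rows)


if __name__ == "__main__":
    for sender, subject in summarize_maildir("/home/chris/Maildir-dccspam/"):
        print(f"{sender:30.30}  {subject}")
```

Anything that shows up here and shouldn't is a candidate for an entry in the whitelist file.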

I can tell Gnus to read from this maildir by putting something like the following in my .gnus.el:

(setq mail-sources '(; other sources go here
                     (maildir) ; default ~/Maildir/
	             (maildir :path "/home/chris/Maildir-dccspam/")))
Then I can use a method suggested by Andre Srinivasan so Gnus puts it in a separate folder. Here's part of my Gnus splitting definition:
(setq nnmail-split-methods 'nnmail-split-fancy)	;fancy instead of manual
(setq nnmail-split-fancy
      '(|
	("X-Gnus-Mail-Source"	"maildir:/home/chris/Maildir-dccspam/new"	"in.dccspam")
	; [other filing rules elided]
	))

Later

Later I may want Gnus to throw away mail scored badly by DCC. It's probably better to do it before it gets to Gnus, instead of the quick-and-dirty hack I use here.

It's probably even better to have the .qmail files pipe the message to a local processor like procmail or maildrop, and let it deliver to the Maildir instead of re-injecting. It's probably best if qmail does this during its queue processing; there are patches for qmail, like Qmail Scanner, that allow it to do spam and virus filtering; there are many other leads on www.qmail.org under the link "Microsoft Virus Prevention and Spam Prevention".

DCC can mark messages with a very bad score, "many". If you were to set up an alias for an address you create as a spam trap, then all mail to this address -- known spam -- could be piped through

dccproc -t many
This would send the checksums for this spam to the DCC servers so others would be able to identify this mail as spam.

DCC is more effective the more people use it. I should install it at a couple ISPs I support.

I should also run a local server instead of beating on the public ones.


Chris Shenton
$Id: dcc-qmail-gnus.html,v 1.6 2003/07/11 13:19:30 chris Exp $