EMAIL FILTER VALIDATION SUITE
Most email viruses employ similar propagation techniques and
are, consequently, very similar in external appearance to one another. That
being the case, it is possible to design an email virus filter that removes
viruses based on outward appearance. This is highly desirable, since it
will ensure that, even when a new virus comes along, your filter will be
able to screen for it.
On the other hand, the penalty for failing to detect a virus
in an email message is high. How do you know if you've covered all the
bases by detecting all of the important propagation techniques? This page
will allow you to download an email filter validation suite that consists of
generic email messages which can be passed through your filter to test all
of the popular methods of virus propagation. If your filter detects each
message and handles it correctly, it is probably ready for prime time.
Although not nearly as malicious as virus writers, spammers
are sometimes equally sneaky. Since many installations have systems in
place to detect and eliminate spam (such as BSM Development's
MailCorral),
this puts a crimp in the spammer's style. In order for spam to work, it
must be delivered to its target. Figuring out new ways to slip spam by
email filters is one of the things that spammers do periodically. This
validation suite also presents tests for some of the latest spam
techniques.
Messages Verified:
Here is a list of the messages included in the test suite and
the filter criteria they are intended to test. The name of each test is
given as well.
The messages beginning with "baddoc_" generally test the
limits of mail filters and/or their ability to handle misbehaved or
badly-formed messages.
- To begin with, a simple test message that is plain text. It doesn't
contain any MIME entities, although it is tagged with a "Content-Type" of
"text". Your filter should process this message as a plain text message
with no alterations (baddoc_text_plain).
- A message with a number of small attachments, all of them supposedly
innocuous. The filter should ignore them and pass them untouched
(baddoc_attachment_ok).
- A second message with several small attachments that could be
harmful and are at least suspect. The filter should give a warning about
each of them but otherwise pass them untouched
(baddoc_attachment_warn).
- A third message with a number of small attachments that are assumed
to be harmful. The filter should do something to render them harmless
(e.g. rename them so that they have extensions which are not
automatically opened) and give a severe warning about each of them
(baddoc_attachment_reject).
- A message with a single, large attached document of dubious nature.
Tests the ability of the filter to handle large attachments properly
(since milter passes messages to the filter in chunks) and to detect possibly harmful
attachments (baddoc_attachment_word).
- A message may contain a single attachment with no message text at
all. This happens when someone mails only the attachment with no
explanatory text. A virus filter must be able to handle this test case by
at least examining the attachment and rejecting malicious ones, whether it
actually generates a warning message or not. Note that, in our test case,
we also played a few games with the MIME description of the attachment (as
a virus might) to give your filter an extra test for thoroughness
(baddoc_attachment_only).
- Alternately, a message may consist of a single attachment that
contains message text. Some mailers treat such attachments as simple
messages, despite their being marked as attachments. For example,
"text/html" and "text/plain" entities that are given attachment names in
the "Content-Type" header of a message are often treated this way. A virus
filter should examine these kinds of attachments and alter or reject the
malicious ones (baddoc_attachment_only_html).
- The attachments in this message contain Apple resource forks which are
sent, along with the attachment they describe, in pairs. Your filter
should deal with them as a single entity and not produce double warnings,
etc., about the resource forks themselves. Only the actual attachments
should be processed (baddoc_apple_double).
- All of the attachments in this message have no file type. If your
filter is based on file extensions only, it should warn you about them
unless it can examine the MIME type as well as the file type. If that
is the case, the harmless MIME types should be ignored and the others
should generate a warning (baddoc_attachment_untyped).
- Some mailers may accidentally send MIME-encoded messages that have no
message body. A sample of one such message is included to test the
robustness of your filter (baddoc_body_empty).
- Mail messages that are forwarded from one user to the next often
include multiple levels of nested mail delivery information, mixed HTML
and plain text components. A test message that demonstrates such nesting
of delivery information and mixed content is provided to stress test a
filter's ability to handle multiple levels and apparently confusing
content (baddoc_forwards).
- Many email packages immediately interpret HTML in an effort to
properly render formatted messages. Good news for viruses, because this
indiscriminate rendering of HTML makes for a fine propagation tool. HTML
can appear as the only component of a MIME-encoded message, as in the
sample supplied, in which case it should be suitably laundered to remove
all harmful tags. Note that this example contains some tags broken in
typical ways used by virus writers to exercise your filter's HTML tag
parser (baddoc_html_straight).
- Another place where HTML can appear is in a multipart, MIME-encoded
message. A second message represents this case, which should be suitably
laundered to remove all harmful tags (baddoc_html_mime).
- Unfortunately, some email readers are prepared to interpret embedded
HTML, even when it occurs in a plain text message (i.e. not MIME-encoded).
This could have disastrous effects if your filter does not look for HTML
tags in plain text messages. Another message presents a sample of this
kind of embedded HTML. Although it is less likely to be harmful, it
should still be processed (baddoc_html_embedded).
- Not all HTML in mail messages is particularly well-behaved. Much of
it is malformed. A test case is presented, containing mangled HTML that
should, nonetheless, be processed and rendered harmless
(baddoc_html_malformed).
- On the other hand, some HTML is harmless, although it appears not to
be because tags can be included inside comments. Various mailers such as
Word under Outlook use the technique of embedding formatting information
into comments in HTML messages. An email filter should be able to handle
this kind of embedded formatting so a test case is supplied
(baddoc_html_comments).
- HTML is often encoded to obscure its true nature from simple message
filters. This message contains poorly-behaved HTML (similar to
baddoc_html_malformed) that is also encoded. It should be processed and
rendered harmless, and then re-encoded, thereby testing a filter's
ability to modify encoded messages (baddoc_html_encoded_malformed).
- Although uuencoding has been largely superseded by Base64 encoding,
messages using this encoding are occasionally received. This message has
an HTML-only, MIME-encoded body that contains HTML tags that should be
removed by a filter. It tests the ability of the filter to decode
uuencoded HTML, launder it, insert a message into it and then re-encode
it (baddoc_html_uuencoded).
- Outlook is a ubiquitous mailer that is quite capable of producing
some real hum-dingers when it comes to HTML-encoded messages. If Word is
set as Outlook's text editor, much of each message generated by Outlook in
HTML mode consists of Word and HTML formatting tags. One such message is included
as a typical example of what is possible. A good virus filter should
reject little or none of this message (baddoc_html_outlook_typical).
- Finally, it is possible for HTML to be sent as an attachment to a
message. Such attached HTML cannot be altered, since we can only surmise
its purpose, but a warning should be issued about it. A test for this
case is included (baddoc_html_attachment).
- An email message which includes all of the acceptable MIME types as
inline entities (i.e. not attachments). Since some email programs might
interpret these types directly and not treat them as an attachment, they
must be filtered separately from the attachments. The filter should allow
all of these types through (baddoc_inline_mime_bad).
- An email message which includes a sampling of known unacceptable MIME
types as inline entities (i.e. not attachments). Since some email
programs might interpret these types directly and not treat them as an
attachment, they must be filtered separately from the attachments. There
are many MIME types and it isn't possible to know what they all are, in
advance. This being the case, the filter should allow through only those
MIME types that it specifically knows about. The ones in this
message should be rejected as unknown/bad (baddoc_inline_mime_ok).
- A message may contain a single inline MIME (i.e. not an attachment)
with no message text at all. This happens when someone mails only the
MIME entity with no explanatory text. Since some email programs might
interpret this type directly and not treat it as an attachment, it must be
filtered separately from attachments. A virus filter must be able to
handle this test case by at least examining the MIME type and rejecting
malicious ones, whether it actually generates a warning message or not
(baddoc_inline_mime_only).
- Some mail handlers may generate messages with MIME components that
don't conform to the standard. Regardless of this, many mail readers
will still handle them. In this case, a multipart/alternative section
appears to be misplaced because it is used solely for the purpose of
enclosing a virus scanner's certification. By the way, this message is
known to crash one version of the Perl module Email::StripMIME which is
frequently used in email filters, hence the reason for this test case
(baddoc_multipart_misplaced).
- As a test of whether your email filter can insert warnings into badly
formed messages, this example claims to be multipart/alternative but the
text portion of the alternative message is missing. Your filter should
still be able to insert a message into the HTML portion
(baddoc_multipart_missing).
- In order to detect spam/viruses in a message, an email filter may
need to decode encoded messages. This message contains an HTML component
that has been encoded to obscure it. The encoding has been done in
violation of the Base64 rules, thereby producing an excessively long
encoded value, probably to cause filters to blow up and miss the spam
inside the message. The resultant message, while technically invalid, is
handled by many mail readers and so must also be handled by your filter
(baddoc_html_overrun_encoded).
- This message contains an HTML component that has a single line that
is very long in order to test if an email filter can handle very long lines
without blowing up (baddoc_html_overrun_straight).
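Several of the attachment tests above come down to sorting each attachment into one of three outcomes: pass it, warn about it, or render it harmless. As a rough illustration only, here is a minimal Python sketch of that decision. The extension and MIME type tables, and the names classify_attachment and defang_filename, are assumptions made up for this example; they are not part of the validation suite, and a real filter's policy will certainly differ.

```python
import os

# Hypothetical policy tables, chosen only for illustration.
REJECT_EXTENSIONS = {".exe", ".scr", ".pif", ".vbs", ".bat"}
WARN_EXTENSIONS = {".doc", ".xls", ".zip"}
SAFE_MIME_TYPES = {"text/plain", "image/jpeg", "image/png", "image/gif"}


def classify_attachment(filename, mime_type):
    """Return "ok", "warn" or "reject" for a single attachment."""
    ext = os.path.splitext(filename or "")[1].lower()
    if ext in REJECT_EXTENSIONS:
        return "reject"
    if ext in WARN_EXTENSIONS:
        return "warn"
    # No extension, or one we know nothing about: fall back on the
    # declared MIME type and only pass the types we recognize.
    if (mime_type or "").lower() in SAFE_MIME_TYPES:
        return "ok"
    return "warn"


def defang_filename(filename, suffix=".rejected"):
    """Rename an attachment so its extension is no longer auto-opened."""
    return filename + suffix
```

Note that the extension is checked first, so a file named "photo.scr" is rejected even if it claims to be "image/jpeg". An untyped attachment is passed only when its MIME type is on the safe list, which is the behavior that baddoc_attachment_untyped is meant to exercise.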
Mail archivers are often implemented as a mail handling robot
to which regular email messages are simply sent by an MTA. The MTA simply
duplicates every mail message it sees and forwards the copy, through regular
channels, to the archiver. Should the archiver crash for any reason, the
messages sent to it may be bounced by the MTA. This often leads to the
bounces being sent back to the original sender, thereby causing much
confusion, since they never sent any message to the archiver.
To support mail archiving, your filter may wish to detect
bounce messages and trash those resulting from delivery failures to your
mail archive robot's address. The messages beginning with "bounce_" are
meant to test this ability in particular, as well as a filter's ability to
handle bounce messages in general.
- A bounce message from a CommuniGate MTA
(bounce_communigate_failure).
- A bounce message from Google's Gmail (bounce_gmail).
- A delivery failure bounce message from Postfix
(bounce_postfix_failure).
- A delayed delivery warning message from Postfix (bounce_postfix_warning).
- A delivery failure bounce message from Sendmail
(bounce_sendmail_failure).
- A delivery failure bounce message from Sendmail, caused by the
message being relayed through too many hosts (bounce_sendmail_hops).
- A delayed delivery warning message from Sendmail
(bounce_sendmail_warning).
- A piece of spam that is disguised as a bounce message. This is a
pretty common technique used by spammers. The spam content is usually
in the bogus delivery report (bounce_spam).
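One way to recognize the archiver bounces described above is to look for the standard delivery status notification layout (a "multipart/report" message containing a "message/delivery-status" part) and check whether the failed recipient is your archive robot. The following is a minimal sketch using Python's standard email package. The archiver address and the function names are hypothetical, and bounces that do not follow this layout will need additional heuristics.

```python
from email import message_from_string

# Hypothetical address of the mail archive robot.
ARCHIVER_ADDRESS = "archive@example.com"


def is_bounce(msg):
    """True if the message is laid out as a delivery status report."""
    if msg.get_content_type() != "multipart/report":
        return False
    report_type = msg.get_param("report-type") or ""
    return report_type.lower() == "delivery-status"


def failed_recipients(msg):
    """Collect the addresses named in Final-Recipient fields."""
    recipients = []
    # The parser exposes each header block of a message/delivery-status
    # part as a nested message, so walk() visits them as well.
    for part in msg.walk():
        value = part.get("Final-Recipient")
        if value:
            # The field looks like "rfc822; user@example.com".
            recipients.append(value.split(";", 1)[-1].strip().lower())
    return recipients


def should_discard(msg, archiver=ARCHIVER_ADDRESS):
    """True for a bounce caused by a delivery failure to the archiver."""
    return is_bounce(msg) and archiver.lower() in failed_recipients(msg)
```

A check like this deliberately looks at the delivery report rather than the sender name, since a sender of postmaster or MAILER-DAEMON is easy to forge, as the bounce_spam test demonstrates.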
The messages beginning with "spam_" are designed to test the
spam classifier in your mail filter.
- A simple spam delivery check message is supplied with an obvious spam
sender name. When you are developing your filter, you can simply check
for this name to identify that the spam is delivered (or not) properly
(spam_delivery_check).
- Other than analyzing a message's headers, the most important portions
of a message to examine for spam are its plain text and HTML components.
Sample messages that include pretty typical spam are supplied in both
plain text and HTML forms (spam_text, spam_html).
- Spam may be sent in HTML format but as the text of a simple message. A
mail scanner should be able to handle this by looking for HTML in the text
part of the message. This test message (spam_text_html_paypal)
represents a typical phishing message from a well-known auction site that
should certainly be detected as spam. If your virus scanner is on the ball,
it may also detect the fake customer service (renewal) URL as a virus.
- As with virus writers, the spammers are constantly trying to
"improve" their delivery methods so that spam filters will not detect and
reject their messages. Encoding the message is one way of doing that.
Spam test cases in both encoded plain text and encoded HTML form are
provided (spam_text_encoded, spam_html_encoded).
- One powerful technique for spam filtering is the concept of the black
and whitelist, which involves comparing the sender's address against a list
of acceptable addresses. Spammers may try to bypass a scanner's list checks
by obscuring the address using quotes, comments, etc. This test case
(spam_text_addr_obscured) contains a plain text spam message that will
probably pass through most spam scanners undetected. One would hope that the
black/whitelist catches the message but its "From" address has been obscured
to test the limits of your scanner's address parser.
- Some spammers may try to use long, encoded HTML to cause a mail
filter to give up or crash, thereby causing their message to be bypassed
and delivered to the recipient. This message simulates such a case,
where the HTML is spanned over many lines but, when decoded, turns into a
single, excessively long line (spam_html_long).
- The spammers are a bunch of guys who are really grasping at straws,
sometimes. A sample message is supplied that is multipart/alternative
with two components (text and HTML). Both are encoded Base64 to hide the
fact that they are spam. Anybody who encodes a text/plain segment is
obviously up to no good so it must be an act of desperation to make such
an obvious move (spam_mixed_encoded).
- With the advent of Bayesian classifiers that look for particular
words in messages to see if they contain spam, the spammers have taken to
obfuscating words by sending the message as HTML and inserting comments
into the middle of the words. HTML renders the text correctly but a
scanner may miss the key words. A good filter will remove such comments
as demonstrated by this test message (spam_html_masked_comment).
- Another technique used by spammers to mask words from Bayesian
classifiers is to represent the characters in the words as HTML escape
sequences. This message tests a filter's ability to convert HTML escapes
into regular characters so that the classifier can see the key words
(spam_html_masked_escape).
- Since some email readers will interpret HTML in the body of the
message (even though it's supposed to be text), this spam message uses
HTML in the body and obscures the content from the Bayesian classifier
by inserting comments into key words. This tests a filter's ability to
remove obscuring information from HTML in the message body
(spam_html_obscured).
- HTML spam can appear as the only component of a MIME-encoded message,
as in the sample supplied. The spam filter should be able to handle this
case (spam_html_straight).
- In a couple of variations on a theme, text and HTML spam can appear
as the only component of a MIME-encoded message but also be encoded Base64
so that the spam filter won't see that it is spam. The spam filter should
be able to detect the spam in both cases supplied
(spam_text_straight_encoded, spam_html_straight_encoded).
- This message contains some rather obvious spam of the penile
enhancement variety that is found in the HTML portion of a
multipart/alternative message. It is the only component, incidentally,
bypassing the rule about multipart/alternative messages containing at
least a text portion (spam_html_viagra).
- Whether through omission or commission, quite a bit of spam arrives
with broken MIME components. The most common error is the omission of the
last MIME segment terminator, or of the terminators for all nested MIME
levels. Most mail readers blithely interpret this faux pas so a good spam
filter must accept it too (spam_mime_malformed).
- Another common error is the omission of any valid MIME segments in a
message marked as MIME. Many mail readers will simply interpret this kind
of broken message as text, showing the recipient the spam. This being the
case, a good spam filter must accept such broken messages as text too. In
our example, the message appears to be valid but, in reality, the MIME
separators are broken and the message has no MIME entities at all
(spam_mime_empty).
- Detection of spam is not always easy. The test suite includes a pair
of test cases that stretch the limits of traditional spam detectors. The
content marks them as spam but you never know (spam_text_notfilt1,
spam_html_notfilt2).
- If your mail filter employs a statistical check to find spam quickly
by counting the number of "bad" HTML tags in a message, this test should
exercise it with a multi-part message that contains many suspicious tags
in the HTML component (spam_stats_mixed).
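The masking tricks described above (comments inserted into the middle of words, characters written as HTML escape sequences) can be undone before the text is handed to a classifier. Here is a minimal sketch using the HTMLParser class from Python's standard library. The name visible_text is made up for this example, and the sketch ignores details a real filter would care about, such as skipping the contents of script and style elements.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect character data, dropping tags and comments."""

    def __init__(self):
        # convert_charrefs=True turns escapes such as &#118; into
        # ordinary characters before handle_data() sees them.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return roughly the text a mail reader would display."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)
```

Because comments and tags are simply dropped and the surrounding text is joined back together, a word that was split by a comment comes out whole again, which is what tests such as spam_html_masked_comment and spam_html_masked_escape are looking for.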
The messages beginning with "virus_" are meant to test the
virus detection component of your mail filter.
- A very common email virus is one that includes a virus component as
an attachment and an automatic launcher, in the form of an HTML message,
that opens the virus. When your typical email package encounters it, it
interprets the HTML and launches the virus before the user even has a
chance to do anything about it. A sample of this kind of virus, using a
harmless virus component (in case it slips through), is provided
(virus_screensaver).
- Lately, we've been seeing a virus that looks like a mail system return
receipt for a message that was not delivered. I figure this is probably
some clever virus writer's attempt to sneak a virus by if the filters have
some special case checking in place for delivery-status messages and/or a
sender of postmaster. Either that or they are hoping to lull the user
into a false sense of security and convince them to open the message
because it is from a legit source. A sample virus that employs this
technique plus the usual HTML launcher is given
(virus_postmaster).
- This message is an example of a virus launcher in the HTML portion
of the message that is obscured by tricks with comments, etc. Perhaps not
all mail readers will be promiscuous enough to accept some of these tricks
but, if they do, it spells big trouble because the virus is launched when
the user opens the message (i.e. before they've had a chance to read it)
(virus_html_obscured).
- Some mail readers may accept multiple HTML MIME components and
assemble them into a single message when displaying them. If this is the
case, a possible virus exploit would be to split a virus launcher across
two HTML components. A sample of this kind of virus launcher is given
(virus_html_spanned).
- One method of delivering a virus launcher, that is applicable to
Outlook only, is to include HTML tags in the subject line of a message,
separated by a carriage return. Apparently, Outlook sees the carriage
return and starts interpreting the HTML in the subject as part of the
body. This works to the virus' advantage, since many virus filters don't
examine the subject for malicious HTML tags. Because it is perfectly
feasible to include a launcher in the subject that launches a viral
attachment, an example of this kind of virus is given
(virus_outlook_special).
- Since spam is becoming ubiquitous, I guess some clever virus writer
figured to take advantage of this fact. I recently received a piece of
pornographic spam that featured an executable attachment that purported to
show pornographic photos. Click on the attachment and, "You've got
disease!", to paraphrase AOL. You've got to admit, it's a brilliant
scheme. Masquerade as an obnoxious message to hide the fact that you're
really a malicious one. Also, legitimate spam filtering must look for
viruses as well so this tests that case too. You'll love this example
(virus_as_spam).
- To test whether your email filter or the virus scanner that it
invokes can find an actual virus, this test message includes the sample
virus distributed with Clam AV as several attachments. Your scanner
should find and delete all of the attachments to this message, or at the
very least render them harmless (virus_clam_attach).
- Another variation on the theme of the HTML virus launcher plus
attached viral payload is the encoded launcher. Although nearly all
"text/html" MIME type entities are sent as "7Bit" or some other unencoded
form, this is not a requirement. Encoding the launcher as "Base64"
so that the virus filter will not see it is a pretty typical trick. This
message supplies an example of this kind of encoded virus, using the HTML
version of the Clam AV sample virus. Your scanner should remove the link
and/or delete the message (virus_clam_encoded).
- To test whether your email filter can detect an actual virus, the
HTML version of the Clam AV sample virus is embedded in the text of this
message as a clickable link. Your scanner should remove the link and/or
delete the message (virus_clam_html).
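The encoded launcher cases above only work against a filter that scans the raw message. A filter that decodes each "text/html" entity first sees the same HTML the mail reader will see. The following is a minimal sketch of that step using Python's standard email package. The list of suspect tags is an assumption chosen for illustration, not a statement of which tags the suite's launchers actually use, and the function names are hypothetical.

```python
import re
from email import message_from_string

# Assumed list of tags worth flagging; a real filter will have its own.
SUSPECT_TAGS = re.compile(r"<\s*(script|iframe|object|embed)\b", re.IGNORECASE)


def decoded_html_parts(msg):
    """Yield the text of every text/html entity, transfer-decoded."""
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True undoes Base64 or quoted-printable encoding.
        raw = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            yield raw.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to a lossless 8-bit decode.
            yield raw.decode("latin-1")


def has_suspect_html(msg):
    """True if any decoded HTML entity contains a suspect tag."""
    return any(SUSPECT_TAGS.search(html) for html in decoded_html_parts(msg))
```

The same decoding step applies to the encoded spam tests, since a classifier can only judge text that it can actually read.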
How It Works:
We provide you with the suite of test messages and a shell
script that can be used to submit some or all of the messages to your mail
delivery program (e.g. the "mail" command).
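To give a feel for what such a submission step involves, here is a minimal Python sketch that feeds each message file to a delivery command on its standard input. This is not the shell script shipped with the suite, only an illustration of the same idea. It assumes the test messages sit as individual files in one directory, named after their tests; the function name and parameters are hypothetical.

```python
import subprocess
from pathlib import Path


def submit_messages(directory, command, prefix=""):
    """Pipe each matching message file to `command`.

    Returns a dict mapping file name to the command's exit status.
    """
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or not path.name.startswith(prefix):
            continue
        with path.open("rb") as handle:
            # The message is supplied on standard input, as a mail
            # delivery program would expect.
            completed = subprocess.run(command, stdin=handle, check=False)
        results[path.name] = completed.returncode
    return results
```

For example, submit_messages("suite", ["sendmail", "-i", "you@example.com"], prefix="spam_") would send only the spam tests, where the directory name and the recipient address are placeholders for your own.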
Once you have your filter working, send the messages from the
validation suite through your mail system and observe the results. Each
message includes text that describes the test it is meant to perform and the
results that should be expected. Anything that doesn't conform to the
expected results should be investigated.
You can also use the validation suite as a regression test
after filter code changes have been made.
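One simple way to run such a regression test is to record a digest of what your filter produces for each test message and compare later runs against that record. The sketch below assumes you have already captured the filtered output of each test as bytes, keyed by test name; the function names and the JSON baseline file are hypothetical choices for this example. Headers that change on every run (dates, message IDs and the like) would need to be stripped before comparing.

```python
import hashlib
import json
from pathlib import Path


def digest(data):
    """Return a stable fingerprint of one filtered message."""
    return hashlib.sha256(data).hexdigest()


def compare_to_baseline(outputs, baseline_path):
    """Return the sorted names of tests whose output has changed.

    `outputs` maps test name to filtered message bytes. On the first
    run the baseline file is created and nothing is reported.
    """
    current = {name: digest(data) for name, data in outputs.items()}
    path = Path(baseline_path)
    if not path.exists():
        path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    baseline = json.loads(path.read_text())
    names = current.keys() | baseline.keys()
    return sorted(n for n in names if current.get(n) != baseline.get(n))
```

Any name reported is either an intended change, in which case the baseline should be regenerated, or a regression to investigate.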
Specially crafted email: the validation suite uses
hand-made email messages that contain generic varieties of the virus and spam
delivery techniques most commonly used. The messages are designed to exercise
all paths through a typical email filter program.
Safe to use: denatured viruses that cannot hurt
your system, even if they get through, are used in the validation suite.
They are much safer than using real, live viruses (but, if you'd like to
live dangerously, go ahead -- you certainly can use live viruses to test
your filter, just don't complain when you find out that we told you so).