JunkMatcher Howto: Property Tests (new stuff is in red)
In the Test Inspector window there is a Description section that describes what each property test does.
It's right in the list of tests shown in the Analyzer window. You can get to it by using the search box in that window (just type "SpamBayes"), or you can choose the menu item "Inspect SpamBayes" from the View menu (keyboard shortcut opt-cmd-b). After opening its Test Inspector window, here is what you'll see - say hello to the SpamBayes property test:
SpamBayes is a great Bayesian filter developed by another open-source project - the author merely took their intellectual fruit and integrated it into JunkMatcher. Please do send your appreciation/donation etc. their way! You can also read more about Bayesian filtering on Wikipedia.
The Test Inspector window, in addition to the usual statistics, also shows you how many ham and spam messages SpamBayes has been trained on. It is important to keep these two numbers as close as possible: if you keep feeding SpamBayes junk, soon it will think the world is filled with nothing but junk!
You can also set two cutoff values for SpamBayes here: the ham cutoff value and the spam cutoff value. SpamBayes detects junk mail by first computing a spam probability for each message: a number between 0.0 and 1.0. If the spam probability of a message is below the ham cutoff, it is classified as ham; if the spam probability of a message is greater than the spam cutoff, it is considered spam. Everything in between is then classified as "unsure" - but currently JunkMatcher treats the unsure category as ham (this is pictorially illustrated in the screenshot above). Basically, lowering the spam cutoff increases SpamBayes' sensitivity to spam, at the risk of getting more false positives.
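The cutoff logic described above can be sketched in a few lines of Python. This is only an illustration, not JunkMatcher's actual code: the function names and the default cutoff values (0.2 and 0.9) are assumptions chosen for the example.

```python
def classify(spam_probability, ham_cutoff=0.2, spam_cutoff=0.9):
    """Return 'ham', 'unsure' or 'spam' for a probability in [0.0, 1.0].

    The default cutoffs are illustrative assumptions, not JunkMatcher settings.
    """
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam probability must be between 0.0 and 1.0")
    if spam_probability < ham_cutoff:
        return "ham"
    if spam_probability > spam_cutoff:
        return "spam"
    return "unsure"


def is_junk(spam_probability, ham_cutoff=0.2, spam_cutoff=0.9):
    """Only 'spam' counts as junk; 'unsure' is treated as ham, as described above."""
    return classify(spam_probability, ham_cutoff, spam_cutoff) == "spam"


if __name__ == "__main__":
    for p in (0.05, 0.5, 0.95):
        print(p, classify(p), is_junk(p))
```

With these assumed defaults, a message scoring 0.85 is "unsure" and therefore not junked; lowering the spam cutoff to 0.8 would junk it, which is the increased sensitivity (and increased false-positive risk) mentioned above.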
Another important setting exposed here is the happy meal plan for SpamBayes (or the automatic balanced training). The idea is very simple: since JunkMatcher also runs a lot of other tests, it should be able to find interesting things to train SpamBayes on automatically (instead of requiring us to train SpamBayes manually). These "yummy" ham/spam include the messages that SpamBayes made mistakes on, or nearly made mistakes on. Even when the SpamBayes property test is not turned on, or is turned on but does not run because a verdict was reached before it had a chance, JunkMatcher should still feed SpamBayes the fresh messages it receives, as long as the balance between the numbers of trained ham and spam is kept.
So here you go: you can turn on the automatic training function, and set a "balance" ratio between the numbers of trained ham and spam. JunkMatcher will try very hard to keep feeding SpamBayes a balanced diet, but if SpamBayes keeps making mistakes (e.g., keeps missing spam), the actual ham vs. spam ratio could drift a bit from the designated value.
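To make the balanced diet idea concrete, here is a minimal sketch of one way such a decision could be made. It is not JunkMatcher's real algorithm: the function name, its parameters, and the rule "always train on mistakes, otherwise train only on the class that is not over-represented" are all assumptions made for illustration.

```python
def should_train(is_spam, n_ham, n_spam, target_ratio=1.0, was_mistake=False):
    """Decide whether to feed a message to the filter (hypothetical rule).

    target_ratio is the desired n_ham / n_spam. Mistakes are always trained,
    which is how the actual ratio can drift from the target.
    """
    if was_mistake:
        return True
    if is_spam:
        # Accept more spam only while spam is not over-represented.
        return n_ham >= target_ratio * n_spam
    # Accept more ham only while ham is not over-represented.
    return n_ham <= target_ratio * n_spam


if __name__ == "__main__":
    print(should_train(is_spam=False, n_ham=1, n_spam=0))  # ham already ahead
    print(should_train(is_spam=True, n_ham=1, n_spam=0))   # spam is behind
```

Under this assumed rule, a run of missed spam would each be trained as a mistake regardless of the counts, so the trained ham vs. spam ratio could end up off the designated value, as noted above.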
Finally, in case you want to make SpamBayes forget all of the previous training, just click on the "Reset SpamBayes Training Data" button.
Sometimes you may want to train SpamBayes manually, either because the manual process can cram a lot of "spam experience" into SpamBayes much more quickly than the automatic training, or because you just want SpamBayes to get familiar with certain types of emails. How do you proceed then?
Easy - in Mail.app, just select the emails you want to use for training, right-click (or control-click) to bring up the contextual menu, and then select either the item "Train as Ham" or "Train as Spam" to start training (remember, ham = good and spam = bad!).
Don't worry about accidentally training on the same emails again, as JunkMatcher remembers what messages have been used in training - it will just skip those.
After the last step you will be asked for confirmation:
As said in the dialog box - make sure you get the ham/spam difference right! After you click on the Yes button, a progress window will pop up to tell you how many messages have been trained:
At the end of training, you will then be presented with a summary dialog:
In the Log window the messages that have been used for training will also be color coded (light blue in the Received Date column).
An important thing to note here, again, is the balance between the number of ham trained vs. the number of spam trained. You can read more tips about training at the SpamBayes FAQ site.
Currently there are two:
There are three important characteristics of these tests:
This is why you might want to run only pattern tests when you're doing a "Match All" in the Analyzer window of JunkMatcher.app.
Bring up the Test Inspector window on the property (by double-clicking on it in the Analyzer window). Enter a pattern that matches the email address you use for the account into the "Recipient Pattern" field:
For example, a pattern "
The same thing can be done for a pattern too - just bring up the Test Inspector window on a pattern instead of a property, and enter the recipient pattern there.
By blank emails, we're talking about emails that show nothing in the message body; sometimes even the subject, sender, date, and/or recipient is missing.
You can turn on the property "Blank Rendering" in the Analyzer window to filter out the blank emails (or even make it a hard test).
As to why these emails were sent to you in the first place, I don't have a definitive answer to this one, but here comes my conspiracy theory:
Insights are welcome!
I notice spammers use my correct email address in the recipients line, but with an incorrect name - how do I catch that?
(thanks to Al Heynneman for asking this question)
This is a good question that resulted in a solution that's proven to be fairly effective. The situation is like this: say you are Joe with an email address
In JunkMatcher we can catch this by setting the right patterns in the property "Recipient(s) mismatch". Bring up the Test Inspector on the property, and you can access the "Recipient Patterns" window by clicking on the "Edit Recipient Address Patterns" button:
Any legit email must match at least one of these patterns; otherwise it'll be considered junk. Adding patterns here doesn't automatically enable this functionality though - you still need to turn on the property "Recipient(s) mismatch" in the Analyzer window. For safety reasons, if no pattern is added here and the property is activated, no email will be junked by this test.
So let's add patterns that'll match all the possible ways people will address you in the recipient line - thereby effectively ruling out the wrong ones. Here are the patterns for this example:
First pay attention to how we escape the
And the second pattern will match these addresses:
So, the first pattern matches the correct pairings of the name and the address, while the second only matches the address. Any incorrect pairing of name and address, therefore, is not allowed.
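As a rough illustration of this two-pattern idea, here is a small Python sketch. Everything specific in it is hypothetical: the name "Joe Smith", the placeholder address joe@example.com, the two patterns themselves (they are not the ones from this howto), and the function name. It also assumes Python-style regular expressions and that each recipient entry is checked as a whole.

```python
import re

# Hypothetical patterns for a user "Joe Smith" at the placeholder address
# joe@example.com. Note the escaped dot: an unescaped "." matches any character.
RECIPIENT_PATTERNS = [
    # Correct name/address pairing: Joe or Joe Smith, optionally quoted.
    r'^"?Joe( Smith)?"?\s*<joe@example\.com>$',
    # Bare address with no name at all.
    r'^<?joe@example\.com>?$',
]


def recipient_mismatch(recipient, patterns=RECIPIENT_PATTERNS):
    """Return True if the recipient entry matches none of the patterns."""
    if not patterns:
        # Safety rule from the text: with no patterns, nothing is junked.
        return False
    return not any(re.search(p, recipient, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    for entry in ("Joe Smith <joe@example.com>",
                  "joe@example.com",
                  "Mary Jones <joe@example.com>"):
        print(entry, "->", "mismatch" if recipient_mismatch(entry) else "ok")
```

In this sketch the wrong name paired with the right address fails both patterns, so it is the only one of the three entries flagged as a mismatch.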
If you can't figure out what the `
You can open the Sites window directly from its toolbar button, or by choosing the menu item "Open Sites" (key:
The rest should be pretty straightforward: you can add/remove a site from the list. You can also open the drawer for "Safe Sites": this is a list of patterns describing the sites that JunkMatcher should never regard as bad sites.
If you spot a site in the bad site table and you want JunkMatcher to never collect it again, you can select the site and click on the "Mark Safe" button - this will remove the site from the bad site list, and add a pattern matching the site to the safe site list.
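What "Mark Safe" does can be pictured with the following sketch. It is an assumption about the behavior, not JunkMatcher's code: the function names, the data structures (a set of bad sites and a list of safe patterns), and the form of the generated pattern are all made up for illustration.

```python
import re


def mark_safe(site, bad_sites, safe_patterns):
    """Remove site from bad_sites and add a matching pattern to safe_patterns."""
    bad_sites.discard(site)
    # Escape the site so its dots are literal; also match its subdomains.
    pattern = r"(^|\.)" + re.escape(site) + "$"
    if pattern not in safe_patterns:
        safe_patterns.append(pattern)
    return pattern


def is_safe(site, safe_patterns):
    """Return True if any safe pattern matches the site."""
    return any(re.search(p, site, re.IGNORECASE) for p in safe_patterns)


if __name__ == "__main__":
    bad = {"example.com", "spam.example.net"}
    safe = []
    mark_safe("example.com", bad, safe)
    print(sorted(bad), safe, is_safe("www.example.com", safe))
```

Because the safe list holds patterns rather than plain names, a single entry can cover a whole family of host names; here the generated pattern also covers www.example.com but not notexample.com.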
As of March 19, 2005, I'm using (in this order)