News about JunkMatcher

April 18, 2006: Where is Ben and where is the new JunkMatcher for Intel Macs?

Folks - some of you are wondering if your emails have been sent to /dev/null (they haven't, let me assure you). The reason for this apparent inactivity since early this year is (what else?) that I've been swamped by project work at CMU, and I'm also planning to graduate later this year.

I'm still planning to update JunkMatcher after the dust settles a bit. The plugin core needs to be integrated right inside Mail.app this time (no more separate Python processes), and various other libraries/components (PyObjC, ELinks, SpamBayes, etc.) need to be updated as well. Stay tuned.

November 10, 2005: Python taking 100% of CPU time? Please read the Troubleshooting page

There is a bug caused by Python's limitation on using groups in regular expressions (an expression can only have up to 100 'groups'). It happens when you have a very long Whitelist, and the list can't be pruned in JunkMatcher.app.

While I'm working on a permanent solution, please read this page for a workaround.
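
To illustrate the limitation described above (this is not the workaround from the Troubleshooting page, just a sketch of one way around a per-expression group limit), a long whitelist can be compiled into several smaller expressions instead of one huge one. The names compile_in_batches, is_whitelisted and MAX_GROUPS are hypothetical and are not part of JunkMatcher; the limit itself applies to the Python versions of the time.

```python
import re

MAX_GROUPS = 99  # hypothetical budget, kept under the 100-group limit described above


def compile_in_batches(patterns, max_groups=MAX_GROUPS):
    """Combine patterns with '|' into several regexes, each within the group budget."""
    batches, current, used = [], [], 0
    for pattern in patterns:
        needed = re.compile(pattern).groups  # capturing groups in this one pattern
        if current and used + needed > max_groups:
            batches.append(current)
            current, used = [], 0
        current.append(pattern)
        used += needed
    if current:
        batches.append(current)
    # Each alternative is wrapped in a non-capturing group, so the wrapper
    # itself adds nothing to the group count.
    return [
        re.compile("|".join("(?:%s)" % p for p in batch), re.IGNORECASE)
        for batch in batches
    ]


def is_whitelisted(sender, compiled_batches):
    """Return True if any batch matches the sender address."""
    return any(regex.search(sender) for regex in compiled_batches)
```

A whitelist of 250 one-group patterns would end up as three compiled expressions under these assumptions, and a sender is whitelisted if any of them matches.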

August 29, 2005: 1.6.0+ users having problems: please read the Troubleshooting page

There was an installation bug introduced in 1.6.0. If you upgraded and are having problems, please read the Troubleshooting page on the Howto website: http://junkmatcher.sf.net/Howto/Troubleshooting/.

August 24, 2005: JunkMatcher 1.6.1 Fixed an installation issue for 1.5.8- users

  1. Bug fixed: Fixed an installation issue for users upgrading from 1.5.8 or earlier. If everything has worked for you, you don't need to upgrade to this version.
  2. There are many new features introduced in 1.6.0; make sure you read about them here.

August 22, 2005: JunkMatcher 1.6.0 is Happy Meal time for SpamBayes

  1. IMPORTANT: Do not downgrade after upgrading to this version - internal database format has changed. All of your log/training data will be converted to the new format for you.
  2. Added SpamBayes' "Happy Meal Plan" (automatic balanced training): If turned on (the default, in the Test Inspector), SpamBayes will try to feed itself with yummy spam/ham in a balanced way. This happens even when the SpamBayes property test is not turned on, so that SpamBayes learns from the results of the other tests. Because of this addition, the SpamBayes property test is now turned on by default.
  3. Added the ability to correct JunkMatcher directly from Mail.app (save a trip to JunkMatcher.app): Just do what you usually do - click on the "junk bag" icon in Mail.app's toolbar, choose "Mark as Junk Mail" item in Mail.app's contextual menu, or move a message into/out of a Junk mailbox etc. Your action will be automatically reflected in the Log window of JunkMatcher.app.
  4. Added UI to set ham/spam cutoff values for SpamBayes (in the Test Inspector): Each cutoff value is a positive number between 0.0 and 1.0 (exclusive), and ham cutoff < spam cutoff. Any message with probability > spam cutoff (computed by SpamBayes of course) will be considered spam, and any message with probability < ham cutoff will be considered "definite" ham. The rest are classified as "unsure", but currently "unsure" is synonymous with "ham". :-)
  5. Added a new preference setting "Filter the test table if Match All finds positive(s)" (default is off). If you turn it on, after you apply a "Match All" in the Analyzer window, the search field will be automatically filled with string "test:matched" to filter the test table so you only see the positive tests. Clearing out the search field will show you the complete list of tests again.
  6. JunkMatcher now remembers which messages you have used for training, and will skip them if you try to retrain on them.
  7. Log entries are now color-coded in the "Received Date" column to indicate their training status; they will be automatically updated when you finish training.
  8. When doing a "Match All" in JunkMatcher.app with SpamBayes property test turned on, the "Last Match All" field in the Test Inspector window now shows the probability computed for the message even if the message is *ham*.
  9. Added a menu item under View: Inspect SpamBayes (cmd-opt-B). It will open the Test Inspector window on SpamBayes property test directly.
  10. Improved the speed of training processes.
  11. Improved stability when configuring JunkMatcher while matching is underway in Mail.app.
  12. Behavior changed: If you already have "JunkMatcher" rule in Mail.app, the installation process will not try to alter its settings (e.g., whether you want to color matched messages, what color you want, and whether you want to move the messages to Junk folder etc.).
  13. Bug fixed: MD5 keys of incoming emails were not computed in a consistent way.
  14. Bug fixed: When JunkMatcher.app is running, doing a "Match All" immediately after training sometimes did not use the latest training result.
  15. Bug fixed: When log is automatically recycled while JunkMatcher.app is running, making corrections to the subsequent log entries in the same session would cause errors.
  16. Bug fixed: Sometimes saving tests in JunkMatcher.app while Mail.app is running would produce a bunch of "A test is AWOL" error messages in the console.
  17. Many other GUI enhancements and dead bugs.
  18. JunkMatcher Howto website is again updated with much more information.
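
The ham/spam cutoff rule from item 4 above can be sketched as follows. This is only an illustration of the rule as stated in these notes, not JunkMatcher's or SpamBayes' actual code; the function names and the default cutoff values are assumptions chosen for the example.

```python
def classify(probability, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a SpamBayes-style probability to 'spam', 'ham' or 'unsure'.

    The default cutoffs are illustrative values only.
    """
    # Both cutoffs lie strictly between 0.0 and 1.0, with ham cutoff < spam cutoff.
    if not (0.0 < ham_cutoff < spam_cutoff < 1.0):
        raise ValueError("need 0.0 < ham_cutoff < spam_cutoff < 1.0")
    if probability > spam_cutoff:
        return "spam"
    if probability < ham_cutoff:
        return "ham"      # "definite" ham
    return "unsure"


def is_junk(probability, ham_cutoff=0.2, spam_cutoff=0.9):
    """'unsure' is currently treated the same as ham, so only 'spam' is junk."""
    return classify(probability, ham_cutoff, spam_cutoff) == "spam"
```

With these example cutoffs, a message scoring 0.5 is "unsure" and therefore not filtered, while one scoring 0.95 is.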

August 10, 2005: JunkMatcher 1.5.9 speaks Bayesian too!

  1. WARNING: Upgrading to this version will recycle your log!
  2. JunkMatcher now speaks Bayesian too! It integrates SpamBayes, a great Bayesian spam filter developed by the folks at http://spambayes.sourceforge.net. By default this test is *not* turned on. Before activating it, you need to train it with a couple hundred spam messages and a roughly equal number of ham (good emails). Two new contextual menu items "Train as Ham" and "Train as Spam" are added in Mail.app for this purpose (also available under the Message menu in Mail.app). You can check how many ham/spam you have fed SpamBayes via the Test Inspector window of the property test.

    Please note that repeatedly training on the same set of emails would decrease SpamBayes' accuracy. Also, using a more recent batch of emails for training can help SpamBayes to more accurately target spam. You can find useful tips at http://spambayes.sourceforge.net/faq.html#using-spambayes.
  3. When you correct JunkMatcher in the Log window, SpamBayes will also learn from the corrections.
  4. Improved stability when Mail.app starts under heavy spam traffic.
  5. Better error handling when checking for pattern/application updates.
  6. Bug fixed: Choosing "Apply Rules" on emails within Mail.app will now update the log accordingly, but the test statistics won't be updated.
  7. Bug fixed: Sometimes refreshing the Sites window immediately after correcting JunkMatcher would freeze the window.
  8. Bug fixed: Sometimes junk mails are identified but not logged.

August 2, 2005: JunkMatcher 1.5.8 - more update fun

  1. Added the ability to check for newer versions of JunkMatcher via the menu item Check for Application Updates under the JunkMatcher menu. You can also instruct JunkMatcher to check for application updates every time it starts in the Preferences window.
  2. Re-matching emails by choosing "Apply Rules" from Mail.app will not update test statistics now.
  3. Bug fixed: Now you can really see the Pattern Deltas when you use the Check for Pattern Updates function.

July 27, 2005: JunkMatcher 1.5.7 now phones home!

But don't you worry. Read on...

  1. Added pattern import/export (via two new menu items under the File menu): the exported pattern package has a .jpp file extension. When exporting you can also choose to generate a "news" file (the default filename is "PatternNews") so you can publish your pattern package. Double-clicking on a jpp file also starts the importing process.

    Simple instructions on how to publish your pattern package can be found at http://junkmatcher.sf.net/Howto/Patterns/index.html#Publish.
  2. Added the ability to check for pattern updates: Choosing the menu item "Check for Pattern Updates" under the JunkMatcher menu will tell you if there's any new pattern update available. You can "subscribe" to different pattern updates by changing the URL in Preferences -> Updates tab. You can also instruct JunkMatcher to check for pattern updates every time it starts.
  3. Patterns updated (including an SPF reject pattern targeting the header view). New patterns will also be published at http://junkmatcher.sf.net/Home/PatternNews from time to time.
  4. When installing factory patterns/importing patterns, if an addition is detected but the object to be added already has a duplicate *user* version, the GUI won't allow you to accept the addition now (plus it will offer an explanation in the lower half of the Pattern Delta window).
  5. Improved phishing URL detection to reduce false positives.
  6. Now whenever you change some pattern, the Analyzer window will show up if it's not opened.
  7. Improved the stability of the first-run (installation) process.
  8. Bug fixed: previously the menu item "Save Tests" was disabled (grayed out) if the Analyzer window had not been opened.
  9. The JunkMatcher Howto website is updated to reflect the new features.

July 11, 2005: JunkMatcher 1.5.6 says hello to you!

  1. Added the ability to choose which change to accept when using Install Factory Version of Patterns: check the "Show me the details" box in the confirmation dialog.
  2. URL parsing now handles one more case of using Yahoo redirection tricks.
  3. Added the following new TLDs approved by ICANN: .xxx, .jobs, .travel, .cat, .post, .mobi. These are used in collecting bad sites.
  4. Added the ability to reset test statistics: this is accessible in the menu item Reset All Statistics under the JunkMatcher menu and the Test Inspector window (for resetting statistics of a specific property/pattern).
  5. Added keyboard navigation in the Test Inspector window.
  6. More robust prevention of double-checking the same email and duplicates in the log.
  7. Improved Address Book operations: rapidly changing data in Address Book won't stress out JM (in reloading email addresses).
  8. If certain targeted view of a pattern is not used anywhere in the tests, the corresponding statistics will now be removed automatically to avoid possible inconsistency.
  9. Patterns updated.
  10. Bug fixed: Error message about "selector not recognized" when Install Factory Version of Patterns is used.
  11. Bug fixed: Install Factory Version of Patterns might introduce "Test is AWOL" in console.log.
  12. Bug fixed: Occasionally the background JMServer.py process is not killed, which might result in 'Resource temporarily unavailable' error message showing up in console.log, and junk not filtered.
  13. Bug fixed: Previously choosing "Apply Rules" on the same message again would not re-check the message.
  14. Bug fixed: When choosing "Open in JunkMatcher" contextual menu item in Mail.app, if there are multiple versions of JunkMatcher on your Mac, it will work correctly now by launching the right version.
  15. Bug fixed: The following sequence of actions caused errors: (1) add a pattern (2) change that pattern (3) remove the pattern. If users saved the files after this, patterns/tests files would be corrupted.
  16. Bug fixed: Changing a pattern will now correctly reset all of its statistics in the files.
  17. The JunkMatcher Howto website (accessible under the Help menu) is updated to reflect the additions/changes made since 1.5.0 up to 1.5.6.
  18. For developers: project upgraded to Xcode 2.1. Do not use Xcode 2.0 now; otherwise you'll get build errors!

May 31, 2005: JunkMatcher 1.5.5: Life-is-like-a-box-of-chocolates release

  1. 1.5.4 was short-lived, but it introduced several useful new features; please read about them in the 1.5.4 release notes.
  2. Changing the name/regular expression for a pattern/meta pattern will now automatically mark it as a "user" pattern/meta pattern.
  3. Bugs fixed: Race conditions could happen when multiple threads are trying to open the internal email database/ bad site database. The symptom: some junk mails could be left unchecked, and sometimes an error message "Resource temporarily unavailable" could occur in console.log.
  4. Bug fixed: When selecting an IMAP inbox, sometimes messages are not checked.
  5. Bug fixed: Sometimes after removing an instance of a pattern, JM incorrectly assumes some meta patterns are not in use. This in turn may corrupt the pattern file.
  6. Improved stability when test/pattern files become inconsistent.

May 30, 2005: JunkMatcher 1.5.4 is released!

  1. Added a new property test "Has a phishing URL": This property checks if an HTML-based message contains at least one phishing URL. A "phishing" URL is a link that points to a place different from the one it claims to point to. Optionally you can check whitelisted emails too so they will get filtered if they contain phishing URLs. This is a very useful property and should probably be turned on at all times.
  2. Added a menu option "Install Factory Version of Patterns" under the JunkMatcher menu: The installation will replace all of your "non-user" (managed) patterns and meta patterns, but it won't change the statistics, the targeting views and the various states (on/off etc) of the existing patterns. To keep your modified non-user patterns/meta patterns, you can turn their "user" mode on.

    If a duplicate is found between the user and the factory patterns/meta patterns, the user pattern/meta pattern overrides the factory one. Any pattern additions will be added to the end of your test list (possibly with multiple instances targeting multiple views).
  3. Added a "User" checkbox in the Meta Patterns drawer - this allows you to switch a meta pattern between user and non-user (managed) mode.
  4. Updated the factory versions of patterns/meta patterns.
  5. Added an automatic log refreshing option in Preferences : Other Settings. If the option is enabled (default), when the Log window becomes the main window, it will automatically refresh its contents. It is useful when you have JunkMatcher opened in the background and want to check the Log window from time to time.
  6. Added one more color in the Log window: messages "corrected to junk" get orange color, and messages "corrected to clean" get yellow color.
  7. Added a new menu option "Recycle Log" (command-K) under the Edit menu; active only when the Log window is in the front.
  8. More robust URL parsing.
  9. When installing the Mail plugin, if you're sure Mail.app is not running but JunkMatcher reports otherwise, you can now click on "Install Anyway" to proceed.
  10. IMPORTANT bug fixed (Panther and Tiger): leaking sockets. Should improve stability under heavy spam attacks.
  11. Bug fixed (Panther): for some users the same email would show up twice in the log.
  12. (For developers) Xcode projects fixed so they can be built on your machine. :-P
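
As a rough illustration of the "Has a phishing URL" idea from item 1 above (a link whose visible text claims one destination while its href points to another), here is a minimal sketch using only the Python standard library. It is not JunkMatcher's detection code; the function has_phishing_url, the helper names and the "text looks like a URL" heuristic are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    """Lower-cased host name of a URL; a scheme is assumed if missing."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def has_phishing_url(html):
    """True if some link text looks like a URL but names a different host than its href."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    for href, text in collector.links:
        looks_like_url = text.startswith(("http://", "https://", "www."))
        if looks_like_url and _host(text) != _host(href):
            return True
    return False
```

Links with ordinary text such as "Click here" are ignored in this sketch, since they make no claim about where they lead.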

May 17, 2005: JunkMatcher 1.5.3 is released

  1. Added support for Mac OS X 10.4.1 (hopefully they won't change too much from now on).
  2. Bug fixed (Panther and Tiger): sometimes a junk message is checked but not moved.
  3. Bug fixed (Tiger): sometimes a "clean" email would show up twice in the log.
  4. Whitelist checking is now case-insensitive (so you don't need to add "(?i)" all over the place).
  5. Lowered the minimum height of the Analyzer window to 480 pixels for lower-resolution screens.

May 13, 2005: 1.5.2 - Tiger/Panther version is here!

This version adds only Tiger compatibility - it should run (at least now) on both OS X 10.3.x (Panther) and 10.4.x (Tiger). No additional feature was added.

April 29, 2005: Tiger compatibility?

Several of you have asked me whether JunkMatcher will be updated to be Tiger-compatible. My answer is yes; but because I currently don't have a copy of Tiger myself, I've been bribing people with Tiger to do the testing for me. I'll release a compatible version as soon as I can!

March 25, 2005: JunkMatcher 1.5.1 is released

  1. Emails containing character '\0' (ASCII code 0) are considered "malformed" now, and will be thrown away without checking. There was an exploit a while ago using this in HTML (http://secunia.com/advisories/12064/).
  2. When adding/updating/testing a pattern, JunkMatcher will now complain if the pattern contains a newline character (you still can match a newline character using pattern "\n" - it's just that every pattern has to be on one line).
  3. Bug fixed: installing in a second account no longer generates a message about "not being able to install the plugin."
  4. Bug fixed: installing JunkMatcher on a system with a different version of PyObjC installed should work now.
  5. Worked around an interesting Mail.app behavior: when an email was written over a longer period of time, the saved draft would be colored and sent to JunkMatcher for matching.
  6. Minor cosmetic changes to the GUI.
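
The two checks from items 1 and 2 above can be sketched like this. It is a minimal illustration under stated assumptions, not the code JunkMatcher uses; is_malformed and validate_pattern are hypothetical names.

```python
import re


def is_malformed(raw_message: bytes) -> bool:
    """A message containing the NUL character (ASCII code 0) is treated as malformed."""
    return b"\x00" in raw_message


def validate_pattern(pattern: str) -> None:
    """Reject patterns that span several lines, then check that they compile.

    A literal newline is refused; the two-character escape backslash-n is
    still fine and matches a newline in the message.
    """
    if "\n" in pattern:
        raise ValueError("every pattern has to be on one line")
    re.compile(pattern)  # raises re.error if the expression is invalid
```

For example, validate_pattern(r"first line\nsecond line") is accepted because the raw string contains a backslash followed by "n" rather than an actual line break.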

March 19, 2005: JunkMatcher 1.5.0 sees the light of day!

This is a total rewrite of JunkMatcher, with many new functionalities and a much better GUI. However, this version is not compatible with previous versions of JunkMatcher. If you're running JunkMatcher 1.19c or earlier, your old patterns will be moved to the Desktop folder after the first run of this version. From there you have to open the backup files with a browser, and copy/paste the patterns into the new JunkMatcher.app.

Following is a list of major improvements over the last version.

Overall Improvements

  1. Full Unicode support: now you can write patterns/meta patterns in almost all languages, such as Chinese/Japanese/Korean/Russian/etc.
  2. The "rule actions" part of the "JunkMatcher" rule actually functions (previously it was just "Stop evaluating rules"). The actions will be executed if a message is classified as junk by JunkMatcher. The action settings in the old version of JunkMatcher are therefore moved back inside Mail.app rule setting for better stability and more flexibility.
  3. Incoming emails are now matched immediately, rather than being matched only after all emails are downloaded.
  4. You can now load emails into JunkMatcher directly from Mail.app via a contextual menu item "Open in JunkMatcher" (also in Message menu).

Engine Improvements

  1. Properties and patterns are treated more uniformly: now both are called tests, and you can mix/reorder any test.
  2. Tests (properties and patterns) are more fine-grained: you can decide whether a particular test should apply when a message is sent to a particular address (via setting its recipient pattern), whether a pattern should apply when an email is in certain encoding (via setting its encoding pattern), or if an email is in HTML (turn on/off its HTML switch).
  3. A test can be marked as a "hard" (vs. soft) test: although you can tell JunkMatcher to classify a message as junk if a certain number of positive tests are found, a positive "hard" test will immediately make the email junk.
  4. A single pattern can target multiple "message views" (subject, sender, body...). Changing the master pattern instantly changes all instances of it.
  5. You can use regular expressions in the whitelist now.
  6. Properties using data from Address Book will automatically update themselves when you change Address Book.
  7. Detailed statistics of all tests are now logged - these include CPU time, precision and recall, etc. In a future version a different matching strategy will utilize the statistics.
  8. Patterns are marked as either "managed" or "user". Pattern subscription (will be implemented in a future version) will only change the managed patterns.
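
The difference between "hard" and soft tests (item 3 above) can be sketched as follows. This is an illustrative model only; the JunkTest class, the is_junk function and the threshold parameter are hypothetical names, and the real engine also takes recipient, encoding and HTML settings into account.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JunkTest:
    name: str
    matches: Callable[[str], bool]  # returns True when the test is positive
    hard: bool = False              # a positive hard test decides immediately


def is_junk(message: str, tests: List[JunkTest], threshold: int = 2) -> bool:
    """Junk if any hard test is positive, or if `threshold` soft tests are positive."""
    positives = 0
    for test in tests:
        if not test.matches(message):
            continue
        if test.hard:
            return True
        positives += 1
        if positives >= threshold:
            return True
    return False
```

Tests are evaluated in list order here, which mirrors the idea that tests can be mixed and reordered freely.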

GUI Improvements

  1. The entire JunkMatcher app is now implemented in Cocoa and PyObjC: more user-friendly, more comprehensive, more reliable and much faster!
  2. You can now drag to reorder the tests.
  3. Test settings are now available via the Test Inspector window, including the property settings.
  4. You can define what HTML tags are considered good tags via GUI now (for property "too many bad HTML tags").
  5. Too many other improvements - see for yourself!

Installation Improvements

  1. The installer is now "merged" into the JunkMatcher app (a wizard-like process). The installation process kicks in only the first time a user runs the app, so the app itself won't be reinstalled numerous times by many different users, and there are no more file permission/ownership problems!
  2. The installation process now offers an opportunity to configure some initial settings.
  3. It is now possible to reinstall the Mail plugin/rules via menu items in the JunkMatcher app (see the JunkMatcher menu). You can also let the JunkMatcher app check the integrity of the plugin/rules every time it starts up.
  4. User configurations are now stored in ~/Library/Application Support/JunkMatcher. The GUI-related settings are still stored in ~/Library/Preferences/edu.cmu.cs.benhdj.JunkMatcher.plist.
  5. The Mail plugin now contains only minimal code. The engine resides in the JunkMatcher app for the plugin to use.

The issues of the current version are:

  1. Pattern importing/exporting in version 1.19c has not been re-implemented.
  2. Spam reporting in version 1.19c has not been re-implemented.
  3. IP Query tool in version 1.19c has not been re-implemented.
  4. Pending: checking for new versions.
  5. Pending: optimized matching strategy.