#1465 ✓ fixreleased

SpamSieve score calculation for existing mails

Reported by Stefan Doehla | April 22nd, 2016 @ 09:00 AM

Hi Benny,

not sure I use MailMate and SpamSieve together correctly, but out of my 400,000 mails in MailMate I have only a small subset that I can use for training SpamSieve (as I was too lazy to do manual filtering over the last couple of years). Of course the new ones caught by MailMate get SpamSieve scores, but 99.9% have no score associated.

Now could one run SpamSieve for the Bayesian filtering on a complete mailbox so that the scores are generated, then sort by score, and then train SpamSieve using this data? Not sure I'd like to filter already; I'd rather have a score first and then clean up my old mailboxes manually ...

So far I've only seen training settings (the Junk/NotJunk) and scores for new mails coming in, but no way to evaluate old mails. Did I miss something?

Any hints?
- Stefan

Comments and changes to this ticket

  • benny

    benny April 23rd, 2016 @ 06:02 PM

    • State changed from “new” to “accepted”

    No, you didn't miss anything. The only workaround is to refetch messages, e.g., by changing the SpamSieve mailbox in the Security preferences pane and then applying “Message ▸ Reset...” to the emails in the mailbox. That would trigger refetching and MailMate would send the messages to SpamSieve. But it would also move emails to the Junk mailbox.

    I'm not sure how to best add what you are asking for. Maybe the “Junk State” menu should be extended with:

    -------
    SpamSieve > Evaluate
                Recompute Score
    

    The first would do the same as for incoming messages, and the second would only recompute the score without changing junk state or moving the message(s).

  • Stefan Doehla

    Stefan Doehla April 25th, 2016 @ 07:22 AM

    That sounds like a great solution and would perfectly fit my needs (both actions, actually :) )!

    Running 'Evaluate' or 'Recompute Score' on selected messages only makes perfect sense. I'd only like to ask for a 'separator' in the context menu to indicate that Junk/Not Junk and the extra operations are different beasts.

  • benny

    benny April 25th, 2016 @ 01:51 PM

    • State changed from “accepted” to “fixcommitted”

    Hold down ⌥ when clicking “Check Now” in the Software Update preferences pane (r5241). Let me know if it does not work as expected.

  • Stefan Doehla

    Stefan Doehla April 26th, 2016 @ 04:32 PM

    Hi Benny, I've checked 'Recompute Score', and it did what I wanted. I might have been a bit too optimistic when marking ~4000 mails, though: SpamSieve was busy for a minute, but MailMate needed ~30 minutes to become responsive again. Is there any post-SpamSieve action involved now that could stall MailMate, or was this still the SpamSieve 'Recomputation'?

  • benny

    benny April 26th, 2016 @ 06:11 PM

    Each email is sent (in full) to SpamSieve using AppleScript. This is going to be slow for a large number of messages. The only possible improvement is to do it in a separate thread, but this won't be high on my list for now.
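
    The "separate thread" idea mentioned above can be illustrated with a minimal Python sketch. This is not MailMate's code or SpamSieve's API: `score_in_background`, `fake_score`, and the `(message_id, raw_text)` tuples are hypothetical names chosen for illustration, and `fake_score` merely simulates a slow per-message call such as the AppleScript round trip.

```python
import queue
import threading
import time
from typing import Callable, Iterable, Tuple


def score_in_background(
    messages: Iterable[Tuple[str, str]],
    score_fn: Callable[[str], int],
    on_result: Callable[[str, int], None],
) -> threading.Thread:
    """Score (message_id, raw_text) pairs on a worker thread.

    The caller's thread (e.g. a UI thread) returns immediately and
    stays responsive while the slow scoring runs elsewhere.
    """
    work: "queue.Queue[Tuple[str, str]]" = queue.Queue()
    for item in messages:
        work.put(item)

    def worker() -> None:
        while True:
            try:
                message_id, raw_text = work.get_nowait()
            except queue.Empty:
                return  # nothing left to score
            on_result(message_id, score_fn(raw_text))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def fake_score(raw_text: str) -> int:
    """Hypothetical stand-in for a slow external spam filter call."""
    time.sleep(0.01)  # simulate per-message latency
    return 99 if "viagra" in raw_text.lower() else 5
```

    A caller would start the worker, keep handling other work, and collect scores through the callback, for example `score_in_background(batch, fake_score, results.__setitem__)` with `results` being a plain dict. A real application would additionally have to hand results back to its UI thread safely.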

  • Stefan Doehla

    Stefan Doehla April 27th, 2016 @ 01:06 PM

    Another reason for MailMate stalling seems to be when a few thousand messages are selected (or when I go back to a folder where all messages are selected). I can live with that and the feature above works fine. One minor thing though: there is no SpamSieve score anymore for messages that are marked as Junk/NotJunk, but this field cannot be shown, so there's simply an empty SpamSieve score. Still, I tested both new context menu options and they behave well.

    Thanks a lot - now MailMate can be used as a fine-grained tool to train SpamSieve and I can also clean my messy mailboxes of all that never-treated spam :)

  • benny

    benny April 28th, 2016 @ 08:10 AM

    I didn't understand the “minor thing”? Does MailMate delete the SpamSieve score when you mark something Junk or Not Junk?

  • Stefan Doehla

    Stefan Doehla April 28th, 2016 @ 08:14 AM

    As soon as I mark a mail as Junk or NotJunk the SpamSieve score disappears in the column. I guess because it's then a clear '100%' decision, but this is a bit of a mystery to me.

  • Stefan Doehla

    Stefan Doehla April 28th, 2016 @ 08:17 AM

    Or let's phrase it differently: The score is a probability, where I could understand if one argues that as soon as it's used for training, it's definite and a probability cannot be attributed. But personally I think the score could be helpful even for training mails.

    I then thought that maybe there's a Junk State field (for mails marked as Junk/NotJunk) that is toggled instead when the score goes away, but I can't find a column header for this.

  • benny

    benny April 28th, 2016 @ 10:06 AM

    Ah, I had forgotten about that. Yes, it's cleared because it's essentially overridden by the user.

    You can map tags to the $Junk, $NotJunk IMAP keywords and then enable the Tags column. A more “raw” approach is to enable the “Raw Flags” column (which I ought to rename IMAP Keywords).

  • Stefan Doehla

    Stefan Doehla April 29th, 2016 @ 08:07 AM

    Hm, the $Junk keyword seems to work as a tag with an emoji, but $NotJunk for some reason does not (though it is visible as a raw keyword) ...

    I actually reran 'Recompute Score' on my Junk mailbox and found out that many mails get a rather low SpamSieve score when re-evaluated (not 100% as I thought). So I tend to think the SpamSieve score should remain, so that one is able to correct SpamSieve.

  • Stefan Doehla

    Stefan Doehla May 4th, 2016 @ 07:24 AM

    Hi Benny, after some further thinking and looking at my mailboxes, I believe that the SpamSieve score should also remain for mails that have been user-marked, especially since the raw flags contain the Junk/NotJunk marking.

    The $NotJunk as emoji seems to work now, but I can't find out why it didn't before.

    Btw., is there any distinction between Junk/NotJunk triggered by the user or by SpamSieve?

    Thanks again for your prompt addition of the aforementioned feature - still very helpful! :)

  • benny

    benny September 15th, 2016 @ 03:57 PM

    • State changed from “fixcommitted” to “fixreleased”
