#722 new
Meindert

Duplicate message removal

Reported by Meindert | April 20th, 2014 @ 06:47 AM

In the course of importing all the old e-mail archives that I can find into MailMate I've ended up with a considerable number of duplicate e-mails (I think maybe 20,000 or more).

I found an extension for Thunderbird that seems to be well reviewed for batch processing of large numbers of duplicates:

https://addons.mozilla.org/en-US/thunderbird/addon/remove-duplicate...

http://blog.gnu-designs.com/removing-thousands-of-duplicate-email-m...

I wonder if this guy's code might be adaptable to MailMate, either as a plug-in, a freestanding utility, a built-in feature, or simply a model to use in developing something new?

Comments and changes to this ticket

  • Meindert

    Meindert April 20th, 2014 @ 06:51 AM

    Also, what are the best practices for importing e-mails into MailMate where they are not connected with a currently active mail account? I tried to create a dummy account with no working online IMAP account, but it was inactive when I tried dragging mail into it.

  • benny

    benny April 21st, 2014 @ 12:47 PM

    • State changed from “new” to “accepted”

    MailMate has a primitive “Edit ▸ Select Duplicates” menu item. It selects any messages that are duplicates of other messages (leaving the oldest occurrence unselected). That might help a bit.
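For illustration only, here is a minimal sketch of the "select every duplicate except the oldest" idea described above. It is not MailMate's implementation: the ticket does not say how MailMate decides that two messages are duplicates or how it orders them, so matching on the `Message-ID` header and ordering by the `Date` header are both assumptions, and `select_duplicates` is a hypothetical helper name.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import mktime_tz, parsedate_tz


def sent_timestamp(msg):
    """Return the Date header as a UTC timestamp, or +inf if it is missing or unparsable."""
    parsed = parsedate_tz(msg.get("Date", ""))
    return mktime_tz(parsed) if parsed else float("inf")


def select_duplicates(raw_messages):
    """Return the indexes of messages to select, leaving the oldest copy unselected.

    Assumption: two messages are duplicates when their Message-ID headers match.
    """
    groups = defaultdict(list)
    for index, raw in enumerate(raw_messages):
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        if not message_id:
            continue  # without a Message-ID this sketch never treats a message as a duplicate
        groups[message_id].append((sent_timestamp(msg), index))

    selected = []
    for entries in groups.values():
        entries.sort()  # oldest first; ties fall back to list position
        selected.extend(index for _, index in entries[1:])
    return sorted(selected)
```

Called on a list of raw RFC 5322 message strings, it returns the positions that would be selected for deletion; messages with a unique or missing `Message-ID` are never returned.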

    I do have some notes about how handling of duplicates could be improved. It is likely to be partly handled by external commands (i.e., bundle command(s)). I'll track any progress on that using this ticket.

    I do not recommend having messages in MailMate locally only. (To some extent, as you suggest, it can be hacked using an inactive dummy IMAP account -- there are caveats.)

    I do recommend uploading the messages to an IMAP account. This is done simply by importing them into a mailbox of an existing account. It'll also make it much easier to migrate to another email client in the future (which, of course, I hope you'll never need to do).

  • Meindert

    Meindert April 21st, 2014 @ 03:15 PM

    The Select Duplicates command got rid of about 10 percent of the 42,000 messages. From a spot-check, some of the remaining "duplicates" have different addresses, so they were probably cc'd to two of my addresses (hence, they are not strictly speaking duplicates).
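One way to spot-check this kind of near-duplicate outside the mail client is sketched below. It groups messages whose sender, date, subject, and body are identical, then reports only those groups in which the address headers differ. Everything here is an assumption for illustration: the choice of headers in `ADDRESS_HEADERS`, the helper names, and the idea that copies sent to two addresses differ only in those headers. It says nothing about how MailMate itself compares messages.

```python
import hashlib
from collections import defaultdict
from email import message_from_string

# Assumed set of headers that may differ between copies sent to two addresses.
ADDRESS_HEADERS = ("To", "Cc", "Delivered-To")


def loose_key(raw):
    """Key on sender, date, subject and a hash of the body, ignoring address headers."""
    msg = message_from_string(raw)
    body = raw.replace("\r\n", "\n").partition("\n\n")[2]
    digest = hashlib.sha256(body.encode("utf-8", "replace")).hexdigest()
    return (msg.get("From", ""), msg.get("Date", ""), msg.get("Subject", ""), digest)


def address_key(raw):
    """Key on the address headers only, normalised to lower case."""
    msg = message_from_string(raw)
    return tuple(msg.get(name, "").strip().lower() for name in ADDRESS_HEADERS)


def near_duplicates(raw_messages):
    """Return groups of indexes that match loosely but differ in their address headers."""
    groups = defaultdict(list)
    for index, raw in enumerate(raw_messages):
        groups[loose_key(raw)].append(index)

    result = []
    for indexes in groups.values():
        addresses = {address_key(raw_messages[i]) for i in indexes}
        if len(indexes) > 1 and len(addresses) > 1:
            result.append(indexes)
    return result
```

Groups returned by `near_duplicates` are candidates for manual review rather than automatic deletion, since each copy records a different recipient.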

    I originally used File > Import Messages to bring the messages into a real account that had no real mail yet (a new account on a new domain). I then created a dummy IMAP account and mailbox in MailMate using an e-mail address that is dead and no longer works, and dragged all the imported messages into that, which is offline.

    The weird thing is, when I selected duplicates in the dummy account's mailbox and deleted them, the Activity Viewer showed expunging activity not in the dummy mailbox, but in the first account that they were imported into, the real account. What was going on here? As far as I could tell, the messages were no longer in that account. In fact, I think I did the whole operation of importing and dragging when both accounts were offline. I guess with IMAP, it remembers everything, and then tries to duplicate all the moves online the next time it's online?

    I'm slowly beginning to get my head around IMAP, but it certainly has its own logic.


Mac OS X email client.
