#195 accepted
Alan Harper

Search in Common Headers or Body doesn't work

Reported by Alan Harper | November 28th, 2011 @ 04:27 PM

Hi

I am still evaluating MailMate, and I am concerned because not only are searches often slow (a reason I am abandoning Apple Mail), but they also fail to find emails. I was looking for a specific email in my inboxes, and a search was not finding it. After poking around, I found that (a) the search was failing, and (b) searches in other email client programs succeeded in finding it.

If you look at attachment 1, you will see that a search for "Canon" + "Cartridges" in my "All Messages" mailbox finds emails only on 24 Nov, 7 June, 15 April and earlier, while if you look at attachment 2, you will see an email sent on 4 June whose body clearly has the words Cartridges and Canon in it.

I rely on searching actually working in my email client. Any idea why it is failing here?

TIA

Comments and changes to this ticket

  • benny

    benny November 28th, 2011 @ 09:43 PM

    • Assigned user set to “benny”
    • State changed from “new” to “accepted”

    If the problem is what I think it is, then it is a known limitation. MailMate can currently only search so-called plain text body parts. Most of the time this is fine, since any message with an HTML body part (the most commonly used alternative to plain text) should also contain a plain text body part with the same textual content. Unfortunately this is not always the case, in particular when messages are generated by some web service. (You are actually the first to report this issue, so it is apparently not a frequent problem.)

    You can verify this by looking at “View ▸ Message Body Parts” for the problematic message. Look for a plain text body part and if there is one, see if it contains the words you are looking for. Let me know if this does not explain what you are seeing.
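
    The same check can be done outside MailMate on the raw message source. The following is a minimal sketch using Python's standard `email` module; the function names and the sample message are made up for illustration and have nothing to do with MailMate's internals.

```python
from email import message_from_string
from typing import List


def list_body_parts(raw_message: str) -> List[str]:
    """Return the content type of every non-container part of the message."""
    msg = message_from_string(raw_message)
    return [
        part.get_content_type()
        for part in msg.walk()
        if not part.is_multipart()  # skip multipart/* wrappers
    ]


def has_plain_text_part(raw_message: str) -> bool:
    """True if the message carries at least one text/plain body part."""
    return "text/plain" in list_body_parts(raw_message)


# Hypothetical HTML-only message, shaped like one generated by a web service.
HTML_ONLY = (
    "MIME-Version: 1.0\n"
    "Subject: Receipt\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<html><body><p>Canon Cartridges</p></body></html>\n"
)

print(list_body_parts(HTML_ONLY))
print(has_plain_text_part(HTML_ONLY))
```

    A message for which `has_plain_text_part` is false has nothing for a plain-text-only search to match against.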

    Now, I do want to improve this, but this is for a different reason. Recently, Gmail allowed chat logs to be available via IMAP and these special messages only contain HTML body parts. And I would like to be able to search them. Consider that my personal motivation.

    With respect to speed, text search is currently quite brute force. Some day it'll be improved, but I cannot make any promises with regard to time frame. If you mostly search relatively recent messages then you may find this tip useful.

    I'll track any progress on HTML body part search in this ticket.

  • Alan Harper

    Alan Harper November 28th, 2011 @ 09:52 PM

    The message looks like:

    Delivered-To: xxx
    Received: by 10.147.136.13 with SMTP id o13cs68579yan;
            Sat, 4 Jun 2011 12:00:42 -0700 (PDT)
    Received: by 10.150.179.2 with SMTP id b2mr2880712ybf.410.1307214041816;
            Sat, 04 Jun 2011 12:00:41 -0700 (PDT)
    Return-Path: <order@meritline.com>
    Received: from APP2.meritline.com (app2.meritline.com [98.129.90.145])
            by mx.google.com with ESMTP id q3si1387246ybe.67.2011.06.04.12.00.41;
            Sat, 04 Jun 2011 12:00:41 -0700 (PDT)
    Received-SPF: pass (google.com: domain of order@meritline.com designates 98.129.90.145 as permitted sender) client-ip=98.129.90.145;
    Authentication-Results: mx.google.com; spf=pass (google.com: domain of order@meritline.com designates 98.129.90.145 as permitted sender) smtp.mail=order@meritline.com
    Received: from 291875-app2 ([127.0.0.1]) by APP2.meritline.com with Microsoft SMTPSVC(7.0.6002.18264);
         Sat, 4 Jun 2011 12:00:41 -0700
    MIME-Version: 1.0
    From: MERITLINE.COM <order@meritline.com>
    To: "xxx" <xxx>
    Date: 4 Jun 2011 12:00:41 -0700
    Subject: MERITLINE.COM Receipt
    Content-Type: text/html; charset=utf-8
    Content-Transfer-Encoding: base64
    Return-Path: order@meritline.com
    Message-ID: <291875-APP2ILBFZh4r000029a6@APP2.meritline.com>
    X-OriginalArrivalTime: 04 Jun 2011 19:00:41.0368 (UTC) FILETIME=[B30E3980:01CC22E9]
    
    PGh0bWw+DQogIDxoZWFkPg0KICAgIDxNRVRBIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIg
    Y29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PWlzby04ODU5LTEiPg0KICAgIDx0aXRsZT5N
    RVJJVExJTkUuQ09NIC0tLVJlY2VpcHQ8L3RpdGxlPg0KICA8L2hlYWQ+DQogIDxib2R5Pg0K
    ICAgIDxwIGFsaWduPSJjZW50ZXIiPjxiPjxmb250IHNpemU9IjMiPk1FUklUTElORS5DT03C
    oFJlY2VpcHQ8L2ZvbnQ+PGJyPjxmb250IHNpemU9IjEiPioqKiBQTEVBU0UgUFJJTlQgUkVD
    RUlQVCBPVVQgQU5EIFJFVEFJTiBJVCBGT1IgRlVUVVJFIFJFRkVSRU5DRSAqKio8L2ZvbnQ+
    PC9iPjwvcD4NCiAgICA8dGFibGUgYm9yZGVyPSIwIiBjZWxsc3BhY2luZz0iMiIgY2VsbHBh
    ZGRpbmc9IjAiPg0KICAgICAgPHRyPg0KICAgICAgICA8dGQgYWxpZ249ImxlZnQiIHdpZHRo
    PSIyMCUiPg0KICAgICAgICAgIDxkaXYgY2xhc3M9InJlcG9ydCI+DQogICAgICAgICAgICA8
    dGFibGUgY2VsbHBhZGRpbmc9IjAiIGNlbGxzcGFjaW5nPSIwIiB3aWR0aD0iMTAwJSI+DQog
    ICAgICAgICAgICAgIDx0cj4NCiAgICAgICAgICAgICAgICA8dGQgYWxpZ249ImxlZnQiIHZh
    bGlnbj0idG9wIj4NCiAgICAgICAgICAgICAgICAgIDx0YWJsZSBib3JkZXI9IjAiIGNlbGxz
    cGFjaW5nPSIyIiBjZWxscGFkZGluZz0iMCI+DQogICAgICAgICAgICAgICAgICAgIDx0cj4N
    CiAgICAgICAgICAgICAgICAgICAgICA8dGQgYWxpZ249ImxlZnQiIHdpZHRoPSIyMCUiPk9y
    ZGVyIE51bWJlcjo8L3RkPg0KICAgICAgICAgICAgICAgICAgICAgIDx0ZCBjb2xzcGFuPSIz
    IiB3aWR0aD0iODAlIiBhbGlnbj0ibGVmdCI+NTY4MTMxMjwvdGQ+DQogICAgICAgICAgICAg
    ...
    

    So I think that answers the question. Why they are sending it this way is unclear to me.
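
    For anyone who wants to confirm what the encoded body holds: the headers declare `Content-Transfer-Encoding: base64`, so the body can be decoded with any base64 tool. A minimal sketch in Python, using only the first line of the body quoted above (the variable names are just for illustration):

```python
import base64

# First line of the base64-encoded body quoted in the message above.
first_line = "PGh0bWw+DQogIDxoZWFkPg0KICAgIDxNRVRBIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIg"

# The Content-Type header declares charset=utf-8; this fragment is plain ASCII.
decoded = base64.b64decode(first_line).decode("utf-8")
print(decoded)
```

    The decoded fragment starts with `<html>`, so the only body part is HTML and there is no plain text alternative.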

  • benny

    benny November 28th, 2011 @ 10:03 PM

    Yes, that is what I thought. In this case there is only one body part:

    Content-Type: text/html; charset=utf-8
    

    Often such messages do contain a plain text body part which is then just a short note with an explanation or a link to a homepage with the same content as the HTML body part. Ironically, this makes it harder to know when it is necessary to parse HTML for text searching.

    (I cut out your email address in the message in order to avoid it being picked up by any email address harvesters.)

  • Alan Harper

    Alan Harper November 28th, 2011 @ 10:09 PM

    Thanks for removing the email. I would have thought that the harvesting came from that company's error, not mine.

    However, I get so much spam I just don't try to hide any more.

    Cheers

  • benny

    benny May 25th, 2012 @ 02:18 PM

    • State changed from “accepted” to “fixcommitted”

    This is now implemented. It works like this:

    1. If only HTML is provided then it is converted and added to the database to be available for searching.
    2. If both plain text and HTML are available then the body parts are compared. If, heuristically, HTML has more content than plain text then HTML is handled like in 1.

    (It can currently be a bit slow since external scripts are used for the conversion to plain text. This is most noticeable when rebuilding the database.)
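
    The two rules above can be sketched roughly as follows. This is only an illustration, not MailMate's code: the ticket does not say which heuristic is used, so the word-count comparison and the `HTML_RICHER_RATIO` threshold are assumptions, and Python's built-in `html.parser` stands in for the external conversion scripts.

```python
from html.parser import HTMLParser
from typing import Optional


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the chunks and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical threshold: the HTML must have this many times more words
# than the plain text before it is preferred for indexing.
HTML_RICHER_RATIO = 1.5


def searchable_text(plain: Optional[str], html: Optional[str]) -> str:
    """Pick the text that would be stored for searching."""
    if html is None:
        return plain or ""
    converted = html_to_text(html)
    if plain is None:
        return converted  # rule 1: only HTML is provided
    if len(converted.split()) > HTML_RICHER_RATIO * len(plain.split()):
        return converted  # rule 2: HTML has noticeably more content
    return plain


print(searchable_text(None, "<p>Canon <b>Cartridges</b></p>"))
```

    With a plain text part that is only a short pointer to a web page, the word-count comparison falls through to the HTML, which is the case described earlier in this ticket.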

  • benny

    benny November 9th, 2012 @ 12:16 PM

    • State changed from “fixcommitted” to “fixreleased”
