Handling multiple topic tags in Subject lines
Reported by Bill Cole | July 23rd, 2019 @ 02:58 PM
The existing support for "Subject->Blob" elements provides rudimentary support for what are more expressively called "Topic Tags" in Subject headers. Some mailing list managers (notably GNU Mailman) use these both to mark messages as being "on list" and to differentiate between messages on different topics, which users can selectively subscribe to via their list preferences or use for organization on the client side.
As a result, some mailing lists emit messages with Subject headers containing multiple topic tags like this example from the Team Cymru Dragon News Bytes (DNB) list:
Subject: [DNB] [APT] [RESEARCH] Chances of destructive BlueKeep exploit rise with new explainer posted online
I have tried using the specifiers.plist file to extract multiple tags with no success. Ideally a specifier would extract an arbitrary number of tags, but there is no obvious syntax for that and even my attempt to extract up to 3 tags only succeeded in breaking "Blob" extraction.
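To make the desired behavior concrete, here is a minimal Python sketch of extracting an arbitrary number of leading tags from a Subject value. It is only an illustration of the goal, not MailMate code or specifiers.plist syntax: the function name `split_topic_tags` and both patterns are hypothetical, and reply prefixes such as "Re:" are deliberately not handled.

```python
import re

# Hypothetical helper, not part of MailMate.
# First pass: capture the whole run of leading "[tag]" groups and the rest.
TAG_RUN = re.compile(r'^\s*((?:\[[^\[\]]+\]\s*)*)(.*)$', re.DOTALL)
# Second pass: pull each individual tag out of the captured run.
TAG = re.compile(r'\[([^\[\]]+)\]')


def split_topic_tags(subject):
    """Return (list_of_tags, remaining_subject) for a Subject header value."""
    run, body = TAG_RUN.match(subject).groups()
    return TAG.findall(run), body


tags, body = split_topic_tags(
    "[DNB] [APT] [RESEARCH] Chances of destructive BlueKeep exploit "
    "rise with new explainer posted online"
)
print(tags)  # ['DNB', 'APT', 'RESEARCH']
print(body)
```

The two-pass shape matters: one pattern finds the run of tags, a second one splits it, so the number of tags does not have to be fixed in advance.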
Comments and changes to this ticket
-
benny July 26th, 2019 @ 12:26 PM
- State changed from new to accepted
Do you know if “topic tag” is used in an RFC or some other semi-official documentation? I do like it better than “blob”, which was taken from an RFC as far as I can recall :)
The specifiers.plist file is very tricky to change, but I think the following replacement of the subject parser would make “blob” into a multi-value header, but I cannot really recall where MailMate might assume that it's a single value:

    {
        parsers = {
            subject = {
                header = "subject";
                // Note the non-ASCII colon (\uFF1A - a full width colon) which is used by some Chinese messages.
                specifierRegexFormatString = '(?x)
                    ^\s*                                      # Whitespace
                    ((?:\s*${MmSubjectPrefixRegexp}[:：])+)?  # Prefix including things like "Re: Re[5]:".
                    \s*                                       # Whitespace
                    ((?:\[([^\[\]]+)\]\s*)+)?                 # Subject blob, e.g., "[TxMt]"
                    \s*                                       # Whitespace
                    ((?:\s*${MmSubjectPrefixRegexp}[:：])+)?  # Prefix including things like "Re: Re[5]:".
                    \s*                                       # Whitespace
                    (.*)$                                     # Subject body
                ';
                specifierCaptures = {
                    1 = { specifier = "prefix"; };
                    2 = { specifier = "topics"; parsers = ( "topic" ); canBeImplicit = :true; };
                    3 = { specifier = "prefix"; };
                    // 4 = { specifier = "blob"; };
                    4 = { specifier = "body"; type="noTabs"; parsers = ( "words" ); };
                };
            };
            topic = {
                specifierRegex = '\[([^\[\]]+)\]';
                specifierCaptures = {
                    1 = { specifier = "blob"; };
                };
            };
        };
    }
Let me know when and how it breaks and I'll see if I can eventually make this default behavior. And maybe use “topic” instead of “blob” as the specifier name.
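As background for the two-level "topics" / "topic" layout above, here is a small Python sketch of two general regex properties. It uses Python's `re` module rather than MailMate's regex engine, the variable names are hypothetical, and how MailMate maps `specifierCaptures` numbers onto groups is not confirmed here.

```python
import re

subject = "[DNB] [APT] [RESEARCH] Chances of destructive BlueKeep exploit"

# 1) A capture group inside a repetition keeps only its last match,
#    so a single pass cannot return every tag through one group.
single_pass = re.compile(r'^\s*(?:\[([^\[\]]+)\]\s*)+(.*)$')
last_only = single_pass.match(subject).group(1)
print(last_only)  # RESEARCH

# 2) Groups are numbered by their opening parenthesis, so a nested
#    capturing group takes a number of its own and shifts later groups.
nested = re.compile(r'^\s*((?:\[([^\[\]]+)\]\s*)+)?\s*(.*)$')
group_count = nested.groups
print(group_count)  # 3: tag run, inner tag, body
body_group = nested.match(subject).group(3)
print(body_group)
```

The first property is why a separate per-tag parser is needed at all; the second is worth keeping in mind when numbering captures in a pattern with nested groups.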
-
benny July 27th, 2019 @ 07:11 AM
Quick update: Don't use the above as is. It doesn't work as well as I thought it would (leading to empty subject lines and even crashes).