After extensive debugging I found a few items that I want to put up for discussion.
In my opinion there are several problems in the logic for creating new topics from email.
1. Detecting and separating a mail message into headers and body
File / function:
EmailParse.class.php:327 / _split_headers()
Problem:
The function tries to detect the beginning and end of the header section by using regular expressions. The problem is that this function is called recursively to parse the sections of a multipart message. The call sequence is:
- for the whole message
- for first section
- for second section
- ...
As a matter of fact, when called for a section, the input never includes a header. Hence the existing code always returns early without filling the body block, because the body is only filled by the second regular expression, which stores its result in the $match[] array.
Solution:
Just delete or comment out the header start check:
// Do we even start with a header in this boundary section?
// if (!preg_match('~^[\w-]+:[ ].*?\r?\n~i', $this->raw_message))
// {
// return;
// }
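To illustrate what the check does, here is a minimal standalone sketch. The pattern is copied from the snippet above; the function name starts_with_header() and both sample strings are made up for illustration and are not taken from ElkArte.

```php
<?php
// Standalone sketch, not the ElkArte code. Only the pattern is taken
// from the header start check quoted above.
function starts_with_header(string $raw): bool
{
    return preg_match('~^[\w-]+:[ ].*?\r?\n~i', $raw) === 1;
}

// Hypothetical inputs: a whole message, and a section that does not
// begin with a "Name: value" line.
$whole_message = "Subject: Test\r\nContent-Type: text/plain\r\n\r\nHello\r\n";
$bare_section  = "\r\nHello from a section without header lines\r\n";

var_dump(starts_with_header($whole_message)); // bool(true)
var_dump(starts_with_header($bare_section));  // bool(false), so _split_headers() would return early
```

With the check in place, any input of the second kind makes the function return before the body is filled.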
2. Separating the sections of a mail message
File / function:
EmailParse.class.php:751 / _boundary_split()
Problem:
The function tries to detect the beginning and end of each section by locating the delimiting '--<boundary>' pattern. The problem is that, as defined by the MIME specification (RFC 2046, originally RFC 1341), the multipart body is terminated by a closing delimiter that carries two more hyphens at the end of the line, i.e. "--<boundary>--".
In the existing approach this last line causes creation of a section that has no content other than '--'.
Solution:
Modify the check so that it skips not only empty sections but also the closing section:
// Nothing or end of multipart split?
// if (empty($part))
if (empty($part) or (strcmp($part, '--') == 0))
{
continue;
}
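The effect can be reproduced with a small standalone sketch. I am assuming here that the parts are produced by explode() on '--<boundary>' and trimmed before the check; the function name split_sections(), the $skip_closing flag and the sample body are hypothetical and only serve the demonstration.

```php
<?php
// Standalone sketch, not the ElkArte implementation. The explode() + trim()
// approach is an assumption about how the parts are produced.
function split_sections(string $body, string $boundary, bool $skip_closing): array
{
    $sections = [];
    foreach (explode('--' . $boundary, $body) as $part) {
        $part = trim($part);
        // Skip empty parts, and optionally the '--' left over from "--<boundary>--"
        if ($part === '' || ($skip_closing && strcmp($part, '--') === 0)) {
            continue;
        }
        $sections[] = $part;
    }
    return $sections;
}

// Hypothetical multipart body with two sections and the closing delimiter.
$body = "--XYZ\r\nfirst part\r\n--XYZ\r\nsecond part\r\n--XYZ--\r\n";

print_r(split_sections($body, 'XYZ', false)); // three entries, the last one is '--'
print_r(split_sections($body, 'XYZ', true));  // two entries: 'first part' and 'second part'
```

Without the extra comparison the closing delimiter ends up as a third "section" that contains nothing but the two hyphens.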
3. Reformatting the body part of a message
File / function:
EmailFormat.class.php:365 / _clean_up()
Problem:
The charset parameter is not processed correctly. Towards the end of the function a few special characters are replaced in a dedicated way. This must not be done if the text is UTF-8 encoded. Unfortunately the check only matches the upper-case spelling 'UTF-8'. In my case the character set came in as 'utf-8', so the replacement ran anyway and destroyed the content of the body text.
Solution:
Modify the check to be case-insensitive:
// And its 1252 variants
if (strcasecmp($charset, 'UTF-8') !== 0)
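A short standalone sketch of the difference between the two comparisons. The helper names and the exact form of the case-sensitive variant are my assumptions for illustration, only the strcasecmp() line matches the change above.

```php
<?php
// Standalone sketch. Both helpers are hypothetical; they return true when
// the 1252 replacements would be applied to the body text.

// Case-sensitive variant: only the exact spelling 'UTF-8' is recognised.
function replaces_case_sensitive(string $charset): bool
{
    return $charset !== 'UTF-8';
}

// Case-insensitive variant, as proposed above.
function replaces_case_insensitive(string $charset): bool
{
    return strcasecmp($charset, 'UTF-8') !== 0;
}

// Charset value as it arrived in my case.
$charset = 'utf-8';

var_dump(replaces_case_sensitive($charset));   // bool(true), replacements run on UTF-8 text
var_dump(replaces_case_insensitive($charset)); // bool(false), replacements are skipped
```

Any other charset name, for example 'ISO-8859-1', still gives true in both variants, so non-UTF-8 mails are treated as before.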
The last item was particularly tricky, as I only stumbled over it by accident when a German umlaut character caused some mails to fail.
I have made these changes locally on my server. What is the correct way of getting the suggested changes into future versions of ElkArte? On a lighter note, I am a little perplexed that this has not been brought up so far. I wonder how the "add topic by email" feature could ever have worked correctly. Any opinions on this?