Topic: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses (Read 2787 times)

Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

I got this bug report here. Not sure what to make of it.

Testing: em and en dashes (— and –) and ellipsis (…)

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #1

This is part of the sanitizeMSCutPaste function, which is supposed to take care of a few "odd" Microsoft-encoded characters.

Now, the question that comes to my mind: from what I can see, we are replacing apparently valid UTF-8 characters with ASCII equivalents.
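To make that behaviour concrete, here is a minimal Python sketch of this kind of replacement. It is not the ElkArte code (that is PHP, inside sanitizeMSCutPaste); the mapping table and the name `asciify_typography` are assumptions made up for illustration, and the real function may cover different characters.

```python
# Hypothetical mapping for illustration; the real table may differ.
REPLACEMENTS = {
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def asciify_typography(text: str) -> str:
    """Replace a few typographic characters with ASCII look-alikes."""
    for char, ascii_equivalent in REPLACEMENTS.items():
        text = text.replace(char, ascii_equivalent)
    return text


if __name__ == "__main__":
    print(asciify_typography("em \u2014 en \u2013 ellipsis \u2026"))
```

On a UTF-8-only forum the input characters are already valid, so a pass like this only loses information.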

I did a quick test with the sanitizeMSCutPaste function removed, using a bunch of codes that should result in an empty message, and it looks like it works. But since this is @Spuds territory (charsets give me a headache! LOL), what do you think? Could it be removed?
Last Edit: February 05, 2016, 02:46:09 am by emanuele
Bugs creator.
Features destroyer.
Template killer.

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #2

According to this table, U+2010 through U+2027 are printable characters, and all of the characters in question fall in that range. I don't know about Linux, though, so maybe Linux fonts don't cover all of the Unicode characters that correspond to the CP1252 range?
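A quick way to check the range claim is to look at the code points directly. The sketch below uses only the Python standard library; the helper name `describe` is made up for this example.

```python
CHARS = {"en dash": "\u2013", "em dash": "\u2014", "ellipsis": "\u2026"}


def describe(char: str) -> dict:
    """Report code point, range membership and byte encodings for one character."""
    code_point = ord(char)
    return {
        "code_point": f"U+{code_point:04X}",
        "in_range": 0x2010 <= code_point <= 0x2027,
        "printable": char.isprintable(),
        "utf8": char.encode("utf-8").hex(),
        "cp1252": char.encode("cp1252").hex(),
    }


if __name__ == "__main__":
    for name, char in CHARS.items():
        print(name, describe(char))
```

All three characters sit inside U+2010..U+2027, each takes three bytes in UTF-8, and each maps to a single byte in CP1252.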

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #3

I think you could remove sanitizeMSCutPaste (in subs.php) since we only support UTF-8.

The problem was that cutting and pasting between character sets would lead to the display of the famous circumflex Â. I think it only came from Word, which used CP1252, while some forums were in ISO 8859-1, which did not understand those characters. We should test CP1252 -> ISO 8859-1 -> UTF-8 to be sure.
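The CP1252 -> ISO 8859-1 -> UTF-8 chain can be simulated outside the forum. This is a sketch of one plausible way the Â shows up, not a reproduction of what the forum code did; the function name `mojibake_chain` is hypothetical, and the last step assumes the result is displayed by something that decodes bytes as CP1252.

```python
def mojibake_chain(text: str) -> str:
    """Simulate CP1252 text being mislabelled as ISO 8859-1, then converted to UTF-8."""
    cp1252_bytes = text.encode("cp1252")         # bytes as Word would produce them (assumed)
    misread = cp1252_bytes.decode("iso-8859-1")  # forum treats them as ISO 8859-1
    utf8_bytes = misread.encode("utf-8")         # later converted to UTF-8
    return utf8_bytes.decode("cp1252")           # viewed through a CP1252 decoder


if __name__ == "__main__":
    for char in ("\u2013", "\u2014", "\u2026"):
        print(repr(char), "->", repr(mojibake_chain(char)))
```

Each character comes back with a leading Â (U+00C2), because the misread control character is encoded in UTF-8 as the byte 0xC2 followed by the original byte.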

Ultimately it was MS using code positions in a character set that they should not have been using for display characters.

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #4

Which isn't a problem with my forum, since it's UTF-8 encoded, and all browsers should respect that. Technically, it looks like it is converting perfectly valid UTF-8 encoded characters to ASCII because it isn't meant to display non-entity Unicode characters? Seems a little dated if UTF-8 is the default encoding of the forum. I'll go ahead and remove it from my copy of the forum.

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #5

Yeah, mostly a leftover ...

Again, you could enter those characters in MS Word CP1252, but that was a MS-special CP1252 version, as it was using areas outside of the allowed printable range. When that was plopped into ISO 8859-1, which shares most of its characters with CP1252 but not that extra range, it would give the A with a hat, and I think it was still a problem when that was converted.
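The range in question is bytes 0x80 through 0x9F, which ISO 8859-1 reserves for control codes. A small sketch comparing how the two encodings read a single byte (the helper `compare_byte` is a made-up name, and the result depends on Python's own codec tables):

```python
import unicodedata


def compare_byte(value: int):
    """Return (Unicode category under ISO 8859-1, character under CP1252 or None)."""
    raw = bytes([value])
    latin1_char = raw.decode("iso-8859-1")
    try:
        cp1252_char = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252_char = None  # Python's cp1252 codec leaves a few bytes undefined
    return unicodedata.category(latin1_char), cp1252_char


if __name__ == "__main__":
    for value in (0x85, 0x96, 0x97, 0xE9):
        print(hex(value), compare_byte(value))
```

Category `Cc` means a control character: byte 0x97 is a control code in ISO 8859-1 but the em dash in CP1252, while a byte such as 0xE9 is the same letter in both.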

The original subset was chosen because those characters appeared through natural typing in Word. When we converted over to UTF-8 only, way back, it should have been removed. I think I only removed half of it: there was a section looking for those bogus characters in ISO 8859-1, and the same for UTF-8 back to ISO 8859-1 (all cut-and-paste stuff).

Anyway, it's no longer relevant. I just tried C&P'ing some CP1252 crud into the editor and it looks fine, so I think it's safe to go forth and display your curly quotes.

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #6

x-ref: http://www.elkarte.net/community/index.php?topic=3451.0
Bugs creator.
Features destroyer.
Template killer.

Re: Forum, or at least WYSIWYG editor, seems to convert em/en dashes and ellipses

Reply #7

Did we ever track this one? I can't remember :-[