Topic: Correction to copy/paste from Word .docx documents (Read 2480 times)

Correction to copy/paste from Word .docx documents

Yesterday a friend who likes to copy/paste his documents from MS Word (*.docx format) reported to me that the formatting breaks.

When I looked into this, I found that Word produces quotation marks "" around each font-family, but the JavaScript only strips the marks at the beginning and end of the font-family list, resulting in a tag like this:

Code: [Select]
[font=Times New Roman","serif]

These quotation marks are what break the whole formatting.
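To make the difference concrete, here is a minimal sketch that compares stripping only the outer quotes with stripping every double quote. `stripOuterQuotes` is a hypothetical stand-in written for this illustration; it is an assumption about the behaviour described above, not SCEditor's actual `stripQuotes` code.

```javascript
// Hypothetical stand-in for an "outer quotes only" stripper; not SCEditor's actual code.
function stripOuterQuotes(value) {
  // Removes one matching quote from the start and one from the end.
  return value.replace(/^(["'])(.*)\1$/, "$2");
}

// Removes every double quote, wherever it appears in the list.
function stripAllDoubleQuotes(value) {
  return value.replace(/"/g, "");
}

// The kind of font-family value Word puts on the clipboard.
var wordFontFamily = '"Times New Roman","serif"';

var outerOnly = "[font=" + stripOuterQuotes(wordFontFamily) + "]";
var allQuotes = "[font=" + stripAllDoubleQuotes(wordFontFamily) + "]";

console.log(outerOnly); // [font=Times New Roman","serif]
console.log(allQuotes); // [font=Times New Roman,serif]
```

The first result reproduces the broken tag shown above; the second is the form the parser can handle.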

You can correct it in this file: {ROOT}\themes\default\scripts\jquery.sceditor.bbcode.min.js

Replace this:

Code: [Select]
"[font=" + this.stripQuotes(c) + "]" + b + "[/font]"

with this:

Code: [Select]
"[font=" + c.replace(/"/g, "") + "]" + b + "[/font]"

This works as it should.

Unfortunately, Word .docx also produces a lot of extra BBCode on empty lines.
I tried to find a solution for it.

I came up with this:

Code: (extracted javascript) [Select]
font: {
    tags: {
        font: {
            face: null
        }
    },
    styles: {
        "font-family": null
    },
    quoteType: a.sceditor.BBCodeParser.QuoteType.never,
    format: function(a, b) {
        var c;
        // No content: return nothing instead of an empty [font] tag.
        if (b === null) {
            return;
        } else {
            // Prefer the face attribute of a <font> element, else the CSS font-family,
            // then strip every double quote from the font list.
            return "font" === a[0].nodeName.toLowerCase() && (c = a.attr("face")) || (c = a.css("font-family")),
                "[font=" + c.replace(/"/g, "") + "]" + b + "[/font]";
        }
    },
    html: '<font face="{defaultattr}">{0}</font>'
},

Although it works, you first need to save the whole post and open it again. Only then are the extra BBCode tags removed.

If you have a better idea for removing the extra empty BBCode right away, please let me/us know.

I'm talking about extra code like this:
Code: [Select]
[font=Times New Roman,serif]Here is a text line[/font]
[font=Times New Roman,serif] [/font]
[font=Times New Roman,serif] [/font]
[font=Times New Roman,serif]Here comes my next line [/font]
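One possible way to drop such tags directly is a regular expression that matches a `[font=...]` tag whose content is only whitespace. This is a minimal sketch under that assumption; `removeEmptyFontTags` is a hypothetical helper name, not something SCEditor provides, and where it would be called from is left open.

```javascript
// Hypothetical helper; the name is illustrative and not part of SCEditor.
function removeEmptyFontTags(bbcode) {
  // Match a [font=...] opening tag, optional whitespace, and its closing tag.
  return bbcode.replace(/\[font=[^\]]*\]\s*\[\/font\]/g, "");
}

var pasted = [
  "[font=Times New Roman,serif]Here is a text line[/font]",
  "[font=Times New Roman,serif] [/font]",
  "[font=Times New Roman,serif] [/font]",
  "[font=Times New Roman,serif]Here comes my next line [/font]"
].join("\n");

var cleaned = removeEmptyFontTags(pasted);
console.log(cleaned);
```

The two text lines keep their tags, and the two whitespace-only lines become plain empty lines, so the vertical spacing of the pasted text is preserved.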

P.S. I attached an unminified version with my changes

Kind regards,
Esteffano


Re: Correction to copy/paste from Word .docx documents

Reply #1
The part of the code you have modified is the one that takes care of toggling between WYSIWYG and "source".
It should be possible to attach a function to an "onpaste" event; then it would be "enough" to capture the text pasted into the editor (either BBC or WYSIWYG or both), pass it through a function to clean it up, and return it to the editor all clean and dandy.
A slightly simpler alternative may be to add your code, capture the paste event, and then toggle the editor mode (from WYSIWYG to "source" and back) so that the code is executed. It may result in an unpleasant effect due to the mode change.
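The "onpaste" idea could look roughly like the sketch below for a plain textarea (source mode). Everything here is an assumption for illustration: `attachPasteCleanup` and the `clean` callback are hypothetical names, and the WYSIWYG side of the editor would need its own handling, which this sketch does not cover.

```javascript
// Hypothetical wiring: `attachPasteCleanup` and `clean` are illustrative names,
// not SCEditor or ElkArte APIs.
function attachPasteCleanup(textarea, clean) {
  textarea.addEventListener("paste", function (event) {
    // Read the plain-text flavour of the clipboard.
    var raw = event.clipboardData.getData("text/plain");
    // Stop the browser from inserting the raw text itself.
    event.preventDefault();
    // Replace the current selection with the cleaned text and put the caret after it.
    textarea.setRangeText(
      clean(raw),
      textarea.selectionStart,
      textarea.selectionEnd,
      "end"
    );
  });
}
```

In a browser it would be called once per editor textarea, with whatever cleanup function is chosen passed as `clean`. Because the cleanup runs before the text is inserted, no mode toggle is needed.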
Bugs creator.
Features destroyer.
Template killer.

Re: Correction to copy/paste from Word .docx documents

Reply #2
Hello Emanuele,

The first thing is definitely a bug (and should be corrected):

When I looked into this, I found that Word produces quotation marks "" around each font-family, but the JavaScript only strips the marks at the beginning and end of the font-family list, resulting in a tag like this:

Code: [Select]
[font=Times New Roman","serif]

These quotation marks are what break the whole formatting.

Your hint for the onpaste functionality is good.

Kind regards,
Esteffano

Re: Correction to copy/paste from Word .docx documents

Reply #3
Actually, it should already be fixed, though it's not working because in preparsecode what arrives are htmlspecialchars'ed strings, and the regex (with double quotes) fails.

@Spuds, master of regular expressions, I came up with this code:
Code: [Select]
$parts[$i] = preg_replace_callback('~\[font=([^\]]*)\](.*?(?:\[/font\]))~s', function ($matches) {
	// Keep only the first font of the comma-separated list.
	$fonts = explode(',', $matches[1]);
	// Decode entities such as &quot;, trim the surrounding double quotes, then re-encode.
	$font = htmlspecialchars(trim(htmlspecialchars_decode(array_shift($fonts)), '"'));
	return '[font=' . $font . ']' . $matches[2];
}, $parts[$i]);
What do you think? Does it make sense? (Obviously "fixed" for PHP 5.2.)
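For readers following along in the editor's language, the same idea (keep the first face, undo the entity encoding, drop the wrapping quotes) can be sketched in JavaScript. This is only an illustration of the logic, not a port intended for the server side: `firstFontFace` and `normalizeFontTags` are hypothetical names, and unlike the PHP above this sketch also trims whitespace and single quotes and does not re-encode the result.

```javascript
// Hypothetical helpers; names are illustrative, not ElkArte or SCEditor functions.
function firstFontFace(fontList) {
  // Keep only the first entry of the comma-separated list.
  var first = fontList.split(",")[0];
  // Turn entity-encoded double quotes back into real ones.
  first = first.replace(/&quot;/g, '"');
  // Trim whitespace, then any quotes wrapped around the name.
  return first.trim().replace(/^["']+|["']+$/g, "");
}

function normalizeFontTags(bbcode) {
  // Rewrite every [font=...] opening tag so it carries a single, unquoted face.
  return bbcode.replace(/\[font=([^\]]*)\]/g, function (match, fontList) {
    return "[font=" + firstFontFace(fontList) + "]";
  });
}

var encoded = "[font=&quot;Times New Roman&quot;,&quot;serif&quot;]Text[/font]";
console.log(normalizeFontTags(encoded)); // [font=Times New Roman]Text[/font]
```

A pattern that looks for a literal `"` never matches the `&quot;` form, which is why the decoding step comes before the trimming.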

Re: Correction to copy/paste from Word .docx documents

Reply #4
This is fixed already? How about the extraneous tags? If not, @Esteffano, do you want to submit a PR?

Re: Correction to copy/paste from Word .docx documents

Reply #5
It's amazing what copy and paste can do, or really the translation onto the clipboard and then the translation off the clipboard, all of it OS and app dependent.

@Esteffano thank you for the report and the JS fix. We do need to improve that code, and I think your fix looks good. The extra empty lines etc. are going to be difficult; they really should be caught by the editor core, but there are so many cases. I do think it's the extraneous font and font-face stuff that is the biggest problem; look at an HTML version of Gmail, omg.

The preparse code also needs improvement. It's there to drop the extra font face tags (it only saves the first one if there are multiple), but it does not work for faces wrapped in double quotes, only for single quotes (and not if they arrive htmlspecialchars'ed). It does use a positive lookahead to make sure it does not run away and eat the line.
Code: [Select]
'~\[font=\\\'?(.*?)\\\'?(?=\,[ \'\"A-Za-z]*\]).*?\](.*?(?:\[/font\]))~s'  => '[font=$1]$2'
I believe we could change this to
Code: [Select]
'~\[font=(?:\\\'|"|")?(.*?)(?:\\\'|"|")?(?=\,[ \'\"A-Za-z]*\]).*?\](.*?(?:\[/font\]))~s' => '[font=$1]$2'
which should find 'Times' "Times" 'Times&quot ; etc. Or we do what emanuele suggests and place it in its own mini function to try and catch all the cases.

ETA, the code block ate the &quot ;'s in the above .....
Be safe, Be kind, Happy Programing

Re: Correction to copy/paste from Word .docx documents

Reply #6
ETA, the code block ate the &quot ;'s in the above .....
Yeah, I still wonder why things are stripped from time to time... :-\

Re: Correction to copy/paste from Word .docx documents

Reply #7
When I looked into this, I found that Word produces quotation marks "" for each font-family, but the javascript only strips the marks on the beginning and end of the font-family list
Problems also happen if you simply copy and paste while in WYSIWYG mode and a quote is included in your post. See below:

This is a line of text. I will copy and paste it in the next line.

[font='Segoe UI', 'Helvetica Neue', 'Liberation Sans', 'Nimbus Sans L', Arial, sans-serif]This is a line of text. I will copy and paste it in the next line.[/font]

Re: Correction to copy/paste from Word .docx documents

Reply #8
Thanks ... forgot about this one :( . I added the improved handler to a pending 1.05 PR that I have.

Re: Correction to copy/paste from Word .docx documents

Reply #9
Hello,

I'm glad my suggestion was of some use to the community.

I haven't logged in here for a long time, as everything has gone well on the forum I helped to set up.

The friend with whom I set it up pointed me to the newest release, and I wonder whether I should upgrade.

What is this new feature/fix mentioned in the list:
"* restored the alphabet strip in the members list page"
Could you help me understand? :-)

Kind regards,
Esteffano