Fog Creek Software
Discussion Board





Pasting text from Word creates garbage characters

When I copy/paste text from MS Word into the article editor, everything seems fine in Normal view, including formatting. However, some characters, like “ and ‘ and -, become Wingdings-like garbage when published. For example: the phrase “this is text” in Word is still “this is text” in the editor’s Normal view, but the quotes turn into garbage characters in the browser after publishing. I use Word 2002 and IE6.

When I use the command Edit > Paste without Formatting, these characters show up fine in the browser, but of course all the precious Word formatting is lost.

Is there a way to get text from Word into CityDesk that keeps Word’s formatting and also stops these characters from showing up as garbage in the browser?

Thanks.

Paul Iliano
Sunday, January 13, 2002

As I feared, my example above produced yet other garbage characters. What the browser actually shows is a nice empty square instead of the ” character.

Paul Iliano
Sunday, January 13, 2002

This is a well-known Word annoyance... Word doesn't understand that smart quotes just don't come out right on the web. The best way to solve this is to create a small Word macro that cleans up, replacing all the curly quotes with straight ones and the em/en dashes with regular dashes. Here are step-by-step instructions to create such a macro, which I've tested with Word 2000. It should be almost exactly the same in Word 2002; please let me know if this doesn't work for you.

Once you've done this, a single keystroke will clean up the smart quotes in Word.

* Run Word
* Tools >> Macro >> Macros
* In "Macro name", type "CleanQuotes" and press CREATE
* The Visual Basic editor will appear.
* Replace the empty CleanQuotes procedure you see with these two procedures:

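Sub CleanQuotes()

    ' Remember the user's "smart quotes as you type" setting
    Dim X As Boolean
    X = Options.AutoFormatAsYouTypeReplaceQuotes

    ' Turn it off so Word does not re-curl the straight quotes we insert
    Options.AutoFormatAsYouTypeReplaceQuotes = False

    ' Search on text only, ignoring any formatting left over from an earlier Find
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
    End With

    CleanReplace ChrW(8220), """"   ' left double quote  -> "
    CleanReplace ChrW(8221), """"   ' right double quote -> "
    CleanReplace ChrW(8211), "-"    ' en dash -> hyphen
    CleanReplace ChrW(8212), "-"    ' em dash -> hyphen
    CleanReplace ChrW(8216), "'"    ' left single quote  -> '
    CleanReplace ChrW(8217), "'"    ' right single quote -> '

    ' Restore the user's original setting
    Options.AutoFormatAsYouTypeReplaceQuotes = X

End Sub

' Replace every occurrence of s with t, using the current selection's Find object
Sub CleanReplace(s As String, t As String)

    With Selection.Find
        .Text = s
        .Replacement.Text = t
        .Execute Replace:=wdReplaceAll
    End With

End Sub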

* Choose File >> Save Normal
* Press Alt+F4 to close the VB editor
* Go into the Word document you want to clean up
* Select the text to clean OR leave nothing selected to clean the whole document
* Tools >> Macro >> Macros
* Click CleanQuotes and press RUN
* Now cut and paste to CityDesk.
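
If the "leave nothing selected" case matters to you, a variant can make it explicit instead of depending on how Selection.Find behaves at an insertion point. This is only a sketch: the names CleanQuotesAnywhere and ReplaceInRange are made up for illustration, and it covers the main document body only (not headers, footers or footnotes).

```vba
Sub CleanQuotesAnywhere()

    Dim target As Range
    Dim saved As Boolean

    ' An insertion point (nothing selected) is taken to mean "the whole document"
    If Selection.Type = wdSelectionIP Then
        Set target = ActiveDocument.Content
    Else
        Set target = Selection.Range
    End If

    ' Same precaution as CleanQuotes: keep Word from re-curling the new quotes
    saved = Options.AutoFormatAsYouTypeReplaceQuotes
    Options.AutoFormatAsYouTypeReplaceQuotes = False

    ReplaceInRange target, ChrW(8220), """"
    ReplaceInRange target, ChrW(8221), """"
    ReplaceInRange target, ChrW(8211), "-"
    ReplaceInRange target, ChrW(8212), "-"
    ReplaceInRange target, ChrW(8216), "'"
    ReplaceInRange target, ChrW(8217), "'"

    Options.AutoFormatAsYouTypeReplaceQuotes = saved

End Sub

Sub ReplaceInRange(target As Range, s As String, t As String)

    ' Search a copy of the range so the caller's range keeps its bounds
    With target.Duplicate.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = s
        .Replacement.Text = t
        .Forward = True
        .Wrap = wdFindStop      ' stop at the end of the range, do not wrap
        .Format = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With

End Sub
```

Create it the same way as CleanQuotes and run it from Tools >> Macro >> Macros.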

If you need to do this often, you may want to assign the CleanQuotes macro to a key. I like Ctrl+Q:

* Tools >> Customize
* Go to the Commands tab
* Click Keyboard
* Click Macros on the left and CleanQuotes on the right
* In Press New Shortcut Key, type Ctrl+Q
* ASSIGN / CLOSE / CLOSE

Now Ctrl+Q at any time will clean the selection.
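
The same key assignment can also be made from a macro, which may be handy if you set up several machines. A minimal sketch, assuming CleanQuotes was saved in the Normal template as described above; AssignCleanQuotesKey is a made-up name. Note that this takes Ctrl+Q away from whatever Word command was using it.

```vba
Sub AssignCleanQuotesKey()

    ' Store the key binding in the Normal template, next to the macro
    CustomizationContext = NormalTemplate

    ' Bind Ctrl+Q to the CleanQuotes macro
    KeyBindings.Add KeyCategory:=wdKeyCategoryMacro, _
                    Command:="CleanQuotes", _
                    KeyCode:=BuildKeyCode(wdKeyControl, wdKeyQ)

End Sub
```

Run it once; the binding is saved with Normal the next time Word saves the template.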

Joel Spolsky
Sunday, January 13, 2002

Thanks VERY much. It works fine in Word 2002 too. The only hiccup is that not selecting text does not clean up the whole document (as I don’t know any VB, I just copied the VB code). Any suggestions?

Also, is there a way to reverse or undo this action, i.e. to put the smart quotes and dashes back after copying text to CityDesk? I have essays not exclusively for use on the web via CityDesk, but also to be used as printed documents, and for this use smart quotes would be a good thing to have.

Paul Iliano
Monday, January 14, 2002

In Word 2000, there's a menu option:

Format >> AutoFormat >> Options >> Check the smart quotes option >> OK

Or just don't save :)
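
For a macro route back, one rough sketch relies on the same setting that CleanQuotes switches off: with "smart quotes as you type" switched on, replacing a straight quote with itself should let Word curl it again. RestoreSmartQuotes and RecurlAll are made-up names, this has not been tried on Word 2002, and it only touches the main document body. It cannot bring back en or em dashes, because a plain hyphen does not record which dash it used to be.

```vba
Sub RestoreSmartQuotes()

    Dim saved As Boolean
    saved = Options.AutoFormatAsYouTypeReplaceQuotes

    ' With this option on, replacement quotes are curled as they are inserted
    Options.AutoFormatAsYouTypeReplaceQuotes = True

    RecurlAll """"
    RecurlAll "'"

    ' Put the user's setting back
    Options.AutoFormatAsYouTypeReplaceQuotes = saved

End Sub

Sub RecurlAll(q As String)

    ' Replace the quote character with itself throughout the document body
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = q
        .Replacement.Text = q
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With

End Sub
```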

Joel Spolsky
Monday, January 14, 2002

Or, just get rid of the Unicode character set tag in the <head> of your code . . .

Web Guy
Monday, January 14, 2002

Joel: thanks very much -- I really should have thought of that myself!

Web Guy: I'm intrigued by what you suggest, but I'm afraid I can't follow. Can you explain what you mean? What is this "Unicode character set tag": where do I look for it (template? article?), how do I recognize it, and what part should I get rid of?
Thanks!

Paul Iliano
Monday, January 14, 2002
