Importing HTML, MS Word as articles.

I have a number of articles on my disk, some as HTML files, others as MS Word .doc files. I would like to import them and have them converted to articles automatically, so that the Title, Author, and other fields are filled in. How can that be done?

I would particularly like to import the Word documents and have the HTML for each article be plain <p>text</p> paragraphs with no inline formatting. I will do my formatting in CSS. Can that be done?

John Shaw
Friday, June 11, 2004

This is an issue for many folks. Here is a discussion from earlier this year:

Friday, June 11, 2004

What I do is paste from Word to Notepad, then from Notepad to CityDesk, and hope for an easier solution.

Friday, June 11, 2004

Importing HTML would require CD to parse the HTML. For now, all we have is copying what you need from a browser and pasting it with 'Paste Without Formatting'. If you want to copy an HTML table etc., you'll need to copy the actual HTML code and paste it into the HTML view.
Importing from DOC is likewise a matter of copy and paste (without formatting).

The method I use is: with the content you want copied to the clipboard and an article open in normal view, select 'Paste Without Formatting' from the Edit menu. This option 'should' be available from the right-click menu ... maybe in the next version.

You need to use the same option (without formatting) when pasting copied text into any of the other article fields as well.

Making 'without formatting' the default would be a nice feature :)

Perpetual Newbie II
Saturday, June 12, 2004

I get tons of content from Word, and then finagle Word to get clean HTML -- and still crap slips by. So I'm very open to suggestions and improvements!

I use MSFT's HTML Cleaner 2.0 (HTML Filter 2.0). One day I found out that I can launch this util on its own instead of within Word. That's the way to go! Cleans out most of the garbage tags. For once, thanks Microsoft!

A kind reader of these posts suggested HTML Tidy. Good to have this one in your toolset. Free as well -- incredibly.

Copy-pasting through Notepad stinks because I lose italics, bold, and the <p> and <br> tags, and then I'm reconstructing the article instead of merely stripping away those dreaded MSO tags.
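
For anyone comfortable with a little scripting, the "strip the MSO junk but keep bold, italics, <p> and <br>" idea can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not what HTML Filter or HTML Tidy actually do: the names KEEP, SKIP_CONTENT, WordHtmlCleaner and clean_word_html are made up for illustration, and the list of tags worth keeping is an assumption you would adjust to taste.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these are the only tags worth keeping. Any other tag
# (span, font, o:p, div, ...) is dropped, but the text inside it is kept.
KEEP = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li"}
# Elements whose entire content is thrown away, text included.
SKIP_CONTENT = {"style", "script", "head", "xml"}


class WordHtmlCleaner(HTMLParser):
    """Re-emits only whitelisted tags, with every attribute removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a SKIP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP:
            # Attributes (class=MsoNormal, style=...) are deliberately ignored.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape &, < and > so the output is still valid HTML.
            self.out.append(escape(data, quote=False))


def clean_word_html(source):
    """Return `source` reduced to the whitelisted tags and plain text."""
    parser = WordHtmlCleaner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Calling clean_word_html() on the text of a file saved from Word as HTML gives back a string you could paste into the HTML view. Comments, including Word's conditional comments, are dropped because the parser ignores them by default. Removing "b", "i", "em" and "strong" from KEEP would leave bare <p> paragraphs for styling purely in CSS.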

I welcome suggestions. I'm so weary of Word, but it's a fact of life...


Bob Bloom
Saturday, June 12, 2004

I can never get the HTML filter to do what I want: it strips either too much or too little. The Notepad thing does stink if the text has bullets or formatting I want to keep.

Saturday, June 12, 2004

Use OpenOffice to open the Word documents and convert them to HTML, much cleaner code!

Bas van der Weerd
Wednesday, July 14, 2004

