Fog Creek Software
Discussion Board





CityDesk and clean HTML

Running www.joelonsoftware.com through W3 validator points out a few HTML bugs that don't seem to cause problems in the browser:

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.joelonsoftware.com%2F&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=

I looked at this because the HTML in the default welcome article could be a lot cleaner. Some of the problems seem to come from the content that was imported, but some of the mistakes seem to be the result of the macro transformations.

Is it a design goal to produce clean, valid HTML?

Tim Randolph
Saturday, November 17, 2001

It's kind of complicated. There's a long answer and a short answer.

The short answer is that for this version, the goal is to produce HTML that displays correctly in all browsers. For the next version, the goal will be to get valid (X)HTML, something which is significantly more difficult so we haven't devoted resources to it yet.

Here's the long answer. CityDesk doesn't actually generate HTML. We use Microsoft's editor component (DHTMLed). It has its weaknesses. One of them is that it will occasionally do silly things, like failing to remove an empty <a></a> pair when you delete text that is a link, or using <strong> instead of <b>. So far, none of these weaknesses actually results in something which looks wrong in any browser that I know of. This is the price we pay for using off-the-shelf components instead of writing our own, but the benefit is significant: writing a WYSIWYG HTML editor is beyond the capabilities of a small company. That would be a whole product in and of itself.

One of the alleged benefits of DHTMLed is that it is "source preserving." Or, it tries, at least. Many of the glitches in Joel on Software are actually remnants from the various editors I've used on it over the years (many articles started out in FrontPage or Manila). Since most of CityDesk is just a glorified string manipulator, it doesn't really have any awareness of whether every <a> has a matching </a>.

In the long run, here is what we want to do. Instead of trusting DHTMLed to give us the HTML source, we will walk the DOM ourselves and generate clean, valid XHTML from it. (We may also switch from DHTMLed to the more modern MSHTML, which is less buggy but requires IE 5.1.) This is a bunch of work but not impossible. It has the disadvantage of eliminating source preservation. It has the advantage of making CityDesk output web-standards-compliant. It's a sort of scary step, because once we start trying to understand the HTML and treating it as HTML instead of as an opaque string, we had better be sure to get it right or we may make it impossible to generate the web page you wanted to generate. For example, web standards generally treat whitespace between tags as meaningless. Web designers have known for years that you have to jam your </tr>'s and </td>'s together as closely as possible or your whole table gets extra space in it in a totally random way. So if we start actually generating the HTML ourselves, we become responsible for getting all these nitty-gritty details right.

Joel Spolsky
Saturday, November 17, 2001
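
To make the "walk the DOM and generate the XHTML ourselves" idea concrete, here is a minimal sketch in Python. It is not how CityDesk works or will work; it only illustrates the technique with the standard library's html.parser. The names Node, TreeBuilder, serialize and to_xhtml are made up for this example, and the rule for dropping empty <a></a> pairs (only when the anchor has no name or id) is an assumption about what "cleaning up" should mean.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content; a small illustrative subset.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element in a very small DOM-like tree (hypothetical helper)."""

    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = list(attrs)
        self.children = []  # Node instances or plain strings


class TreeBuilder(HTMLParser):
    """Builds a tree from tag soup; tag and attribute names arrive lowercased."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text is kept exactly as found, including whitespace between tags.
        self.stack[-1].children.append(data)


def serialize(node):
    """Walk the tree and emit well-formed, lowercase, quoted markup."""
    if isinstance(node, str):
        return escape(node, quote=False)
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    names = {name for name, _ in node.attrs}
    # Assumption: an empty <a> with no name/id is a leftover from a deleted link.
    if node.tag == "a" and inner == "" and not names & {"name", "id"}:
        return ""
    attrs = "".join(
        ' %s="%s"' % (name, escape(value if value is not None else name))
        for name, value in node.attrs
    )
    if node.tag in VOID:
        return "<%s%s />" % (node.tag, attrs)
    return "<%s%s>%s</%s>" % (node.tag, attrs, inner, node.tag)


def to_xhtml(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()  # flush any buffered text
    return serialize(builder.root)
```

For example, to_xhtml('<P>Hello <A HREF="x.html"></A><B>world</B><BR>') gives '<p>Hello <b>world</b><br /></p>'. Note that the sketch copies text through untouched, so whitespace between tags survives; that sidesteps the table-spacing problem rather than solving it, and it does nothing about validity against a DTD.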

What a cogent lesson in some of the trade-offs that go into making version 1.0 of anything that will actually ship in a timely way. Thanks for the explanation. 

Tim Randolph
Monday, November 19, 2001

One more thought...

HTML Tidy has a BSD-style license and might be able to make up for a lot of the flaws in the HTML that CityDesk outputs.

It would be nice to have a CityDesk wizard that launched Tidy in a few of its more useful modes. Maybe in 1.1?

Tim Randolph
Monday, November 19, 2001

Good idea! Even with 1.0 you can run HTML Tidy by setting it up as a script and launching it from a customized "web browser" button in the publish dialog (I use that a lot for post-processing).

Joel Spolsky
Tuesday, November 20, 2001
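
A post-processing script of the kind described above might look roughly like this Python sketch. It assumes the tidy executable is on the PATH and that you know the folder the site was published to; how the publish dialog hands that location to the script is not covered in this thread, so the folder is simply a parameter here. The function names tidy_command and tidy_folder are made up for the example. The flags are from the current HTML Tidy command line (-q quiet, -asxhtml convert to XHTML, -m modify the file in place); older builds may differ.

```python
import subprocess
from pathlib import Path


def tidy_command(path, tidy_exe="tidy"):
    """Build the Tidy command line for one file (rewrites the file in place)."""
    return [tidy_exe, "-q", "-asxhtml", "-m", str(path)]


def tidy_folder(folder, tidy_exe="tidy"):
    """Run Tidy over every .html file under folder.

    Returns the files Tidy reported errors for. Tidy exits with 0 when the
    input was clean, 1 when it only had warnings, and 2 when there were
    errors it could not fix.
    """
    failed = []
    for path in sorted(Path(folder).rglob("*.html")):
        result = subprocess.run(tidy_command(path, tidy_exe))
        if result.returncode >= 2:
            failed.append(path)
    return failed
```

Calling tidy_folder("C:/published-site") (a hypothetical path) after publishing would clean every page and return the list of files that still need a look by hand.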

I had the same problem with DHTMLEd in our own application. By using Tidy I'm able to generate valid XHTML code, so converting HTML pages to XML documents became a lot easier.

Tidy also proved to be very useful for importing Word 97/2000 documents.

Ricardo
Thursday, December 06, 2001
