Fog Creek Software
Discussion Board

UTF-8 again.

First off, I want to state unequivocally that I have no problem with CD2 producing XHTML Transitional.


I do see the forced UTF-8 encoding as intrusive. Netscape 4 has UTF issues and I have Netscape 4 visitors (my domain name attracts strange people). I don't plan to stick a gun into their face and tell them to update their browsers or else. I use Style Sheets in a manner which should keep pages legible without them.

In his knowledge base article, Joel suggests using variables to keep © and &mdash; untouched. This may be fine for the occasional entity, but it's not fine for umlauts. German Chancellor Schr{$.ouml$}der looks kinda funky in Normal View.

Put succinctly: It is a pain. And from everything I've read about Joel's design philosophies, using Fog Creek software shouldn't be a pain.

So: Feel free to force XHTML down my throat (jeez, I was going there anyway). But the double-byte characters are something I'm really choking on.

Quick solution: Somebody write a little tool which runs through CD's HTML output, adjusts the headers and changes all the UTF-8 characters back into HTML entities as Tim Berners-Lee originally intended.

This should be a simple job for anybody who can handle a compiler, which unfortunately I am not. (As a matter of fact, I just asked a programming whiz friend of mine and he said he wouldn't do it because it was too easy. One less name on my Christmas card list.)

Friday, August 29, 2003

I agree, this should be an easy task.
I'll see if I can't get a small utility done later today.

Lasse Vågsæther Karlsen
Sunday, August 31, 2003

If you do that, you will probably get your own fan club. :-)

Sunday, August 31, 2003

A beta version of the utility has been uploaded to my website. The link to the file is:

Basically what it will do is run through the directory of CD-files before they are copied to the publishing site and replace "known" characters with their entity names instead. The list of known characters and their names is in a text file accompanying the program file so you can remove those you don't want to replace. NOTE! READ THE README FILE ENCLOSED IN THE ZIP FILE!!!
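For readers curious what such a tool involves, the following is a minimal Python sketch of the approach described above, not Lasse's actual .NET utility. The entities file format (one `name hex-codepoint` pair per line, `#` for comments), the `*.html` filter, and all function names are assumptions made for illustration.

```python
from pathlib import Path


def load_entities(path):
    """Load the character-to-entity table from a text file.

    Assumed format (hypothetical): one 'name hex-codepoint' pair per line,
    for example 'ouml 00F6'. Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, codepoint = line.split()
        table[chr(int(codepoint, 16))] = "&" + name + ";"
    return table


def replace_entities(text, table):
    """Replace every character listed in the table; leave all others alone."""
    return "".join(table.get(ch, ch) for ch in text)


def convert_directory(root, table):
    """Rewrite each .html file under root in place.

    Files are read and written as bytes so that line endings are preserved.
    Returns the names of the files that were actually changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        original = path.read_bytes().decode("utf-8")
        converted = replace_entities(original, table)
        if converted != original:
            path.write_bytes(converted.encode("utf-8"))
            changed.append(path.name)
    return changed
```

With an entities file containing the line `ouml 00F6`, calling `convert_directory(folder, load_entities("entities.txt"))` would rewrite `Schröder` as `Schr&ouml;der` and leave any character not listed in the file untouched. Removing a line from the file is enough to stop that character from being replaced.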

Also, note that you need the .NET 1.1 runtime to run this program, partly because I do mostly .NET programming these days but also because .NET has built-in UTF-8 and Unicode support, which saved me from building these routines myself.

The direct link to the 1.1 .NET runtime is:

but you might be able to download it through Windows Update as well.

Since I'm at a location where I don't have a network connection for my own computer, I am unable to publish an updated site using this tool, so please test this before you decide to run it on your live production site. The software has been tested with IE6 but nothing else. The software is beta, which means there are bugs in it. There will be an article about this tool on my website when I'm back at work tomorrow.

I would like feedback from:
- users of browsers other than IE6
- people who for some reason needed to change the entities.txt file
- people who experience problems
- people who have suggestions for fixes or additional features
- anyone with an opinion :)

Lasse Vågsæther Karlsen
Sunday, August 31, 2003

Note that the program does not, in fact, alter the header. I figured the output was just as UTF-8 compliant as the input, and if you decide to alter just some characters, the rest will still work fine from CD.
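The reasoning here can be checked with a short Python sketch (an illustration only; the helper names are hypothetical). Entity names are plain ASCII, and ASCII is a subset of UTF-8, so a page with some or all of its characters replaced still decodes correctly under a UTF-8 header.

```python
def replace_known(text, table):
    """Swap each character listed in the table for its entity; keep the rest."""
    return "".join(table.get(ch, ch) for ch in text)


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "Schröder, Vågsæther" written with escapes so the source file stays ASCII.
page = "Schr\u00f6der, V\u00e5gs\u00e6ther"

# Replace only the o-umlaut; the other characters stay as UTF-8 bytes.
partial = replace_known(page, {"\u00f6": "&ouml;"}).encode("utf-8")

# Replace every non-ASCII character in the sample.
full = replace_known(
    page,
    {"\u00f6": "&ouml;", "\u00e5": "&aring;", "\u00e6": "&aelig;"},
).encode("utf-8")

print(is_valid_utf8(partial))            # True: a mix of entities and UTF-8
print(is_valid_utf8(full))               # True: entities only
print(all(byte < 128 for byte in full))  # True: the result is pure ASCII
```

The fully replaced page is pure ASCII, which is why it would also display correctly in a browser that ignores or mishandles the UTF-8 declaration.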

Lasse Vågsæther Karlsen
Sunday, August 31, 2003

Lasse: I will probably come across as the most thankless of all critters on this planet, but I am swamped with work today and tomorrow. I will try to test it on Wednesday.

My sincerest apologies.

Monday, September 1, 2003

Not a problem :)
I'm swamped too, so I wouldn't be able to do anything about any bugs or anything until Wednesday anyway :)

Lasse Vågsæther Karlsen
Tuesday, September 2, 2003

Quick question. What UTF-8 issues does Netscape 4 have?

(I should mention that the 27 different language versions of Joel on Software have been in UTF-8 for a year now and I have not heard a single person mention a single thing about something not appearing correctly in their web browser.)

Joel Spolsky
Tuesday, September 2, 2003

Good point.

Please bear in mind that I didn't really need that tool myself, so I created it more as an exercise than anything else.

If anyone else needs it (or thinks they need it), then knock yourself out :)

Lasse Vågsæther Karlsen
Wednesday, September 3, 2003
