Fog Creek Software
Discussion Board





Singing the Mandatory Unicode Blues

Woke up this morning,
found out that all my umlauts had turned to mush.
I said: Woke up this morning,
and all my umlauts had turned to mush.

Talked to the Netadmin,
asked 'em what was going wrong,
he told me it was because of their Apache header,
which is why I'm singing this song.

Turns out they're sending out an ISO header,
which overrides the charset in the pages,
This turns all my pages inside out, mama,
and leaves me crying for the upcoming ages.

<family-unfriendly adlibs snipped>

I've got the Mandatory Unicode blues.

<guitar solo, interrupted by uncontrollable sobbing>

Please, dear Joel S.: Make the pain go away.

geraldH
Friday, January 30, 2004

We can't actually fix this problem. 

If your sysadmin insists on sending out the ISO header, then none of your pages will be able to include non-ISO-whatever characters.  Basically they are preventing you from publishing HTML pages in any character set other than the one they specify.

You should ask them to stop that practice.

Michael H. Pryor
Friday, January 30, 2004

Wouldn't it be nice to be able to adjust the encoding that CityDesk uses?  ISO Latin-1 (ISO-8859-1) is very commonplace.  It would be nice to be able to set CityDesk to use that, or UTF-8, etc.

David Burch
Friday, January 30, 2004

Given that the whole server contains data in German with some smattering of English, one code page for the whole domain isn't that bad of a thing.

The issue being: If I could set the charset for CD instead of having Unicode shoved down my throat, this would be a non-issue. Because I would happily publish my pages in ISO or Latin or whatever, I even like to use HTML entities (you know, &amp; etc.). But CD won't let me get by without Unicode.

So I'm not really fond of hearing "We can't actually fix this problem" under the given circumstances.

(The reason for the netadmins enforcing the code page, by the way, is that many of the maintainers of the site use umlauts in ISO form, and since they fail to specify a charset in their file headers, browsers on other operating systems typically fail to display these high-ASCII characters correctly. In the end, I am being punished for other people's failure to declare their charset. And boy, do I feel sorry for myself. *grin*)

geraldH
Friday, January 30, 2004
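
A possible workaround along the lines of the HTML entities mentioned above, sketched in Python: post-process the published files so that every non-ASCII character becomes a numeric character reference. The result is pure ASCII, which reads the same whether the server announces ISO-8859-1 or UTF-8. This is only an assumption about how one might script it; CityDesk itself does not offer this, and `to_ascii_html` is a made-up helper name.

```python
def to_ascii_html(utf8_bytes: bytes) -> bytes:
    """Re-encode UTF-8 HTML as pure ASCII, using numeric character references.

    Hypothetical helper: every character outside ASCII (umlauts etc.)
    is replaced by a reference such as &#252; for "ü".
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode("ascii", errors="xmlcharrefreplace")


page = "<p>für</p>".encode("utf-8")   # sample published output (made up)
safe = to_ascii_html(page)

# ASCII is a subset of both encodings, so either charset decodes it identically.
assert safe.decode("iso-8859-1") == safe.decode("utf-8")
```

One would run something like this over the published folder before uploading. It does nothing for text that a script generates at request time.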

Can't you just change the meta command in your templates to specify something other than utf-8?

joel goldstick
Friday, January 30, 2004

As I understand it, the content-type meta header describes the character set the pages are in.  CityDesk outputs UTF-8, so changing the header doesn't change the character set; it just tells the browser that the wrong character set is in use.

Ken McKinney
Friday, January 30, 2004

That wasn't very clear... Changing the meta header tells the browser that a different character set is in use than is actually the case.  This tends to make things worse, not better.

Ken McKinney
Friday, January 30, 2004
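
To illustrate the point above, here is a minimal Python sketch of what happens when UTF-8 bytes are read as ISO-8859-1. The sample string is made up; the effect is the same for any text with umlauts.

```python
# Sample text (hypothetical); CityDesk would write it out as UTF-8 bytes.
text = "Müller"
utf8_bytes = text.encode("utf-8")

# What a browser effectively does when it is told the page is ISO-8859-1:
# each byte becomes one character, so the two-byte "ü" turns into two characters.
misread = utf8_bytes.decode("iso-8859-1")

# What it does when it is told the truth.
correct = utf8_bytes.decode("utf-8")

print(misread)  # the umlaut shows up as two junk characters
print(correct)  # the original text
```

Relabelling the bytes never changes them. Only actually re-encoding the file, or getting the label right, fixes the display.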

Let me sum up the problem:

- The server sends charset info in HTTP header (within transmitted data packet). [In the case of my server, this is set to ISO Latin.]

- CityDesk 2 sets charset in HTML header (in page). [This is always UTF-8, you can't change it.]

The browser receives both the HTTP packet and the HTML page. When it finds a charset definition in the packet header, it will use that one to display the page -- and here is the important part -- no matter what you define in the head of the HTML document itself.

HTTP Header overrides HTML. This is true for all browsers that I know of.

geraldH
Saturday, January 31, 2004
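
The precedence described above can be sketched roughly like this in Python. It is a simplified model, not how any particular browser is implemented (real browsers also look at byte order marks and user overrides), and the function names are made up for illustration.

```python
import re
from email.message import Message
from typing import Optional

# Matches both <meta charset="..."> and the older http-equiv form.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Charset from an HTTP Content-Type header value, lowercased, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: bytes) -> Optional[str]:
    """Charset declared in a meta tag near the top of the page, or None."""
    match = _META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type: Optional[str], html: bytes) -> Optional[str]:
    """The HTTP header wins; the meta tag is consulted only if the header is silent."""
    return header_charset(content_type) or meta_charset(html)


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head></html>'
print(effective_charset("text/html; charset=ISO-8859-1", page))  # server's charset wins
print(effective_charset("text/html", page))                      # falls back to the meta tag
```

In this model, the page's own declaration only counts when the server sends `Content-Type: text/html` without a charset parameter.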

geraldH, re: Unicode Blues

http://www.cl.cam.ac.uk/~mgk25/unicode.html

IMO the people who manage *your maintainers of the site* need to be educated. Perhaps if you provide them with the link above that might help get your issues resolved where it must be resolved -- on the server.

David Mozer
Saturday, January 31, 2004

As it has been explained to me by Fog Creek in a previous post, CD publishes using UTF-8, no matter what charset is specified in the head of the page. I see two problems:

1. Server configuration -- we've experimented and see no reason to specify a charset in the server setup. The pages handle this in the browser.

2. For a variety of reasons, some of our pages need to be in ISO-8859-1 to get characters to display properly. This means that we are sorely limited in our use of CD for complex PHP projects, which is a bummer.

It would be a Good Thing if CD3 would allow us to specify, on a site by site basis, what charset to use when publishing.

amos
Saturday, January 31, 2004
