Fog Creek Software
Discussion Board

unicode article inaccuracies

I just read the Unicode article, and I believe I have spotted an inaccuracy.

The article mentions that you should use UCS-2 as an encoding (or, at least, that's what Joel uses):

"For the latest version of CityDesk, the web site management software published by my company, we decided to do everything internally in UCS-2 (two byte) Unicode..."

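However, UCS-2 cannot represent all Unicode code points: it only covers U+0000 through U+FFFF (the Basic Multilingual Plane).  UTF-16 is a much better choice (although it must use two 16-bit code units, i.e. four bytes, to represent characters outside that range).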
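
To make the difference concrete, here is a rough sketch of how UTF-16 encodes a single code point. The function names are my own, made up for illustration, and the example character (U+1D11E, the musical G clef) is just one I picked from outside the two-byte range; this is not what CityDesk or Windows does internally.

```python
def fits_in_ucs2(code_point):
    """True if the code point can be stored in a single 16-bit unit (UCS-2)."""
    return 0 <= code_point <= 0xFFFF


def utf16_code_units(code_point):
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x10000:
        # Inside the two-byte range: UTF-16 and UCS-2 agree.
        return [code_point]
    # Outside it: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


if __name__ == "__main__":
    for cp in (0x0041, 0x1D11E):
        units = ["%04X" % u for u in utf16_code_units(cp)]
        print("U+%04X fits in UCS-2: %s, UTF-16 units: %s" % (cp, fits_in_ucs2(cp), units))
```

So 'A' is one unit in both encodings, while U+1D11E has no UCS-2 form at all and takes two units (a surrogate pair) in UTF-16.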

So, is this a bug? :)

I could be wrong, so I welcome any corrections to the above.  As I said, I'm no expert on Unicode.

Damien Fisher
Wednesday, December 10, 2003

Actually, the storage representation depends on what version of Windows you're running.

On the 9x versions of Windows, there's no native Unicode support, except for a very few functions. Generally speaking, if you use Unicode internally, it's UCS-2, as that's what the OS conversion functions are expecting.

On NT core, 4.0 and prior use UCS-2, and 2000 and later use UTF-16.

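This means that, to be perfectly safe, you can't just do ++/-- on a wchar_t pointer in C/C++ to step one character at a time any more if your code will be running on 2000, XP, or Server 2003, because a single character may be a surrogate pair that spans two wchar_t values.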
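
Roughly what a surrogate-aware step looks like, sketched in Python rather than C so it's easy to try. The helper names are hypothetical (mine, not any Windows API), and the list of integers stands in for a wchar_t buffer on a UTF-16 system.

```python
import struct


def code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def next_char_index(units, i):
    """Index of the code unit that starts the character after the one at i."""
    is_high = 0xD800 <= units[i] <= 0xDBFF
    if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
        return i + 2  # surrogate pair: skip both halves
    return i + 1      # ordinary two-byte character


def count_chars(units):
    """Count characters (code points), not code units."""
    i = count = 0
    while i < len(units):
        i = next_char_index(units, i)
        count += 1
    return count


if __name__ == "__main__":
    units = code_units("a\U0001D11Eb")
    print(len(units), "code units,", count_chars(units), "characters")
```

A plain ++ from index 1 would land on index 2, in the middle of the pair; the surrogate check is what moves it to index 3 instead.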

Brad Wilson
Wednesday, December 10, 2003

Oh, also, with 9x you can use the UNICOWS layer (unicows.dll, the Microsoft Layer for Unicode), which provides implementations of almost all the Unicode APIs. I believe that you're still talking about UCS-2 Unicode strings there.

Brad Wilson
Wednesday, December 10, 2003

Thanks, and ewwww :).

I knew that Unicode support varied across versions, but the fact that the encoding changes from UCS-2 to UTF-16 in newer versions suggests to me that if you really want portability, you are best off sticking to a third-party library (e.g., ICU).  Otherwise, I can just imagine all sorts of weird bugs cropping up on older versions...

Damien Fisher
Wednesday, December 10, 2003
