Complete treatment of character encodings

Is there a good text or article that talks about encodings, their history & drawbacks, and maybe how many common systems use them?  It always scares me to deal with characters, since I feel they're "messy" internally, like all my tools work hard to hide everything from me.  And it seems there can be a lot of information loss in conversions.


das gringo
Saturday, May 10, 2003

Developing International Software for Windows 95 and Windows NT : A Handbook for Software Design
by Nadine Kano

- We had this book at work; it had every character table you could possibly imagine.

Developing International Software
Nadine Kano
November 2002  2nd Edition 

- Said to be an update (I haven't seen this book). It reportedly covers the internationalization changes introduced since Windows 2000.

Michael Moser
Saturday, May 10, 2003

Tim Bray has some recent articles that are worth checking out:

fool for python
Saturday, May 10, 2003

I find to be an excellent resource. The page on UTF encodings will answer a lot of questions about exactly how characters are encoded, how many bytes they take, etc. And the section called "Code Charts" is just plain fascinating.
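To make the "how many bytes they take" point concrete, here is a rough sketch, assuming Python 3 and its built-in codecs. The sample characters and the helper name byte_lengths are my own picks for illustration, not something from the page mentioned above.

```python
# Count how many bytes a single character needs in each UTF encoding.
# The "-le" variants are used so that no byte order mark is added to the count.
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]

# Illustrative samples, written as escapes so the file itself stays ASCII.
SAMPLES = [
    "A",           # U+0041 LATIN CAPITAL LETTER A
    "\u00e9",      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC EURO SIGN
    "\u65e5",      # U+65E5 CJK ideograph
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]


def byte_lengths(ch):
    """Return a dict mapping encoding name to encoded length in bytes."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

The variable width shows up right away: UTF-8 uses one to four bytes per character, UTF-16 uses two or four, and UTF-32 always uses four.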

Someday everything will be in Unicode.

In the meantime, if you have to deal with non-Unicode encodings of East Asian text (called CJKV - Chinese, Japanese, Korean, Vietnamese), by far the best book is "CJKV Information Processing" by Ken Lunde. It is extremely detailed, and gives you a small amount of history and cultural rationale for why those encodings were developed.
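On the information loss the original question worried about, here is a small sketch, assuming Python 3 and the shift_jis codec that ships with it. The round_trip helper and the sample strings are hypothetical and only there for illustration. Text that fits the legacy encoding's repertoire survives a round trip, while anything outside it gets replaced.

```python
def round_trip(text, encoding):
    """Encode text to a legacy encoding and decode it back.

    Characters the encoding cannot represent are replaced with '?',
    which is where the information loss happens.
    """
    encoded = text.encode(encoding, errors="replace")
    return encoded.decode(encoding)


japanese = "\u65e5\u672c\u8a9e"  # three kanji, all present in Shift_JIS
hangul = "\ud55c"                # a Korean syllable, not present in Shift_JIS

print(round_trip(japanese, "shift_jis") == japanese)  # survives intact
print(round_trip(hangul, "shift_jis"))                # comes back as '?'
```

Passing errors="strict" (the default) instead would raise UnicodeEncodeError rather than silently dropping the character, which is usually the safer choice when you need to know about the loss.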

Nate Silva
Sunday, May 11, 2003

"Someday everything will be in Unicode"

Even PHP's buggy ODBC support? That will be the day ...

Just me (Sir to you)
Monday, May 12, 2003

From my favourites:
All you ever didn't want to know :)

Peter Ibbotson
Tuesday, May 13, 2003

Especially see "test pages" and
National Keyboards Layouts.

Friday, May 16, 2003

Unicode Demystified
By Richard Gillam

Wednesday, May 21, 2003

