Fog Creek Software
Discussion Board




While we're on the subject of character sets

Where can I find information on the number of bytes required to encode a character in each of the following character sets?  I can guess a lot of them, but I need a definitive answer.

ANSI_CHARSET
DEFAULT_CHARSET
SYMBOL_CHARSET
MAC_CHARSET
SHIFTJIS_CHARSET
HANGEUL_CHARSET
HANGUL_CHARSET
JOHAB_CHARSET
GB2312_CHARSET
CHINESEBIG5_CHARSET
GREEK_CHARSET
TURKISH_CHARSET
VIETNAMESE_CHARSET
HEBREW_CHARSET
ARABIC_CHARSET
BALTIC_CHARSET
RUSSIAN_CHARSET
THAI_CHARSET
EASTEUROPE_CHARSET
OEM_CHARSET

t.i.a
Monday, October 13, 2003

Each character set is controlled by a different standards body, so you're going to have a hard time finding just one definitive source.

Some places to start would be:

For Windows and Macintosh character sets, the Nadine Kano book, "Developing International Software for Windows 95 and Windows NT": ISBN 1-55615-840-8

For Asian character sets, the Ken Lunde book, "CJKV Information Processing": ISBN 1-56592-224-7

List of IANA-assigned charset names: http://www.iana.org/assignments/character-sets

The Letter Database: http://www.edi.ee/letter/

RFC 1345: http://www.ietf.org/rfc/rfc1345.txt

If there is a definitive resource I'd love to know about it. I sure wish the world was all-Unicode already!
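
In the meantime, one rough way to check the guesses empirically is to encode every character a code page supports and record the longest result. The Python sketch below does that. The mapping from charset names to codec names is my own assumption, based on the code pages Windows conventionally pairs with each charset, so verify it before relying on it. DEFAULT_CHARSET and OEM_CHARSET depend on the system locale, and SYMBOL_CHARSET has no code page of its own, so they are left out.

```python
# Assumed mapping from Windows charset names to Python codec names.
# This pairing is an illustration, not an authoritative table.
CHARSET_CODECS = {
    "ANSI_CHARSET": "cp1252",
    "MAC_CHARSET": "mac_roman",
    "SHIFTJIS_CHARSET": "cp932",
    "HANGUL_CHARSET": "cp949",
    "JOHAB_CHARSET": "johab",
    "GB2312_CHARSET": "gbk",        # Python's name for code page 936
    "CHINESEBIG5_CHARSET": "cp950",
    "GREEK_CHARSET": "cp1253",
    "TURKISH_CHARSET": "cp1254",
    "VIETNAMESE_CHARSET": "cp1258",
    "HEBREW_CHARSET": "cp1255",
    "ARABIC_CHARSET": "cp1256",
    "BALTIC_CHARSET": "cp1257",
    "RUSSIAN_CHARSET": "cp1251",
    "THAI_CHARSET": "cp874",
    "EASTEUROPE_CHARSET": "cp1250",
}


def max_bytes_per_char(codec):
    """Return the longest encoding, in bytes, of any single BMP character."""
    longest = 0
    for code_point in range(0x10000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates are not characters
        try:
            encoded = chr(code_point).encode(codec)
        except UnicodeEncodeError:
            continue  # this code page has no such character
        longest = max(longest, len(encoded))
    return longest


if __name__ == "__main__":
    for charset, codec in CHARSET_CODECS.items():
        print(f"{charset:22} {codec:10} {max_bytes_per_char(codec)}")
```

With Python's bundled codecs, the Western, Greek, Cyrillic and similar code pages should report 1 and the East Asian ones 2. This only measures what Python's codec tables contain, so treat it as a sanity check rather than the definitive answer.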

Nate Silva
Monday, October 13, 2003
