I'm wondering what the relationship is between a character set and a character encoding, and how they relate to fonts.  I always thought that a character set was the actual set of characters from a certain language, and that a character encoding describes how a stream of text from a given character set is encoded into a stream of bytes to be sent over a network, written to or read from a disk, or otherwise interpreted.

Now a font comes into play by providing glyphs for a given character set, but it does not have anything to do directly with character encoding?

Would someone please explain to me the relationship between these 3 terms?

Saturday, July 24, 2004

There are multiple posts covering these topics. I think you'll find that they cover most of the questions you have.

Sunday, July 25, 2004

Thanks Lou.  After reading that and Joel's article I now, finally, think I have a grasp on this stuff.

Sunday, July 25, 2004

So if I created an HTML page in English with a content type of UTF-8, what benefit would it have to a Korean viewing the page?  Isn't an encoding only beneficial if the language the page is written in needs it?

Sunday, July 25, 2004

UTF-8 is a useful encoding because it's pretty much universal. For ASCII characters, it's byte-for-byte identical to ASCII. For everything else, it's an easy-to-decode multibyte system.
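To make that concrete, here is a rough Python sketch (my own illustration; the sample characters and the helper name `utf8_hex` are just assumptions for the demo) showing how many bytes UTF-8 uses for a few characters, and that plain ASCII text comes out unchanged:

```python
# Illustrative only: how a few characters come out as UTF-8 bytes.
samples = ["A", "\u00e9", "\ud55c", "\U0001F600"]  # A, e-acute, Hangul "han", an emoji


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated hex (hypothetical helper)."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


for ch in samples:
    # Print the code point next to its UTF-8 byte sequence.
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")

# ASCII text is byte-for-byte the same whether encoded as ASCII or UTF-8.
ascii_matches = "Hello".encode("utf-8") == "Hello".encode("ascii")
```

The code point is the "character set" side of things; the hex bytes are the "encoding" side. An English-only page costs nothing extra in UTF-8, and the same declaration keeps working if non-ASCII text shows up later.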

HTML has its own escape syntax, numeric character references (&#number;), where the number is a Unicode code point rather than a byte value in the page's encoding. Plus it has named entities (&eacute;), which are sort of nice but make things hard later for, say, XML parsers (which only predefine five named entities and need external references for the rest).
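A quick way to see the difference, sketched with Python's standard library (the helper `xml_accepts` is a made-up name for this demo, and the exact error text from the XML parser may vary):

```python
import html
import xml.etree.ElementTree as ET

# In HTML, the numeric reference and the named entity both resolve to U+00E9.
numeric = html.unescape("&#233;")
named = html.unescape("&eacute;")

# An XML parser resolves the numeric form without needing any DTD.
xml_numeric = ET.fromstring("<p>&#233;</p>").text


def xml_accepts(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # HTML-only named entities are undefined in plain XML.
        return False
```

With this, `xml_accepts("<p>&eacute;</p>")` comes back false while `xml_accepts("<p>&amp;</p>")` is fine, which is the portability problem with named entities.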

So if your page might ever have a word of Korean on it, you may as well pick the almost-universal UTF-8. I believe the major problem area is the overlap between variants of Chinese characters: Unicode gives the shared characters a single code point, so you need to know which language is being used (Simplified Chinese, Traditional Chinese, Japanese, Korean, or Vietnamese, the last two being rare) to pick the right font.
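As a small sketch of that last point (the character chosen and the `tag` helper are my own assumptions for illustration), the encoded text itself carries no language information, so the hint has to come from somewhere else, such as the HTML lang attribute:

```python
import unicodedata

# U+76F4 is one of the characters conventionally drawn differently in
# Chinese and Japanese typography, yet it has a single code point.
ch = "\u76f4"
name = unicodedata.name(ch)
utf8_bytes = ch.encode("utf-8")


def tag(text: str, lang: str) -> str:
    """Wrap text in a span with a lang attribute (hypothetical helper)."""
    return f'<span lang="{lang}">{text}</span>'


# Same bytes inside both spans; only the lang hint differs, and that is
# what a browser can use when choosing a font.
japanese = tag(ch, "ja")
chinese = tag(ch, "zh-Hans")
```

Whether a given browser actually switches fonts on the lang hint depends on the browser and the fonts installed, so treat this as the general idea rather than a guarantee.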

This looks informative:

Sunday, July 25, 2004

