Fog Creek Software
Discussion Board




Double-Byte Languages List

Does anyone have, or can anyone provide, a list of double-byte languages?

I know they include many Asian languages, but I'm looking for a specific list of languages.

Thank you very much for your help.

Mike
Wednesday, April 28, 2004

Not to be a jerk or anything, but it does depend on the encoding. Do you want to assume the latest encoding only?

Li-fan Chen
Wednesday, April 28, 2004

I'm not quite with you.

Double-byte is what Unicode uses. The languages that Unicode uses a double byte for can be represented in other OSs using code pages.

Or are you asking us to list languages with more than 224 characters? Technically you would be talking about scripts, since I don't know of any language that has more than 224 phonemes.

Stephen Jones
Wednesday, April 28, 2004

Perhaps you specifically mean "double-byte character set" ("DBCS") (or DBCS code pages) by "double-byte languages". If so:

"DBCS is used in Microsoft Windows systems that are distributed in most parts of Asia. It provides support for many different East Asian language alphabets, such as Chinese, Japanese, and Korean. DBCS uses the numbers 0 – 128 to represent the ASCII character set. Some numbers greater than 128 function as lead-byte characters, which are not really characters but simply indicators that the next value is a character from a non-Latin character set. In DBCS, ASCII characters are only 1 byte in length, whereas Japanese, Korean, and other East Asian characters are 2 bytes in length."

"Unicode is a character-encoding scheme that uses 2 bytes for every character. The International Standards Organization (ISO) defines a number in the range of 0 to 65,535 (2^16 – 1) for just about every character and symbol in every language (plus some empty spaces for future growth). Although both Unicode and DBCS have double-byte characters, the encoding schemes are completely different."

I believe that DBCS code pages are only used for Japanese, Korean, and Chinese language (and variations) character sets (at least in Windows).
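For reference, as I understand it the Windows DBCS code pages for those languages are 932 (Japanese), 936 (Simplified Chinese), 949 (Korean) and 950 (Traditional Chinese). This small Python sketch just checks that each one stores an ASCII letter in one byte and a native character in two; the sample characters are arbitrary choices.

```python
# Windows DBCS code pages for East Asian languages, with an arbitrary
# sample character from each language's script.
DBCS_CODE_PAGES = {
    "cp932": ("Japanese", "日"),
    "cp936": ("Simplified Chinese", "中"),
    "cp949": ("Korean", "한"),
    "cp950": ("Traditional Chinese", "中"),
}

widths = {}
for codec, (language, sample) in DBCS_CODE_PAGES.items():
    ascii_len = len("A".encode(codec))      # ASCII stays single-byte
    native_len = len(sample.encode(codec))  # native character is double-byte
    widths[codec] = (ascii_len, native_len)
    print(f"{codec} ({language}): 'A' -> {ascii_len} byte, "
          f"{sample} -> {native_len} bytes")
```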

More information on the Windows character set codepages is available at:
http://www.microsoft.com/globaldev/reference/wincp.mspx

Philip Dickerson
Wednesday, April 28, 2004

Is this any help?
http://www.alanwood.net/unicode/fontsbyrange.html

Otherwise browse from here:
http://www.unicode.org/

Ged Byrne
Wednesday, April 28, 2004

Unicode doesn't use two bytes -- it's not an encoding scheme. You need something like UTF-8, UTF-16, or UTF-32.

http://www.unicode.org/faq/basic_q.html#19

You might be thinking of UCS-2, which uses exactly two bytes for each Unicode code point in the Basic Multilingual Plane (up to U+FFFF). Unicode code points currently go up to U+10FFFF, so you need more than two bytes to represent all possible Unicode characters.
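A quick Python sketch of the point: the same code point takes a different number of bytes depending on the encoding form. The sample characters are arbitrary; the `-le` codec variants are used so Python doesn't prepend a byte order mark and inflate the counts.

```python
import sys

# Arbitrary samples: U+0041, U+00E9, U+65E5, and U+1F600 (outside the BMP).
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]

sizes = {}
for ch in samples:
    cp = ord(ch)
    sizes[cp] = (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" avoids a byte order mark
        len(ch.encode("utf-32-le")),
    )
    utf8, utf16, utf32 = sizes[cp]
    print(f"U+{cp:04X}: UTF-8={utf8} UTF-16={utf16} UTF-32={utf32} "
          f"fits in UCS-2: {cp <= 0xFFFF}")

print(hex(sys.maxunicode))  # 0x10ffff, the highest Unicode code point
```

U+1F600 needs 4 bytes in UTF-16 (a surrogate pair), which is exactly the case a fixed two-byte encoding like UCS-2 cannot express.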

scruffie
Wednesday, April 28, 2004

Sorry for the confusion caused by my initial post, and thank you for everyone's input.

I did mean "double-byte character set", and I wanted to better understand which languages I'm currently not supporting with my database (which has its text fields as varchar rather than nvarchar).

It appears that Chinese, Japanese, and Korean are the main languages. Thanks, Philip.

I appreciate everyone's input on this issue.

As a follow-up question for anyone using MS SQL Server: to allow people who use "double-byte character set" languages to post to this database in their own language, do I only need to change the field type from varchar to nvarchar?
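To see what's at stake, here is a rough Python simulation of the round trip, not SQL Server itself. It assumes the varchar column's collation maps to Windows code page 1252 (common on Western installs); characters the code page can't represent are typically replaced with '?', though Windows may apply "best fit" mappings for some characters, so treat this as an approximation. nvarchar stores Unicode (UTF-16), so the text survives. The same loss can happen earlier, for example with T-SQL string literals written without the N'' prefix. `store_varchar` and `store_nvarchar` are hypothetical helpers.

```python
def store_varchar(text: str, code_page: str = "cp1252") -> str:
    """Hypothetical: simulate a trip through a non-Unicode (varchar) column.

    Assumption: the column's collation uses `code_page`; unmappable
    characters become '?'.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


def store_nvarchar(text: str) -> str:
    """Hypothetical: simulate a trip through an nvarchar (UTF-16) column."""
    return text.encode("utf-16-le").decode("utf-16-le")


post = "Hello 日本語"
print(store_varchar(post))           # Hello ???
print(store_nvarchar(post))          # Hello 日本語
print(store_varchar(post, "cp932"))  # Hello 日本語 (a Japanese code page)
```

The last line shows why a varchar column with a Japanese collation would handle Japanese but not, say, Korean, while nvarchar doesn't tie the column to one language's code page.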

Thanks again,
Mike

Mike
Wednesday, April 28, 2004

First, find a list of all world languages.

Then strike English from the list.

There you go.

Alyosha`
Wednesday, April 28, 2004

This is a pretty useful site.

http://www.microsoft.com/globaldev/DrIntl/default.mspx

Stephen Jones
Wednesday, April 28, 2004
