Fog Creek Software
Discussion Board

Japanese characters

I have installed the CJK language pack on my machine. I use a C program to print out single-byte and double-byte Japanese characters to a text file. However, when I open the file in Notepad, it's all junk. I even set the font to MS PMincho and the script to Japanese, but that didn't solve the problem. Any idea how to view the Japanese characters?

John
Wednesday, March 26, 2003

My guess would be that Notepad doesn't handle Unicode, and you'll have to find a different text editor.  Among others, any text editor or IDE designed for Java programming should work (jEdit or Eclipse, for instance) since Java supports Unicode.

Kyralessa
Wednesday, March 26, 2003

Since you said single-byte and multi-byte Japanese characters, I suspect you are using Windows code page characters rather than Unicode. Also, be careful when mixing single- and double-byte characters in a stream: a single misplaced byte will throw off everything that follows it.
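
To illustrate why one misplaced byte matters, here is a minimal sketch of walking a mixed single/double-byte stream. It assumes the data is code page 932 (Shift-JIS), where lead bytes fall in 0x81-0x9F and 0xE0-0xFC. The function names are made up for this example, and trail bytes are not validated.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: code page 932 (Shift-JIS) lead-byte ranges
   0x81-0x9F and 0xE0-0xFC. Everything else is a single byte
   (ASCII or half-width katakana). */
int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count the characters in n bytes of Shift-JIS text.
   Returns (size_t)-1 if the data ends in the middle of a
   double-byte character. Trail bytes are not checked. */
size_t sjis_count_chars(const unsigned char *s, size_t n)
{
    size_t i = 0, count = 0;

    assert(s != NULL || n == 0);
    while (i < n) {
        if (sjis_is_lead_byte(s[i])) {
            if (i + 1 >= n)
                return (size_t)-1;   /* lead byte with no trail byte */
            i += 2;                  /* double-byte character */
        } else {
            i += 1;                  /* single-byte character */
        }
        count++;
    }
    return count;
}
```

For example, "Nihon" in Shift-JIS is the four bytes 93 FA 96 7B. The trail byte 0x7B is also ASCII '{', so a reader that starts one byte off will see stray punctuation instead of kanji.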

R K
Wednesday, March 26, 2003

John, what OS are you using? Windows 2000 Pro's Notepad will display Japanese fine. I use it to develop Japanese content on my English OS.

Matthew Lock
Thursday, March 27, 2003

When you say "single or double byte," what coding system are you using? For Japanese the two most common systems are Shift-JIS and Unicode. AFAIK only the Japanese version of Windows will display Shift-JIS correctly in any context, whereas all Windows should display Unicode correctly. ("in any context", i.e. outside of Internet Explorer, which lets you choose any coding system on any version of Windows).

CJK is a big hassle because lots of CJK Windows software is designed for a specific international version of Windows: they just spew Shift-JIS or Big-5 all over the place and expect it to be displayed. If you run one of these programs on English Windows you'll just see crap like "@.-3!)#$#". More modern software that uses Unicode should work correctly on all versions of Windows (as well as other platforms).

Dan Maas
Thursday, March 27, 2003

We used Notepad to create Unicode INI files, so it definitely can do the job (from W2K on).  In addition to the advice you have already received, make sure that the font selected for display in Notepad is one that supports the characters.  Arial Unicode MS perhaps.

The Save As dialog lets you specify Unicode as the encoding when you save.

Ran
Thursday, March 27, 2003

I'm using Win2K. I used MultiByteToWideChar with code page 932 to convert to wide characters and then used printf ("%S", wszJpnChar). Am I doing the right thing?

John
Thursday, March 27, 2003

John, I think you have to put a BOM at the beginning of your file to tell the editor that the content is Unicode. The BOM is the byte order mark (if I'm not wrong), which is 2 bytes: 0xFF 0xFE for little-endian or 0xFE 0xFF for big-endian 16-bit Unicode.
In your case (wide characters on Windows are little-endian) set the first 2 bytes to 0xFF 0xFE. I hope this helps.
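
A rough sketch of the idea, assuming the wide characters are 16-bit UTF-16 code units and the file should be little-endian (the form Notepad calls "Unicode"). The function name is invented for this example. It builds the bytes in a buffer so the byte order does not depend on the machine running the code.

```c
#include <assert.h>
#include <stddef.h>

/* Serialize n UTF-16 code units as little-endian bytes, preceded
   by the BOM (FF FE). Returns the number of bytes written, or 0
   if the output buffer (cap bytes) is too small. */
size_t utf16le_with_bom(const unsigned short *units, size_t n,
                        unsigned char *out, size_t cap)
{
    size_t need = 2 + 2 * n;
    size_t i;

    assert(out != NULL);
    if (cap < need)
        return 0;

    out[0] = 0xFF;                   /* BOM U+FEFF, low byte first */
    out[1] = 0xFE;
    for (i = 0; i < n; i++) {
        out[2 + 2 * i] = (unsigned char)(units[i] & 0xFF);         /* low byte  */
        out[3 + 2 * i] = (unsigned char)((units[i] >> 8) & 0xFF);  /* high byte */
    }
    return need;
}
```

Write the resulting buffer with fwrite() to a file opened in binary mode ("wb"). In text mode the C runtime on Windows may translate newline bytes and corrupt the 16-bit data.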

R K
Friday, March 28, 2003

I see. John, I think what is happening is that you are producing a file in UTF-16 or Shift-JIS. It might be better to stick with UTF-8.

e.g. if you want to write the Japanese character for "sun" (the "Ni" of "Nihon") I would do it like this:

// the UTF-8 encoding of U+65E5 is E6 97 A5
putchar(0xe6); putchar(0x97); putchar(0xa5);

There are plenty of code libraries out there for converting between character sets, if your input is not in Unicode already.

For new software I highly recommend using UTF-8 as your standard input and output format. Almost all Windows software should recognize and display UTF-8 correctly. If you write the characters in some other encoding, like UTF-16 or Shift-JIS, it may work sometimes but not as much software will recognize it. (Windows and Java use UTF-16 internally, but UTF-8 is more standard for file I/O)

If you are really sure you want UTF-16, you could also do it with putchar() like the example above. Personally I would stay away from printf() because there is no guarantee the library will treat multi-byte characters correctly.
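
Rather than working out the bytes by hand for every character, the putchar() approach can be generalized. This is only a sketch: the function names are made up here, and it assumes you already have the Unicode code point as a number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write one code point to fp as UTF-8.
   Returns 0 on success, -1 if cp is invalid or the write fails. */
int utf8_fputc(unsigned long cp, FILE *fp)
{
    unsigned char buf[4];
    size_t len = utf8_encode(cp, buf);

    if (len == 0 || fwrite(buf, 1, len, fp) != len)
        return -1;
    return 0;
}
```

With this, utf8_fputc(0x65E5, stdout) emits the same three bytes (E6 97 A5) as the putchar() calls above.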

Dan Maas
Saturday, March 29, 2003
