PHP Unicode FUD
I'm not sure how "almost complete ignorance of character encoding issues" is defined, but PHP can handle input and output of UTF-8 characters with its utf8_encode() [ http://www.php.net/utf8-encode ] and utf8_decode() [ http://www.php.net/utf8-decode ] functions. Additionally, the mbstring extension [ http://www.php.net/mbstring ] handles decoding of multibyte characters coming from external input, and provides multibyte-aware string handling functions.
Funny. I wrote nearly the same thing a few threads below, with no reaction so far.
Maybe Joel's annoyance is that PHP strings are not natively Unicode. The PHP programmer must go out of his way to call utf8_decode(), which converts UTF-8 to ISO 8859-1 (Latin-1, a single-byte superset of ASCII). As Joel pointed out in his article, you cannot use a single-byte encoding like that to represent most Unicode characters. How is converting UTF-8 to ISO 8859-1 helpful?
Neither PHP nor Python ever claimed to have transparent Unicode handling. As was pointed out in another thread, the programmer is responsible for providing Unicode support by using the methods described above. All the others can happily use Unicode-unaware string methods.
So when you decode a UTF-8 Japanese character, exactly what is the ISO 8859-1 representation?
Brad Wilson (dotnetguy.techieswithcats.com)
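To make the question concrete, here is a small sketch in Python (not PHP, purely for illustration) of what happens when a Japanese character is forced into ISO 8859-1. The helper name to_latin1_lossy is hypothetical, and using errors="replace" is an assumption: it is the closest Python analogue to the "?" substitution that utf8_decode() is documented to perform for characters it cannot represent.

```python
def to_latin1_lossy(utf8_bytes: bytes) -> bytes:
    """Decode UTF-8 bytes, then re-encode as ISO 8859-1.

    Characters with no ISO 8859-1 equivalent are replaced by '?'
    (assumed here to mirror what PHP's utf8_decode() does).
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode("iso-8859-1", errors="replace")


japanese = "日本語".encode("utf-8")  # 3 characters, 9 bytes in UTF-8
accented = "café".encode("utf-8")    # 4 characters, 5 bytes in UTF-8

print(to_latin1_lossy(japanese))  # b'???'      : every character is lost
print(to_latin1_lossy(accented))  # b'caf\xe9'  : é fits in a single byte
```

So the short answer is that there is no ISO 8859-1 representation: the conversion only survives for text that already fits in Latin-1.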
Um, Python has what I'd consider "transparent" Unicode handling. I can manipulate Unicode values, and encode them to (or decode them from) any supported encoding, including all the various UTF/UCS/ISO encodings, or any custom encoding I care to add.
Phillip J. Eby
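For readers who have not seen it, the kind of handling described above can be sketched roughly as follows. This uses present-day Python 3 syntax, which is an assumption on my part (the Python of this thread's era spelled Unicode strings differently), and the helper name fits is hypothetical.

```python
text = "naïve 日本語"

# A Unicode string round-trips through any encoding that can hold all of it.
encodings = ["utf-8", "utf-16", "utf-32"]
round_trips = {enc: text.encode(enc).decode(enc) == text for enc in encodings}


def fits(s: str, encoding: str) -> bool:
    """Return True if every character of s can be encoded in `encoding`."""
    try:
        s.encode(encoding)  # strict mode: raises instead of substituting
        return True
    except UnicodeEncodeError:
        return False


print(round_trips)                 # every UTF encoding preserves the text
print(fits("naïve", "iso-8859-1"))  # True: ï exists in Latin-1
print(fits(text, "iso-8859-1"))     # False: the kanji do not
print(fits("日本語", "shift_jis"))   # True: a Japanese legacy encoding
```

The point of the strict default is that an encoding which cannot hold the text fails loudly rather than silently turning characters into "?".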
Hrmm, seems my remark that this forum doesn't handle Unicode properly got "modded down". I am amused.