Fog Creek Software
Discussion Board




PHP Unicode FUD

I'm not sure how "almost complete ignorance of character encoding issues" is defined, but PHP can handle input and output of UTF-8 characters with its utf8_encode() [ http://www.php.net/utf8-encode ] and utf8_decode() [ http://www.php.net/utf8-decode ] functions. Additionally, the mbstring extension [ http://www.php.net/mbstring ] handles decoding of multibyte characters coming from external input and provides multibyte-aware string handling functions.

There are plenty of improvements that could be made to PHP's Unicode support, some of which will be handled in PHP 5 [ http://www.zend.com/lists/engine2/200308/msg00013.html ], but saying that PHP has "almost complete ignorance of character encoding issues" is, IMHO, false.

David Sklar
Monday, October 13, 2003

Funny. I wrote nearly the same some threads below, no reaction so far.

Joel, will you please stand up?

bar
Monday, October 13, 2003

Maybe Joel's annoyance is that PHP strings are not natively Unicode. The PHP programmer must go out of his way to call utf8_decode(), which decodes UTF-8 to the single-byte ISO 8859-1 encoding. As Joel pointed out in his article, a single-byte encoding cannot represent most Unicode characters. How is converting UTF-8 to ISO 8859-1 helpful?
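
To make the objection concrete, here is a minimal Python 3 sketch of the same kind of conversion. It is an analogue, not PHP's implementation: to_latin1 is a hypothetical helper, and the choice to substitute "?" for unrepresentable characters is an assumption made for illustration.

```python
# Python 3 analogue of transcoding UTF-8 bytes to ISO 8859-1.
# to_latin1 is a hypothetical helper, not a PHP or Python built-in.

def to_latin1(utf8_bytes: bytes) -> bytes:
    """Decode UTF-8, then re-encode as ISO 8859-1, substituting '?'
    for any character that ISO 8859-1 cannot represent."""
    return utf8_bytes.decode("utf-8").encode("iso-8859-1", errors="replace")

western = "café".encode("utf-8")      # every character exists in ISO 8859-1
japanese = "日本語".encode("utf-8")    # none of these characters do

print(to_latin1(western))   # b'caf\xe9'
print(to_latin1(japanese))  # b'???'
```

The Western European text survives the trip; the Japanese text is lost entirely.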

runtime
Monday, October 13, 2003

Neither PHP nor Python ever claimed to have transparent Unicode handling. As was pointed out in another thread, the programmer is responsible for providing Unicode support by using the methods described above. All the others can happily use Unicode-unaware string methods.

This is the brother of "customer choice".

bar
Monday, October 13, 2003

So when you decode a UTF-8 Japanese character, exactly what is the ISO 8859-1 representation?

Brad Wilson (dotnetguy.techieswithcats.com)
Monday, October 13, 2003

������!!!!

runtime
Monday, October 13, 2003

Um, Python has what I'd consider "transparent" Unicode handling.  I can manipulate Unicode values, and encode them to (or decode them from) any supported encoding, including all the various UTF/UCS/ISO encodings, or any custom encoding I care to add.
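
A rough sketch of what I mean, written against Python 3 (where str is the Unicode type). The sample text and the list of codecs are just illustrative choices; all four codecs ship with the standard library.

```python
# A Unicode string is a sequence of code points, independent of any byte encoding.
text = "日本語"

print(len(text))   # 3 characters, whatever encoding is used later
print(text[0])     # indexing works on characters, not bytes

# Encode the same string to several byte encodings and decode it back.
sizes = {}
for codec in ("utf-8", "utf-16-le", "shift_jis", "euc_jp"):
    raw = text.encode(codec)           # str -> bytes in the chosen encoding
    assert raw.decode(codec) == text   # bytes -> str round-trips losslessly
    sizes[codec] = len(raw)            # byte length differs per encoding

print(sizes)
```

The string itself never changes; only the byte representation chosen at the boundary does.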

AFAICT from the documentation linked, though, PHP doesn't support transparent Unicode handling. It looks like you have to use different functions for "multibyte" strings than regular strings, and you have to keep track of what encoding such strings are in, unless you have a default encoding, which is a global setting.  And if you have a global setting, what happens when two different modules want different settings?  Ugh.
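
Here is a small Python 3 sketch of why the bookkeeping matters. It says nothing about how mbstring works internally; char_length is a hypothetical helper that takes the encoding as an explicit argument instead of reading a global default.

```python
# The same bytes mean different things depending on the assumed encoding.
raw = "é".encode("utf-8")   # two bytes: 0xC3 0xA9

def char_length(data: bytes, encoding: str) -> int:
    """Count characters, given the encoding the caller believes applies.
    Hypothetical helper: the encoding is passed per call, not set globally."""
    return len(data.decode(encoding))

print(len(raw))                        # byte count, what a byte-oriented function sees
print(char_length(raw, "utf-8"))       # character count under the right encoding
print(char_length(raw, "iso-8859-1"))  # character count under the wrong guess
print(raw.decode("iso-8859-1"))        # the wrong guess yields mojibake, not an error
```

With the encoding passed per call, two modules with different assumptions at least can't stomp on each other's setting.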

It looks *possible* to do Unicode with PHP, but I'd hate to have to do it.

Phillip J. Eby
Monday, October 13, 2003

Hrmm, seems my remark that this forum doesn't handle Unicode properly got "modded down".  I am amused.

Alyosha`
Monday, October 13, 2003
