    rjk@rackheath:~$ python -c 'print "møøse"'
    møøse
    rjk@rackheath:~$ python -c 'print u"møøse"'
    møøse
    rjk@rackheath:~$ locale
    LANG=en_GB.UTF-8
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER="en_GB.UTF-8"
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT="en_GB.UTF-8"
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=
    rjk@rackheath:~$ python -V
    Python 2.5.2
(...but it's the same in 2.6.x.)
I think it's interpreting each byte in the (actually UTF-8[1]) input string as a Unicode code point in its own right, though another (philosophically different but pragmatically essentially identical) possibility is that it thinks all input strings are encoded using ISO-8859-1.

[1] And don't give me the nonsense someone came up with last time about Python having no way to know how the input is encoded. It does have a way: that's what LC_CTYPE is for.
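To make the two readings concrete, here is a minimal sketch. It is written in Python 3 syntax rather than the 2.5 of the session above, and the variable names are made up for illustration. It shows that treating each UTF-8 byte as its own code point and decoding the bytes as ISO-8859-1 give the same string, which is why the two explanations are hard to tell apart in practice.

```python
# Illustration only: Python 3 syntax, not the Python 2.5 session above.
text = "m\u00f8\u00f8se"          # "møøse" as five code points
raw = text.encode("utf-8")        # the seven bytes a UTF-8 terminal sends

# Reading 1: every byte is taken as a code point in its own right.
per_byte = "".join(chr(b) for b in raw)

# Reading 2: the bytes are assumed to be ISO-8859-1.
as_latin1 = raw.decode("iso-8859-1")

# ISO-8859-1 maps byte N to code point N, so both readings agree.
print(per_byte == as_latin1)

# Decoding with the encoding the terminal really used recovers the text.
print(raw.decode("utf-8") == text)
```

Both comparisons come out true, and each mangled string is seven characters long instead of five.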
(no subject)
Date: 2009-06-19 12:38 pm (UTC)

Source files do need some means other than locale, yes, and PEP 0263 doesn't look too bad. But -c should obviously honor LC_CTYPE.
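For what "honoring LC_CTYPE" could look like, here is a rough sketch using the standard `locale` module, again in Python 3 syntax. The helper name `decode_like_locale` and its `encoding` parameter are hypothetical, and this is not a claim about how the interpreter handles `-c` internally.

```python
import locale


def decode_like_locale(raw, encoding=None):
    """Decode bytes using the encoding named by the user's LC_CTYPE.

    Hypothetical helper for illustration; `encoding` overrides the locale.
    """
    if encoding is None:
        try:
            # Adopt the LC_CTYPE setting from the environment.
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # Unsupported locale name: stay with the default "C" locale.
            pass
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


# With LC_CTYPE=en_GB.UTF-8 the locale lookup should name UTF-8;
# passing the encoding explicitly shows the same decoding step.
print(decode_like_locale(b"m\xc3\xb8\xc3\xb8se", encoding="utf-8") == "m\u00f8\u00f8se")
```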
You'd see that it stops honoring LC_CTYPE for output if you redirected output anywhere.
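The output side can be sketched the same way. In Python 2, `sys.stdout.encoding` is `None` when stdout is not a terminal, which is what makes redirected output behave differently. One common workaround is to fall back to the locale's encoding by hand. The function name `output_encoding` below is hypothetical, and the sketch is in Python 3 syntax, so it only illustrates the fallback logic.

```python
import io
import locale


def output_encoding(stream):
    """Pick an encoding for `stream`, falling back to the locale's.

    Hypothetical helper: uses the stream's own encoding when it has one,
    otherwise whatever the current locale settings name.
    """
    declared = getattr(stream, "encoding", None)
    return declared or locale.getpreferredencoding(False)


# A text stream that declares its encoding keeps it.
declared_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(output_encoding(declared_stream))

# A raw byte stream declares nothing, so the locale decides.
print(output_encoding(io.BytesIO()) == locale.getpreferredencoding(False))
```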