[personal profile] ewx
rjk@rackheath:~$ python -c 'print "møøse"'
møøse
rjk@rackheath:~$ python -c 'print u"møøse"'
mÃ¸Ã¸se
rjk@rackheath:~$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
rjk@rackheath:~$ python -V
Python 2.5.2

(...but it's the same in 2.6.x.)

I think it's interpreting each byte in the (actually UTF-8 [1]) input string as a Unicode code point in its own right. Another possibility, philosophically different but pragmatically identical, is that it assumes all input strings are encoded in ISO-8859-1.

[1] And don't give me the nonsense someone came up with last time about Python having no way to know how the input is encoded. It does have a way: that's what LC_CTYPE is for.
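
To make the two readings concrete, here is a rough sketch. It is written for Python 3 rather than the 2.5 shown above, so it illustrates the suspected mechanism and does not reproduce the interpreter's actual code path. The variable names are mine, and locale.getpreferredencoding() is only one plausible way of asking what LC_CTYPE implies.

```python
import locale

# The bytes a UTF-8 terminal passes for the literal: 'ø' is the pair C3 B8.
raw = "møøse".encode("utf-8")

# Reading 1: every input byte becomes a code point in its own right.
per_byte = "".join(chr(b) for b in raw)

# Reading 2: the input is assumed to be ISO-8859-1.
as_latin1 = raw.decode("iso-8859-1")

# Both readings give the same seven-character string, because ISO-8859-1
# maps byte N to code point N.
same_result = per_byte == as_latin1

# Writing that string back out as UTF-8 doubles up each non-ASCII byte,
# which is what appears on the terminal as mojibake.
reencoded = as_latin1.encode("utf-8")

# What the environment itself claims (assumption: this reflects LC_CTYPE
# once the locale has been initialised from the environment).
locale.setlocale(locale.LC_CTYPE, "")
claimed = locale.getpreferredencoding(False)
print("locale claims:", claimed)

# Decoding with the encoding the terminal really used recovers the text.
intended = raw.decode("utf-8")
```

Since both readings yield the same string, the output alone can't tell them apart, which is why I call the difference philosophical.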
