Richard Kettlewell ([personal profile] ewx) wrote 2007-12-15 04:41 pm

Stupid Python

chymax$ python -V
Python 2.5.1
chymax$ python -c 'print u"\xA9";'
©
chymax$ python -c 'print u"\xA9";' >/dev/null
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128)
chymax$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="en_GB.utf-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C/en_GB.utf-8/C/C/C/C"
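What seems to be going on, as far as I understand Python 2: sys.stdout.encoding is only set when stdout is a terminal. Once output is redirected it is None, and print falls back to the default 'ascii' codec regardless of LC_CTYPE. The failing step can be reproduced without any redirection; this is a minimal sketch, and the helper name ascii_encodable is made up for illustration:

```python
def ascii_encodable(text):
    """Return True if text survives the 'ascii' codec, False otherwise."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # The same error class that the redirected print raises above.
        return False
    return True

plain = ascii_encodable(u"(c)")        # only code points below 128
copyright = ascii_encodable(u"\xa9")   # U+00A9 is outside ASCII
```

Here plain comes out True and copyright comes out False, which matches the "ordinal not in range(128)" complaint in the traceback.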

[personal profile] cjwatson 2007-12-18 10:20 am (UTC)

This seems to restore things to sanity, though I'm not convinced I've got the decode nonsense right (however, it works for me):

import sys

# Avoid having to do .encode('UTF-8') everywhere. This is a pain; I wish
# Python supported something like "sys.stdout.encoding = 'UTF-8'".
def fix_stdout():
    import codecs
    sys.stdout = codecs.EncodedFile(sys.stdout, 'UTF-8')
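    # EncodedFile's write() first decodes its argument, which is wrong for
    # text that is already unicode; make that step a no-op so the data
    # goes straight to the UTF-8 encoder.
    def null_decode(input, errors='strict'):
        return input, len(input)
    sys.stdout.decode = null_decode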

fix_stdout()
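For comparison, a sketch of a different route to the same effect that avoids the decode override: codecs.getwriter() returns a stream writer class that only encodes. This is an alternative technique, not the code above. The helper name utf8_writer is made up, and an in-memory buffer stands in for the real byte stream (sys.stdout itself on Python 2, sys.stdout.buffer on Python 3).

```python
import codecs
import io

def utf8_writer(byte_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    # codecs.getwriter() returns the StreamWriter class for the codec;
    # an instance encodes on write() and hands the bytes to byte_stream.
    return codecs.getwriter("UTF-8")(byte_stream)

buf = io.BytesIO()           # stand-in for the real stdout byte stream
out = utf8_writer(buf)
out.write(u"\xa9\n")         # text in, UTF-8 bytes out
result = buf.getvalue()
```

Because the writer never decodes anything, there is no decode step to neutralise; the trade-off is that plain byte strings written to it on Python 2 still go through an implicit ASCII decode first.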

[identity profile] ewx.livejournal.com 2007-12-18 10:24 am (UTC)
A step in the right direction, but doesn't support non-UTF-8 locales :-(

[personal profile] cjwatson 2007-12-18 10:42 am (UTC)
Should be sufficient to replace 'UTF-8' with locale.getpreferredencoding() (not tested).
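
A rough sketch of that substitution, assuming the locale's codec is one Python knows about. The helper name preferred_output_encoding and the 'ascii' fallback are illustrative choices, not part of the suggestion above:

```python
import codecs
import locale

def preferred_output_encoding(fallback="ascii"):
    """Pick an output codec name from the user's locale settings."""
    # getpreferredencoding() consults LC_CTYPE (as set via LC_ALL,
    # LC_CTYPE or LANG); guard against an empty answer.
    name = locale.getpreferredencoding() or fallback
    try:
        # Normalise the name and check that Python has such a codec.
        return codecs.lookup(name).name
    except LookupError:
        return fallback

encoding = preferred_output_encoding()
# 'replace' keeps the example from failing in a plain C/ASCII locale.
sample = u"\xa9 2007".encode(encoding, "replace")
```

The returned name could then go wherever 'UTF-8' is hard-coded. What it evaluates to depends entirely on the environment, so in a bare "C" locale it may well be ASCII again.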