I think sweh and ewx are on the money here. It is incorrect to say that Unix regards files only as streams of bytes. At the C programming level, yes. At the shell programming level (using the provided utilities), no: there, files are mostly regarded as streams of characters encoded according to LC_CTYPE etc. For example, sort(1) decodes its input according to LC_CTYPE and orders it according to LC_COLLATE. sort(1) cannot be told to "see the bytes"; it only sees the characters.
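To make the bytes-versus-characters distinction concrete, here is a rough Python sketch, not a reimplementation of sort(1). The helper names sort_bytewise and sort_by_locale are made up for illustration, and the input is assumed to be UTF-8. What the second ordering looks like depends entirely on which locales are installed and selected on your system.

```python
import locale


def sort_bytewise(lines):
    """Order raw byte strings by byte value, ignoring any notion of characters."""
    return sorted(lines)


def sort_by_locale(lines, encoding):
    """Decode each line with `encoding`, then order by the current LC_COLLATE.

    `encoding` stands in for what LC_CTYPE would tell a Unix utility.
    """
    decoded = [raw.decode(encoding) for raw in lines]
    return sorted(decoded, key=locale.strxfrm)


if __name__ == "__main__":
    try:
        # Adopt LANG / LC_* from the environment, as the shell utilities do.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed; keep the default.
        pass

    # Hypothetical sample data, encoded as UTF-8 for this sketch.
    words = [w.encode("utf-8") for w in ["zebra", "Apple", "éclair"]]
    print(sort_bytewise(words))
    print(sort_by_locale(words, "utf-8"))
```

Running it under different LC_ALL settings should show the second list changing while the first stays fixed.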
(no subject)
Date: 2007-12-19 12:46 pm (UTC)
Unix has a protocol for specifying the encoding of a file and it is generally to set the relevant environment variables.
Python is not a good Unix citizen in this regard.
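For what it's worth, a minimal sketch of what following that convention could look like from Python: take the encoding from the locale the environment selects, and decode file contents with it. The function name decode_per_locale is hypothetical, and the errors="replace" policy is my own assumption rather than anything Unix prescribes.

```python
import locale


def decode_per_locale(raw, encoding=None):
    """Decode bytes using the locale's encoding unless one is given explicitly.

    locale.getpreferredencoding(False) reports the encoding of the current
    LC_CTYPE setting without changing the process locale.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Assumption for this sketch: undecodable bytes become U+FFFD.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    try:
        # Pick up LC_CTYPE (or LANG / LC_ALL) from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # locale named in the environment is not installed

    data = b"caf\xc3\xa9"  # "café" if the bytes are UTF-8
    print(locale.getpreferredencoding(False))
    print(decode_per_locale(data))
```

The same four bytes come out as different character strings depending on the locale, which is the sense in which the environment variables specify the encoding of the file.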