The problem:
a = 'âêîôŷ'
print a
gives this error:
UnicodeDecodeError: 'ascii', '\xc3\xa2\xc3\xaa\xc3\xae\xc3\xb4\xc5\xb7', 0, 1, 'ordinal not in range(128)'
The proper way to define a unicode string is this:
a = u'âêîôŷ'
print a
which yields:
âêîôŷ
In my search, though, there was lots of talk of how to convert strings to UTF-8, and this is *not* what you do if you want to write to a UTF-8 file. If you convert to UTF-8 before writing, you'll probably get errors because the result is a byte string containing values above 127, which the ASCII codec rejects.
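To make that concrete, here is a small sketch of what encoding by hand produces. It assumes the Python 2 behaviour where a byte string handed to an encoding-aware writer is first decoded as ASCII; the variable names are just for illustration, and the string is written with escapes so the snippet does not depend on the source file's encoding.

```python
# Same five characters as the example at the top, written as escapes.
text = u'\xe2\xea\xee\xf4\u0177'

encoded = text.encode('utf-8')   # a byte string, no longer unicode

print(len(text))      # number of characters
print(len(encoded))   # number of bytes; each character here needs two

# Decoding those bytes as ASCII fails on the very first byte (0xc3),
# which is the "ordinal not in range(128)" error shown earlier.
try:
    encoded.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)
```

So the encoded form is fine as raw bytes, but anything that tries to treat it as ASCII text will choke on it.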
This is how you do it...
import codecs

ascii = 'abcdef'
uni = u'⢸ðêƒ'
# codecs.open encodes unicode strings to UTF-8 as they are written
out = codecs.open('utf-8.xml', mode='w', encoding='utf-8')
out.write(ascii)
out.write(uni)
out.close()
The only difference here is that you must use 'u' when defining literals, and you need to use codecs.open, with the encoding specified, when opening the file.
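Reading works the same way in reverse. The following is a rough sketch rather than anything definitive: it writes to a scratch directory (the tempfile path is my own assumption, so nothing real gets overwritten), uses a shorter sample string than the one above, and then reads the file back with codecs.open to check that the text survives the round trip.

```python
import codecs
import os
import tempfile

uni = u'\xf0\xea\u0192'   # three non-ASCII characters, as escapes

# Hypothetical scratch location for the example file.
path = os.path.join(tempfile.mkdtemp(), 'utf-8.xml')

out = codecs.open(path, mode='w', encoding='utf-8')
out.write(u'abcdef')
out.write(uni)
out.close()

# Opening with the same encoding hands back unicode strings.
src = codecs.open(path, mode='r', encoding='utf-8')
content = src.read()
src.close()

# Opening in binary mode shows the UTF-8 bytes actually on disk.
raw = open(path, 'rb')
data = raw.read()
raw.close()

print(content == u'abcdef' + uni)
print(len(content))   # characters
print(len(data))      # bytes; larger, since each non-ASCII char takes two
```

The character count and the byte count differ, which is expected: the file holds encoded bytes, and codecs.open turns them back into characters on the way in.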
If, when you read the file, it appears to have 2 strange characters rather than the one unicode character you expect, the file is probably OK; it's the viewer that isn't reading it as UTF-8.
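You can reproduce the "2 strange characters" effect yourself. This sketch assumes the viewer is falling back to Latin-1 (ISO-8859-1), which is a common culprit but not the only possible one.

```python
ch = u'\xea'   # e with circumflex: one character

utf8_bytes = ch.encode('utf-8')          # two bytes: 0xc3 0xaa
misread = utf8_bytes.decode('latin-1')   # what a Latin-1 viewer would show

print(len(misread))                        # two characters instead of one
print(misread == u'\xc3\xaa')              # A-tilde followed by a small raised a
print(utf8_bytes.decode('utf-8') == ch)    # decoding as UTF-8 recovers the original
```

The bytes are the same in both cases; only the interpretation changes, which is why the fix belongs in the viewer's encoding setting and not in the file.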