Janelle Monae - Tightrope
Mr Deity and the Woman
Lucky Louie - Ass (Audio NSFW)
Louis CK learns about the Catholic Church (Audio NSFW)
Tim Minchin - Storm (Audio NSFW)
Tim Minchin - Pope Song (NSFW. May offend you if you're Catholic)
Old Spice | The Man Your Man Could Smell Like
Sesame Street: Smell Like A Monster
The Western Nostril
Яolcats
http://devour.com
The Trachtenburg Family Slideshow Players - Mountain Trip
Charlie and the Swiss Chocolate Factory
XKCD Sys Admin
Sunday, 28 November 2010
Saturday, 27 November 2010
Writing UTF-8 files from Python
As always, there may be better ways to do this (using XML libraries, for example), but this took far too long to figure out, given how little there is to do to fix the problem. A lot of the examples I found on Google didn't answer this specifically and just added to the confusion.
The problem:
a = 'âêîôŷ'
print a
gives this error:
UnicodeDecodeError: 'ascii', '\xc3\xa2\xc3\xaa\xc3\xae\xc3\xb4\xc5\xb7', 0, 1, 'ordinal not in range(128)'
The proper way to define a unicode string is this:
a = u'âêîôŷ'
print a
which yields:
âêîôŷ
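To see what the error message is complaining about, it helps to compare the unicode string with its encoded bytes. This is only a small sketch: the variable names are mine, and the literal is written with escapes so it doesn't depend on the encoding of the source file. The ten bytes it produces are the same ones listed in the error above.

```python
# The five characters from the example, written as escapes:
# U+00E2, U+00EA, U+00EE, U+00F4, U+0177
text = u'\xe2\xea\xee\xf4\u0177'

# Encoding turns each character into two bytes of UTF-8
encoded = text.encode('utf-8')

print(len(text))     # number of characters
print(len(encoded))  # number of bytes
```

So the unicode string holds five characters, while the plain string in the failing example was really ten bytes that Python tried, and failed, to treat as ASCII.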
In my search, though, there was lots of talk of how to convert strings to UTF-8, and this is *not* what you do if you want to write to a UTF-8 file. If you convert to UTF-8 before writing, you'll probably get errors because the result is a byte string containing values above 127, which are outside the ASCII range.
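As far as I can tell, the failure comes from the already-encoded bytes being decoded again as ASCII somewhere along the way. The sketch below doesn't touch a file at all; it just reproduces that decode step by hand, on the assumption that this is what happens behind the scenes.

```python
# UTF-8 bytes for a single non-ASCII character (U+00E2)
encoded = u'\xe2'.encode('utf-8')

# Treating those bytes as ASCII fails, because both are above 127
try:
    encoded.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True

print(failed)
```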
This is how you do it...
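# -*- coding: utf-8 -*-
import codecs

ascii = 'abcdef'
uni = u'⢸ðêƒ'
# codecs.open encodes unicode strings as UTF-8 when they are written
out = codecs.open('utf-8.xml', mode='w', encoding='utf-8')
out.write(ascii)
out.write(uni)
out.close()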
The only difference here is that you must use 'u' when defining literals, and you need to use codecs.open, with the encoding specified, when opening the file.
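One way to check that the file really is UTF-8 is to read it back with the same encoding and compare. The sketch below is my own variation rather than the code above: it uses io.open (which takes the same encoding argument), a temporary directory, and two hypothetical helpers, write_utf8 and read_utf8.

```python
import io
import os
import tempfile


def write_utf8(path, chunks):
    """Write each unicode string in chunks to path, encoded as UTF-8."""
    with io.open(path, mode='w', encoding='utf-8') as out:
        for chunk in chunks:
            out.write(chunk)


def read_utf8(path):
    """Read the whole file at path back as a unicode string."""
    with io.open(path, mode='r', encoding='utf-8') as src:
        return src.read()


# Same content as the example, with the non-ASCII part written as escapes
plain = u'abcdef'
uni = u'\u28b8\xf0\xea\u0192'

path = os.path.join(tempfile.mkdtemp(), 'utf-8.xml')
write_utf8(path, [plain, uni])

print(read_utf8(path) == plain + uni)
print(os.path.getsize(path))  # size in bytes, not characters
```

The file ends up larger than the ten characters written, because each non-ASCII character takes two or three bytes in UTF-8.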
If, when you read the file, it appears to have two strange characters rather than the one unicode character you expect, the file is probably OK; it's the viewer that isn't reading UTF-8 properly.
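The "two strange characters" effect can be reproduced on purpose. This sketch assumes the viewer is decoding the bytes as Latin-1, which is only one of several ways a viewer can get it wrong; misread_as_latin1 is a made-up helper name.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes as Latin-1,
    roughly what a viewer using the wrong encoding would do."""
    return text.encode('utf-8').decode('latin-1')


original = u'\xea'  # one character, U+00EA
garbled = misread_as_latin1(original)

print(len(original))
print(len(garbled))

# The bytes are intact, so reversing the mistake recovers the character
recovered = garbled.encode('latin-1').decode('utf-8')
print(recovered == original)
```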
Friday, 26 November 2010
More fun with Python
I'd recently written a little Python app to create a load of test data. The test data is XML, and should be UTF-8. I'd not really considered this properly, and for my original purposes, it was irrelevant. For a bit of fun/experimentation/learning, I put a Tk front end on it, and emailed the project team to let them know, just in case it was useful.
Coincidentally, the vendor of the external product that would be producing these XML files in the real world was going to be late by several months, meaning that the XML files would need to be hand-crafted, the test data generator turned into a deliverable, and I briefly turned from tester into nightCoder.
There were a number of feature requests. My testing colleague started testing and raising defects against my code. Testing revealed areas in which it could be improved. I added a log file, properties files, some exception handling and error reporting dialogs. I had a real developer moment when it was deployed, went wrong, and I said (with a tester's smile on my face), "well that doesn't happen on my machine!".
While trying to figure out that problem on a Swiss keyboard, I typed garbage into a mandatory field, and committing yielded another error caused by non-ASCII characters. As the client is Swiss, and these fields will probably include non-ASCII characters, a fix was definitely required. Had this been just a learning exercise, I might not have been too worried. As I was now delivering this software, I had no option other than to figure it out. This highlights my main problem with self-teaching: I really struggle to find projects, and often abandon them in an unfinished state because nobody is relying on the solution.