Spot any errors? Let me know, but unleash your pedant politely, please.

Thursday 14 March 2013

UnicodeEncodeError | UnicodeDecodeError

I've been battling with UnicodeEn/DecodeErrors today. For bloody hours.

I'd looked at SOAP libraries ages ago, but didn't really get anywhere with them.  I've just been using soapUI to send manually altered requests, as and when, to a WebMethods service.  Now that I want to do more than scratch the surface, I want to do some data-driven testing.  From poking around with soapUI this week, I know soapUI can do this, but as far as I can tell, I need the Pro version, which I can't quite justify at the moment.  I'd also need to learn a lot more about soapUI, Groovy, and the intricacies of the soapUI Groovy interface.  I probably should do this.  Right now though, I can't quite face it.

More recently, I've been using some HTTP libraries to communicate successfully with the API at Project Place.  This got me thinking that I should be able to send some SOAP requests using the in-favour requests library for Python.  It took a little while, but I got this working with Currency Converter, using the sample requests, host and endpoint information for the same service in soapUI.

So I'm bypassing the WSDL and hand-crafting some messages. There are probably many reasons why this is a bad idea.  I'm more interested in getting some testing done though.
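For anyone wondering what a hand-crafted message amounts to, here is a minimal sketch rather than my actual code. The envelope is just a text template with values dropped in, plus a couple of headers. The namespace and operation name below are made-up placeholders, not the real Currency Converter details, and it is written with Python 3 in mind.

```python
from xml.sax.saxutils import escape

# Hypothetical namespace, for illustration only (not the real service).
SERVICE_NS = "http://example.com/currency"

# Hypothetical operation and element names inside a standard SOAP 1.1 envelope.
ENVELOPE_TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body>'
    '<ConversionRate xmlns="{ns}">'
    '<FromCurrency>{from_currency}</FromCurrency>'
    '<ToCurrency>{to_currency}</ToCurrency>'
    '</ConversionRate>'
    '</soap:Body>'
    '</soap:Envelope>'
)


def build_request(from_currency, to_currency):
    """Return (headers, body_bytes) for a hand-crafted SOAP 1.1 request."""
    envelope = ENVELOPE_TEMPLATE.format(
        ns=SERVICE_NS,
        from_currency=escape(from_currency),  # escape &, < and > in values
        to_currency=escape(to_currency),
    )
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        # SOAP 1.1 expects a quoted SOAPAction header.
        "SOAPAction": '"{0}/ConversionRate"'.format(SERVICE_NS),
    }
    # Encode once, at the edge, so the HTTP layer only ever sees bytes.
    return headers, envelope.encode("utf-8")


headers, body = build_request("CHF", "GBP")
```

With requests, the send itself would be something like `requests.post(endpoint_url, data=body, headers=headers)`, where `endpoint_url` is whatever soapUI shows for the service.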

Once I'd got Currency Converter working with a set of test data…


    test_data = [{u'FromCurrency':u'CHF', u'ToCurrency':u'GBP', u'Expected':0.712},
                 {u'FromCurrency':u'GBP', u'ToCurrency':u'CHF', u'Expected':1.411},
                 {u'FromCurrency':u'USD', u'ToCurrency':u'GBP', u'Expected':0.671},
                 {u'FromCurrency':u'GBP', u'ToCurrency':u'USD', u'Expected':1.493},
                 ]
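A sketch of how a list like this can drive the checks, assuming a `get_rate` function that wraps the service call. That function, the tolerance, and the fake rates are all hypothetical here; real rates move about, so the expected values only ever match approximately.

```python
def run_conversion_tests(test_data, get_rate, tolerance=0.01):
    """Return (from, to, expected, actual) for every row outside tolerance."""
    failures = []
    for datum in test_data:
        actual = get_rate(datum["FromCurrency"], datum["ToCurrency"])
        if abs(actual - datum["Expected"]) > tolerance:
            failures.append((datum["FromCurrency"], datum["ToCurrency"],
                             datum["Expected"], actual))
    return failures


def fake_get_rate(from_currency, to_currency):
    """Stand-in for the real service call, with one deliberately wrong rate."""
    rates = {("CHF", "GBP"): 0.712, ("GBP", "CHF"): 1.5}
    return rates[(from_currency, to_currency)]


sample = [
    {"FromCurrency": "CHF", "ToCurrency": "GBP", "Expected": 0.712},
    {"FromCurrency": "GBP", "ToCurrency": "CHF", "Expected": 1.411},
]
failures = run_conversion_tests(sample, fake_get_rate)
for failure in failures:
    print("FAIL %s -> %s: expected %s, got %s" % failure)
```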

Using an XML library (also shamelessly schema free) I cooked up a couple of years ago, I can modify values in the tree…

        message.replace_value_at_path(path      = u'',
                                      new_value = test_datum[u"FromCurrency"])


I moved on to the WebMethods request.  This also required basic authentication, which turned out to be fine once I'd had yet another battle with some recalcitrant proxy settings.  Gradually it seemed to be coming together.  I got it working, and tidied up the code so that it was reasonably readable to my less code-aware colleagues. One of the refinements was to read the template/sample message from a file rather than have the message inline. At that point, I tried a second file, just to confirm that it was all working.  That's when it broke.  I'd been fairly careful to use unicode throughout, but even so, I saw this as the result of an 'é' in the second file…


  File "C:\Python27\lib\httplib.py", line 990, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 943, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 810, in _send_output
    self.send(message_body)
  File "C:\Python27\lib\httplib.py", line 775, in send
    self.sock.sendall(str)
  File "C:\Python27\lib\socket.py", line 222, in meth
    return getattr(self._sock,name)(*args)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 143-149: ordinal not in range(128)


I can't find the page that solved this for me.  It took a fair bit of trawling through many Stack Overflow questions and answers to discover that I needed to do one small thing: encode as UTF-8 on output.

      soap_request = soap_request.encode('utf-8')
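The failure and the fix in miniature, as a small sketch with a made-up message: ASCII has no byte for 'é', so anything that tries to squeeze the text into ASCII falls over, whereas an explicit UTF-8 encode produces bytes the socket is happy to send.

```python
message = u"<Name>Caf\u00e9</Name>"  # made-up text containing 'é'

# Encoding to ASCII fails, which is what happened deep inside httplib.
try:
    message.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly as UTF-8 turns 'é' into the two bytes 0xC3 0xA9.
payload = message.encode("utf-8")
print(repr(payload))
```

It is worth making sure the Content-Type header declares the same charset as the encode call, so the receiving end decodes the bytes the same way.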

Testing this on my Mac seemed to confirm that all was well - though the service wasn't available.  Testing on a Windows client that did have access to the service threw up this gem…


  File "/usr/local/lib/python2.7/httplib.py", line 946, in request
    self._send_request(method, url, body, headers)
  File "/usr/local/lib/python2.7/httplib.py", line 987, in _send_request
    self.endheaders(body)
  File "/usr/local/lib/python2.7/httplib.py", line 940, in endheaders
    self._send_output(message_body)
  File "/usr/local/lib/python2.7/httplib.py", line 801, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 484: ordinal not in range(128)
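My reading of this traceback, which I haven't confirmed: in Python 2, joining a unicode string and a byte string (as `msg += message_body` does if `msg` has become unicode) makes Python decode the bytes as ASCII behind the scenes, and any byte above 127 blows up. Python 3 refuses to mix the two types at all, so the sketch below spells the decode out explicitly to show the same error. The header text is a made-up placeholder.

```python
body = u"caf\u00e9".encode("utf-8")  # bytes, as produced by the encode step

# Python 2 would attempt this ASCII decode implicitly when a unicode string
# and a byte string are concatenated; here it is done by hand.
try:
    body.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The safer pattern: make sure both halves are bytes before joining them.
header_block = u"POST /service HTTP/1.1\r\n\r\n".encode("ascii")
wire_message = header_block + body
```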


I spent a lot of time on this, including a bit of a wild goose chase with repr as a result of this comment.  I briefly thought I'd got it working, but instead of 'é', I saw '\xe9'.  At breaking point, I thought I'd add a comment to the bug report in case one of the requests guys could help.  And it was in writing that comment that I finally found the answer.  I'd written this…

I'm also having trouble with this. It's fine on Mac (2.7.2), but not working on Windows (
At which point I checked the version on Windows. It was 2.7 dated 2010-07-04.  I thought that looked a bit old, so I installed the latest Python version (2.7.3 2012-04-10) and the problem disappeared. It's a shame that I wasted so many hours trying to get it to work.
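With hindsight, a version check at the top of the script would have saved me the hours. A minimal sketch using the standard `platform` and `sys` modules; the minimum version is simply the one that worked for me, not a documented requirement.

```python
import platform
import sys

version = platform.python_version()                 # e.g. '2.7.3'
build_number, build_date = platform.python_build()  # build id and build date

print("Python {0} (build {1}, {2})".format(version, build_number, build_date))

# Assumed minimum: just the version where the problem went away for me.
MINIMUM = (2, 7, 3)
too_old = sys.version_info[:3] < MINIMUM
if too_old:
    print("Warning: this interpreter is older than {0}.{1}.{2}".format(*MINIMUM))
```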

With all the fiddling, I managed to break some code.  I'm not using a proper repository, but Dropbox came to the rescue.  I made a copy of the current file, restored last night's version and managed to get it all working.

Where I had '\xe9', I now have 'é', and I'm in a position to demo it to some colleagues tomorrow.
