I have a network resource which returns data that should (according to the specs) be an ASCII-encoded string. But on some rare occasions, I get junk data.
One resource, for example, returns b'\xd3PS-90AC', whereas another resource, for the same key, returns b'PS-90AC'.
The first value contains a non-ASCII byte. Clearly a violation of the spec, but that's unfortunately out of my control. None of us are 100% certain whether this really is junk or data which should be kept.
The application calling on the remote resources saves the data in a local database for daily use. I could simply do a data.decode('ascii', 'replace') or data.decode('ascii', 'ignore'), but then I would lose data which could turn out to be useful later on.
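To make the loss concrete, here is a minimal sketch using the sample value from above. The variable names are only for illustration.

```python
data = b'\xd3PS-90AC'

# 'replace' swaps the offending byte for U+FFFD (the replacement character),
# so the original value 0xD3 can no longer be recovered.
replaced = data.decode('ascii', 'replace')

# 'ignore' drops the offending byte entirely.
ignored = data.decode('ascii', 'ignore')

print(ascii(replaced))
print(ascii(ignored))
```

In both cases the result is indistinguishable from what any other invalid byte in that position would have produced.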
My immediate reflex was to use 'xmlcharrefreplace' or 'backslashreplace' as the error handler, simply because it would result in a displayable string. But then I get the following error:
TypeError: don't know how to handle UnicodeDecodeError in error callback
The only error handler which worked was 'surrogateescape', but this seems to be intended for filenames. On the other hand, for my purposes it would work.
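For reference, this is roughly what the 'surrogateescape' round trip looks like with the sample value. It is a sketch, not a recommendation: the decoded string contains a lone surrogate, so it cannot be encoded to UTF-8 (or printed on most terminals) without passing the same error handler again, which may matter when storing it in a database.

```python
data = b'\xd3PS-90AC'

# The invalid byte 0xD3 is mapped to the lone surrogate U+DCD3.
text = data.decode('ascii', 'surrogateescape')
print(ascii(text))

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('ascii', 'surrogateescape')
print(restored == data)
```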
Why are 'xmlcharrefreplace' and 'backslashreplace' not working? I don't understand the error.
For example, an expected execution would be:
>>> data = b'\xd3PS-90AC'
>>> new_data = data.decode('ascii', 'xmlcharrefreplace')
>>> print(repr(new_data))
'&#d3;PS-90AC'
This is a contrived example. My aim is to not lose any data. If I used the 'ignore' or 'replace' error handler, the byte in question would essentially disappear, and information would be lost.
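The error message suggests that, in the Python version I am using, the built-in handlers only accept encoding errors. One possible workaround, sketched below, is to register a custom handler with codecs.register_error that accepts UnicodeDecodeError and emits a displayable escape. The handler name keep_bytes_as_escapes is made up for this example, and the \xNN output format is just one possible choice.

```python
import codecs


def keep_bytes_as_escapes(exc):
    """Hypothetical handler: turn undecodable bytes into literal \\xNN text."""
    if not isinstance(exc, UnicodeDecodeError):
        # This sketch deliberately handles decoding errors only.
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    replacement = ''.join('\\x{:02x}'.format(b) for b in bad_bytes)
    # Return the replacement text and the position where decoding resumes.
    return replacement, exc.end


codecs.register_error('keep_bytes_as_escapes', keep_bytes_as_escapes)

data = b'\xd3PS-90AC'
new_data = data.decode('ascii', 'keep_bytes_as_escapes')
print(new_data)
```

The byte value stays visible in the result, but the mapping is not strictly reversible: a payload that legitimately contains a backslash followed by x and two hex digits would look the same as an escaped byte.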