I am interfacing with a Java application via Python. I need to be able to construct byte sequences which contain UTF-8 strings. Java uses a modified UTF-8 encoding in
Okay, if you need to read the format of DataInput.readUTF, I suspect you'll just have to convert the (well-documented) format into Python.
It doesn't look like it would be particularly hard to do. After reading the length and then the binary data itself, I suggest you use a first pass to work out how many Unicode characters will be in the output, then construct a string accordingly in a second pass. Without knowing Python I don't know the ins and outs of how to efficiently construct a string, but given the linked specification I can't imagine it would be very hard. You might want to look at the source for the existing UTF-8 decoder as a starting point.