Wrap an open stream with io.TextIOWrapper

太阳男子 2020-12-05 13:16

How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?

6 answers
  • 2020-12-05 13:52

    Use codecs.getreader to produce a wrapper object:

    import codecs
    text_stream = codecs.getreader("utf-8")(bytes_stream)
    

    Works on Python 2 and Python 3.
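
A minimal sketch of this approach, using an io.BytesIO as a stand-in for the open binary stream (the sample text and variable names are assumptions for illustration). Note that the returned object is a codecs.StreamReader rather than an io.TextIOWrapper, so it decodes on read but is not an instance of the io text classes:

```python
import codecs
import io

# Stand-in for an already-open binary stream (hypothetical sample data).
bytes_stream = io.BytesIO(u"caf\u00e9\nsecond line\n".encode("utf-8"))

# getreader() returns a StreamReader factory; calling it wraps the stream.
text_stream = codecs.getreader("utf-8")(bytes_stream)

first_line = text_stream.readline()  # decoded text, line ending kept
rest = text_stream.read()            # remaining decoded text
```

If the calling code checks for io.TextIOWrapper specifically, this wrapper will not satisfy that check.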

  • 2020-12-05 13:53

    I needed this as well, but based on the thread here, I determined that it was not possible using just Python 2's io module. While this breaks your "Special treatment for file" rule, the technique I went with was to create an extremely thin wrapper for file (code below) that could then be wrapped in an io.BufferedReader, which can in turn be passed to the io.TextIOWrapper constructor. It will be a pain to unit test, as obviously the new code path can't be tested on Python 3.

    Incidentally, the result of open() can be passed directly to io.TextIOWrapper in Python 3 because a binary-mode open() returns an io.BufferedReader instance to begin with (at least on Python 3.4, which is where I was testing at the time).

    import io
    import six  # for six.PY2
    
    if six.PY2:
        class _ReadableWrapper(object):
            def __init__(self, raw):
                self._raw = raw
    
            def readable(self):
                return True
    
            def writable(self):
                return False
    
            def seekable(self):
                return True
    
            def __getattr__(self, name):
                return getattr(self._raw, name)
    
    def wrap_text(stream, *args, **kwargs):
        # Note: order important here, as 'file' doesn't exist in Python 3
        if six.PY2 and isinstance(stream, file):
            stream = io.BufferedReader(_ReadableWrapper(stream))
    
        # Forward any extra arguments (e.g. encoding) to TextIOWrapper
        return io.TextIOWrapper(stream, *args, **kwargs)
    

    At least this is small, so hopefully it minimizes the exposure for parts that cannot easily be unit tested.
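
The Python 3 behaviour mentioned in this answer can be checked with a short sketch. It assumes Python 3 and uses a temporary file with made-up contents; the variable names are illustrative only:

```python
import io
import os
import tempfile

# Create a small temporary file to open (contents are made up).
fd, path = tempfile.mkstemp()
os.write(fd, b"Lorem ipsum")
os.close(fd)

binary_stream = open(path, "rb")
# On Python 3 a binary-mode open() returns a buffered reader by default.
is_buffered_reader = isinstance(binary_stream, io.BufferedReader)

text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
content = text_stream.read()
text_stream.close()  # closing the wrapper also closes binary_stream
os.remove(path)
```

Because the stream is already an io.BufferedReader, no adapter class is needed on this code path.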

  • 2020-12-05 13:57

    Here's some code that I've tested in both Python 2.7 and Python 3.6.

    The key here is that you need to use detach() on your previous stream first. This does not close the underlying file, it just rips out the raw stream object so that it can be reused. detach() will return an object that is wrappable with TextIOWrapper.

    As an example, I open a file in binary read mode, read a few bytes from it, and then switch to a UTF-8 decoded text stream via io.TextIOWrapper.

    I saved this example as this-file.py

    import io
    
    fileName = 'this-file.py'
    fp = io.open(fileName,'rb')
    fp.seek(20)
    someBytes = fp.read(10)
    print(type(someBytes), len(someBytes))
    
    # now let's do some wrapping to get a new text (non-binary) stream
    pos = fp.tell() # we're about to lose our position, so let's save it
    newStream = io.TextIOWrapper(fp.detach(),'utf-8') # FYI -- fp is now unusable
    newStream.seek(pos)
    theRest = newStream.read()
    print(type(theRest), len(theRest))
    

    Here's what I get when I run it with both python2 and python3.

    $ python2.7 this-file.py 
    (<type 'str'>, 10)
    (<type 'unicode'>, 406)
    $ python3.6 this-file.py 
    <class 'bytes'> 10
    <class 'str'> 406
    

    The output format differs because print is a statement in Python 2 (so it prints a tuple), and the types differ between Python versions as expected, but the code works as it should in both cases.
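
The same detach() step can be sketched without a file on disk. The example below is an illustration under assumptions: an io.BytesIO stands in for the raw stream, the sample text is made up, and nothing is read before detaching so no position needs to be saved:

```python
import io

# Hypothetical in-memory raw stream wrapped in a buffered reader.
raw = io.BytesIO(u"h\u00e9llo w\u00f6rld".encode("utf-8"))
buffered = io.BufferedReader(raw)

# detach() hands back the underlying stream without closing it.
detached = buffered.detach()
same_object = detached is raw
still_open = not raw.closed

# The detached stream can now be wrapped for text decoding.
text = io.TextIOWrapper(detached, encoding="utf-8")
decoded = text.read()

# The old buffered wrapper is unusable after detach().
try:
    buffered.read()
    old_wrapper_usable = True
except ValueError:
    old_wrapper_usable = False
```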

  • 2020-12-05 13:59

    It turns out you just need to wrap your io.BytesIO in io.BufferedReader which exists on both Python 2 and Python 3.

    import io
    
    reader = io.BufferedReader(io.BytesIO("Lorem ipsum".encode("utf-8")))
    wrapper = io.TextIOWrapper(reader)
    wrapper.read()  # returns Lorem ipsum
    

    This answer originally suggested using os.pipe, but the read-side of the pipe would have to be wrapped in io.BufferedReader on Python 2 anyway to work, so this solution is simpler and avoids allocating a pipe.
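
When no encoding is passed, io.TextIOWrapper falls back to a locale-dependent default, so the decoded result can differ between machines. The sketch below is a variation on the snippet in this answer, not part of it: it passes encoding explicitly and iterates over lines, with a made-up payload:

```python
import io

# Hypothetical UTF-8 payload containing non-ASCII text and two lines.
payload = u"premi\u00e8re ligne\nseconde ligne\n".encode("utf-8")

reader = io.BufferedReader(io.BytesIO(payload))
wrapper = io.TextIOWrapper(reader, encoding="utf-8")

# Iterating a text wrapper yields decoded lines, one at a time.
lines = [line.rstrip(u"\n") for line in wrapper]
```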

  • 2020-12-05 14:05

    Based on multiple suggestions in various forums, and experimenting with the standard library to meet the criteria, my current conclusion is this can't be done with the library and types as we currently have them.

  • 2020-12-05 14:08

    Okay, this seems to be a complete solution, for all cases mentioned in the question, tested with Python 2.7 and Python 3.5. The general solution ended up being re-opening the file descriptor, but instead of io.BytesIO you need to use a pipe for your test double so that you have a file descriptor.

    import io
    import subprocess
    import os
    
    # Example function, re-opens a file descriptor for UTF-8 decoding,
    # reads until EOF and prints what is read.
    def read_as_utf8(fileno):
        fp = io.open(fileno, mode="r", encoding="utf-8", closefd=False)
        print(fp.read())
        fp.close()
    
    # Subprocess
    gpg = subprocess.Popen(["gpg", "--version"], stdout=subprocess.PIPE)
    read_as_utf8(gpg.stdout.fileno())
    
    # Normal file (contains "Lorem ipsum." as UTF-8 bytes)
    normal_file = open("loremipsum.txt", "rb")
    read_as_utf8(normal_file.fileno())  # prints "Lorem ipsum."
    normal_file.close()  # closefd=False above leaves closing to the caller
    
    # Pipe (for test harness - write whatever you want into the pipe)
    pipe_r, pipe_w = os.pipe()
    os.write(pipe_w, "Lorem ipsum.".encode("utf-8"))
    os.close(pipe_w)
    read_as_utf8(pipe_r)  # prints "Lorem ipsum."
    os.close(pipe_r)
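
For use in a test harness it may be more convenient to return the decoded text than to print it. The following is a sketch under assumptions, not the answer's own code: read_fd_as_text is a hypothetical helper name, and the pipe contents are made up. It also checks that closefd=False leaves the descriptor open for the caller:

```python
import io
import os

def read_fd_as_text(fileno, encoding="utf-8"):
    """Re-open a file descriptor for decoding and return everything up to EOF."""
    # closefd=False: closing fp must not close the caller's descriptor.
    with io.open(fileno, mode="r", encoding=encoding, closefd=False) as fp:
        return fp.read()

# Pipe used as a test double that has a real file descriptor.
pipe_r, pipe_w = os.pipe()
os.write(pipe_w, u"Lorem ipsum.".encode("utf-8"))
os.close(pipe_w)  # signal EOF to the reading side

text = read_fd_as_text(pipe_r)

# The descriptor should still be valid until the caller closes it.
try:
    os.fstat(pipe_r)
    fd_still_open = True
except OSError:
    fd_still_open = False
os.close(pipe_r)
```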
    