How to read a single character at a time from a file in Python?

萌比男神i 2020-11-28 05:49

Can anyone tell me how I can do this?

12 Answers
  •  时光取名叫无心
    2020-11-28 06:26

    I like the accepted answer: it is straightforward and will get the job done. I would also like to offer an alternative implementation:

    def chunks(filename, buffer_size=4096):
        """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
        until no more bytes can be read; the last chunk will most likely have
        fewer than `buffer_size` bytes.
    
        :param str filename: Path to the file
        :param int buffer_size: Buffer size, in bytes (default is 4096)
        :return: Yields chunks of `buffer_size` size until exhausting the file
        :rtype: bytes
    
        """
        with open(filename, "rb") as fp:
            chunk = fp.read(buffer_size)
            while chunk:
                yield chunk
                chunk = fp.read(buffer_size)
    
    def chars(filename, buffer_size=4096):
        """Yields the contents of file `filename` one byte at a time. Warning:
        this matches characters only for encodings where one character is
        encoded as one byte.
    
        :param str filename: Path to the file
        :param int buffer_size: Buffer size for the underlying chunks,
        in bytes (default is 4096)
        :return: Yields the contents of `filename` byte by byte.
        :rtype: bytes
    
        """
        for chunk in chunks(filename, buffer_size):
            # Slice instead of iterating: on Python 3, iterating over bytes
            # yields ints, which cannot be passed to fp.write().
            for i in range(len(chunk)):
                yield chunk[i:i + 1]
    
    def main(buffersize, filenames):
        """Reads several files character by character and redirects their contents
        to `/dev/null`.
    
        """
        for filename in filenames:
            with open("/dev/null", "wb") as fp:
                for char in chars(filename, buffersize):
                    fp.write(char)
    
    if __name__ == "__main__":
        # Try reading several files varying the buffer size
        import sys
        buffersize = int(sys.argv[1])
        filenames  = sys.argv[2:]
        sys.exit(main(buffersize, filenames))
    

    The code I suggest is essentially the same idea as the accepted answer: read a given number of bytes from the file. The difference is that it first reads a sizeable chunk of data (4096 is a good default for x86, but you may want to try 1024 or 8192, or any multiple of your page size), and then yields the characters in that chunk one by one.
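
    The warning in the `chars` docstring matters for files such as UTF-8 text, where one character may span several bytes. A minimal sketch of the same chunk-then-yield idea in text mode follows, assuming Python 3 and a known encoding; `text_chars` and its `encoding` parameter are hypothetical names, not part of the code above.

```python
def text_chars(filename, encoding="utf-8", buffer_size=4096):
    """Yield the decoded characters of `filename` one at a time.

    Hypothetical helper for illustration: the name and the `encoding`
    parameter are assumptions, not part of the original answer.
    """
    # In text mode, read(n) returns up to n *characters* (not bytes), so a
    # multi-byte character is never split across two chunks.
    # newline="" disables newline translation, so line endings come through
    # exactly as stored in the file.
    with open(filename, "r", encoding=encoding, newline="") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            for char in chunk:
                yield char
            chunk = fp.read(buffer_size)
```

    Here `buffer_size` counts characters rather than bytes, and each yielded value is a `str` of length 1. For example, `for c in text_chars("2600.txt.utf-8"): ...` would walk the file used in the timings character by character.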

    The code I present may be faster for larger files. Take, for example, the entire text of War and Peace, by Tolstoy. These are my timing results (MacBook Pro running OS X 10.7.4; so.py is the name I gave to the code I pasted):

    $ time python so.py 1 2600.txt.utf-8
    python so.py 1 2600.txt.utf-8  3.79s user 0.01s system 99% cpu 3.808 total
    $ time python so.py 4096 2600.txt.utf-8
    python so.py 4096 2600.txt.utf-8  1.31s user 0.01s system 99% cpu 1.318 total
    

    However, do not take a buffer size of 4096 as a universal truth; look at the results I get for different sizes (buffer size in bytes vs. wall time in seconds):

       2 2.726 
       4 1.948 
       8 1.693 
      16 1.534 
      32 1.525 
      64 1.398 
     128 1.432 
     256 1.377 
     512 1.347 
    1024 1.442 
    2048 1.316 
    4096 1.318 
    

    As you can see, most of the gain already appears at small buffer sizes, well before 4096 (and my timings are likely very inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice but, as always, measure first.
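
    One way to "measure first" is to time the read loop directly instead of timing the whole interpreter run. The following is a rough sketch, assuming Python 3; `time_read` and `compare_buffer_sizes` are hypothetical helper names, and the numbers it produces will depend on the machine and on the OS file cache.

```python
import time


def time_read(filename, buffer_size):
    """Return the wall time, in seconds, needed to visit every byte of
    `filename` when it is read in chunks of `buffer_size` bytes."""
    start = time.perf_counter()
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            for _ in chunk:  # touch each byte, like the per-character loop
                pass
            chunk = fp.read(buffer_size)
    return time.perf_counter() - start


def compare_buffer_sizes(filename, sizes=(1, 64, 4096), repeats=3):
    """Return a dict mapping each buffer size to its best wall time."""
    # Taking the minimum over a few repeats reduces noise from other processes.
    return {
        size: min(time_read(filename, size) for _ in range(repeats))
        for size in sizes
    }
```

    Calling `compare_buffer_sizes("2600.txt.utf-8")` and printing the resulting dict gives a small table like the one above for your own machine.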
