How can I read large text files in Python, line by line, without loading it into memory?

臣服心动 2020-11-22 03:32

I need to read a large file, line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I do not want to use readlines(), because it would load the whole file into memory.

15 Answers
  •  不知归路
    2020-11-22 04:14

    Here's the code for loading text files of any size without causing memory issues. It supports gigabyte-sized files:

    https://gist.github.com/iyvinjose/e6c1cb2821abd5f01fd1b9065cbc759d

    Download the file data_loading_utils.py and import it into your code.

    Usage:

    import data_loading_utils

    file_name = 'file_name.ext'
    CHUNK_SIZE = 1000000


    def process_lines(data, eof, file_name):
        # check if end of file reached
        if not eof:
            # process data; data is one single line of the file
            ...
        else:
            # end of file reached
            ...


    data_loading_utils.read_lines_from_file_as_data_chunks(file_name, chunk_size=CHUNK_SIZE, callback=process_lines)
    

    The process_lines function is the callback. It is invoked once for every line, with the data parameter holding a single line of the file at a time.

    You can tune CHUNK_SIZE to suit your machine's hardware.
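    For reference, here is a rough, self-contained sketch of what such a callback-based reader could look like using only the standard library. The function name and signature mirror the usage above, but this is an assumption for illustration; the actual gist's implementation may differ:

    ```python
    def read_lines_from_file_as_data_chunks(file_name, chunk_size, callback):
        """Read a file in fixed-size chunks and invoke
        callback(data, eof, file_name) once per complete line,
        then a final time with data=None and eof=True."""
        remainder = ""
        with open(file_name, "r", encoding="utf-8", errors="replace") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                remainder += chunk
                lines = remainder.split("\n")
                # the last piece may be an incomplete line; carry it over
                remainder = lines.pop()
                for line in lines:
                    callback(line, False, file_name)
        if remainder:
            # file did not end with a newline; flush the last line
            callback(remainder, False, file_name)
        callback(None, True, file_name)
    ```

    Because only chunk_size characters plus one partial line are held at a time, memory use stays bounded regardless of file size. For many use cases, plain iteration (`for line in open(file_name):`) achieves the same streaming behavior with less code.
    
    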
