Python: How to read huge text file into memory

自闭症患者 2020-11-29 21:36

I'm using Python 2.6 on a Mac Mini with 1GB RAM. I want to read in a huge text file:

$ ls -l links.csv; file links.csv; tail links.csv 
-rw-r--r--  1 user  u         

6 answers
  •  心在旅途
    2020-11-29 22:38

    All Python objects have a memory overhead on top of the data they are actually storing. According to getsizeof on my 32-bit Ubuntu system, a tuple has an overhead of 32 bytes and an int takes 12 bytes, so each row in your file (a tuple of two ints) takes 56 bytes plus a 4-byte pointer in the list - I presume it will be a lot more on a 64-bit system. This is in line with the figures you gave and means your 30 million rows will take 1.8 GB.
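
    You can check the per-object figures on your own machine with sys.getsizeof; here is a minimal sketch (the exact byte counts vary by Python version and between 32-bit and 64-bit builds, so treat the result as an estimate, not an exact measurement):

        import sys

        # Rough estimate only: getsizeof results differ between builds;
        # the numbers quoted above come from a 32-bit Python on Ubuntu.
        row = (1234567, 7654321)            # one parsed line: a tuple of two ints

        tuple_bytes = sys.getsizeof(row)                 # tuple header + two pointers
        int_bytes = sum(sys.getsizeof(x) for x in row)   # the two int objects themselves
        list_slot = 4                                    # pointer held by the list (8 on 64-bit)

        per_row = tuple_bytes + int_bytes + list_slot
        rows = 30 * 1000 * 1000
        print("approx %d bytes per row, %.1f GB total" % (per_row, per_row * rows / 1e9))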

    I suggest that instead of using Python you use the Unix sort utility. I am not a Mac-head, but I presume the OS X sort options are the same as in the Linux version, so this should work:

    sort -n -t, -k2 links.csv
    

    -n means sort numerically

    -t, means use a comma as the field separator

    -k2 means sort on the second field

    This will sort the file and write the result to stdout. You could redirect it to another file or pipe it to your Python program to do further processing.
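
    For the pipe route, a minimal sketch of the consuming script could look like this (the name read_sorted.py and the variable names are just placeholders):

        # read_sorted.py - consume the output of the shell sort, e.g.
        #   sort -n -t, -k2 links.csv | python read_sorted.py
        import csv
        import sys

        for row in csv.reader(sys.stdin):
            # rows arrive already ordered by the second column, so they can be
            # handled one at a time without ever holding the whole file in memory
            first, second = int(row[0]), int(row[1])
            # ... further processing here ...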

    Edit: If you do not want to sort the file before you run your Python script, you could use the subprocess module to create a pipe to the shell sort utility, then read the sorted results from the output of the pipe.
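
    A rough sketch of that subprocess approach, using the same sort flags as above:

        import csv
        import subprocess

        # Run the shell sort utility and read its sorted output through a pipe,
        # so Python never has to hold (or sort) the whole file in memory.
        proc = subprocess.Popen(["sort", "-n", "-t,", "-k2", "links.csv"],
                                stdout=subprocess.PIPE)
        for row in csv.reader(proc.stdout):
            pass  # process each sorted row here
        proc.stdout.close()
        proc.wait()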
