Python: How to read huge text file into memory

自闭症患者 2020-11-29 21:36

I'm using Python 2.6 on a Mac Mini with 1GB RAM. I want to read in a huge text file:

$ ls -l links.csv; file links.csv; tail links.csv 
-rw-r--r--  1 user  u         

6 answers
  •  心在旅途
    2020-11-29 22:38

    All Python objects have a memory overhead on top of the data they are actually storing. According to getsizeof on my 32-bit Ubuntu system, a tuple has an overhead of 32 bytes and an int takes 12 bytes, so each row in your file (a tuple of two ints) takes 56 bytes plus a 4-byte pointer in the list - I presume it will be a lot more on a 64-bit system. This is in line with the figures you gave and means your 30 million rows will take 1.8 GB.
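
    You can check the per-object figures on your own machine with sys.getsizeof; here is a minimal sketch (the exact byte counts vary by Python version and between 32-bit and 64-bit builds, so treat the result as an estimate, not an exact measurement):

        import sys

        # Rough estimate only: getsizeof results differ between builds;
        # the numbers quoted above come from a 32-bit Python on Ubuntu.
        row = (1234567, 7654321)            # one parsed line: a tuple of two ints

        tuple_bytes = sys.getsizeof(row)                 # tuple header + two pointers
        int_bytes = sum(sys.getsizeof(x) for x in row)   # the two int objects themselves
        list_slot = 4                                    # pointer held by the list (8 on 64-bit)

        per_row = tuple_bytes + int_bytes + list_slot
        rows = 30 * 1000 * 1000
        print("approx %d bytes per row, %.1f GB total" % (per_row, per_row * rows / 1e9))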

    I suggest that instead of using Python you use the Unix sort utility. I am not a Mac-head, but I presume the OS X sort options are the same as in the Linux version, so this should work:

    sort -n -t, -k2 links.csv
    

    -n means sort numerically

    -t, means use a comma as the field separator

    -k2 means sort on the second field

    This will sort the file and write the result to stdout. You could redirect it to another file or pipe it to your Python program to do further processing.
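
    For the pipe route, a minimal sketch of the consuming script could look like this (the name read_sorted.py and the variable names are just placeholders):

        # read_sorted.py - consume the output of the shell sort, e.g.
        #   sort -n -t, -k2 links.csv | python read_sorted.py
        import csv
        import sys

        for row in csv.reader(sys.stdin):
            # rows arrive already ordered by the second column, so they can be
            # handled one at a time without ever holding the whole file in memory
            first, second = int(row[0]), int(row[1])
            # ... further processing here ...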

    Edit: If you do not want to sort the file before you run your Python script, you could use the subprocess module to create a pipe to the shell sort utility, then read the sorted results from the output of the pipe.
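
    A rough sketch of that subprocess approach, using the same sort flags as above:

        import csv
        import subprocess

        # Run the shell sort utility and read its sorted output through a pipe,
        # so Python never has to hold (or sort) the whole file in memory.
        proc = subprocess.Popen(["sort", "-n", "-t,", "-k2", "links.csv"],
                                stdout=subprocess.PIPE)
        for row in csv.reader(proc.stdout):
            pass  # process each sorted row here
        proc.stdout.close()
        proc.wait()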
