How do I read a large csv file with pandas?

Backend · unresolved · 15 answers · 2240 views
隐瞒了意图╮ · 2020-11-21 07:12

I am trying to read a large CSV file (approx. 6 GB) with pandas and I am getting a memory error:

MemoryError                               Traceback (most recent call last)
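The failing pattern is presumably a plain one-shot read along these lines (the filename is a placeholder, not from the question):

    import pandas as pd

    # A single read_csv call materializes all ~6 GB of rows at once,
    # which is what raises MemoryError on machines with less RAM.
    df = pd.read_csv("some_big_file.csv")   # placeholder filename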
15 Answers
  •  天命终不由人
    2020-11-21 07:28

    Here is an example. The file path, chunk size, and the column/prefix in the filter below are placeholders; substitute your own values:

    import pandas as pd

    file = "big_file.csv"    # placeholder path -- point this at your CSV
    chunkSize = 100_000      # placeholder; tune the rows-per-chunk to your RAM

    chunkTemp = []
    queryTemp = []

    for chunk in pd.read_csv(file, header=0, chunksize=chunkSize, low_memory=False):

        # REPLACING BLANK SPACES IN COLUMN NAMES FOR SQL OPTIMIZATION
        chunk = chunk.rename(columns={c: c.replace(' ', '') for c in chunk.columns})

        # YOU CAN EITHER:
        # 1) BUFFER THE CHUNKS IN ORDER TO LOAD YOUR WHOLE DATASET
        chunkTemp.append(chunk)

        # 2) DO YOUR PROCESSING OVER A CHUNK AND STORE THE RESULT
        #    ("someColumn" and "somePrefix" are placeholders)
        query = chunk[chunk["someColumn"].str.startswith("somePrefix")]
        # BUFFERING PROCESSED DATA
        queryTemp.append(query)

    # ! NEVER DO pd.concat OR pd.DataFrame() INSIDE A LOOP: every call
    #   re-copies everything buffered so far, so the loop turns quadratic.
    print("Database: CONCATENATING CHUNKS INTO A SINGLE DATAFRAME")
    df = pd.concat(chunkTemp)
    print("Database: LOADED")

    # CONCATENATING PROCESSED DATA
    query = pd.concat(queryTemp)
    print(query)
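
    If even the buffered chunks do not fit in memory, it helps to shrink each chunk as it is read. A minimal sketch of that idea, assuming hypothetical columns "id" and "name" (the usecols, dtype, and filter values are illustrative, not from the question):

    import pandas as pd

    parts = []
    # Since pandas 1.2 the chunked reader works as a context manager;
    # usecols and dtype cut each chunk down before it is buffered.
    with pd.read_csv("big_file.csv",          # placeholder filename
                     chunksize=100_000,
                     usecols=["id", "name"],  # hypothetical columns
                     dtype={"id": "int32"}) as reader:
        for chunk in reader:
            # keep only matching rows, so the buffer stays small
            parts.append(chunk[chunk["name"].str.startswith("A")])

    result = pd.concat(parts, ignore_index=True)

    Filtering before buffering means the final concat only sees the rows that survive the filter, not the whole 6 GB.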
    
