I am trying to read a large CSV file (approx. 6 GB) with pandas and I am getting a memory error:

MemoryError                               Traceback (most recent call last)
Here is an example of reading the file in chunks:
import pandas as pd

chunkTemp = []
queryTemp = []
query = pd.DataFrame()

# 'file' is the path to your CSV; the chunksize, column name, and prefix
# below are placeholder values -- adjust them to your data and memory budget
for chunk in pd.read_csv(file, header=0, chunksize=100000, low_memory=False):
    # REPLACING BLANK SPACES IN COLUMN NAMES FOR SQL OPTIMIZATION
    chunk = chunk.rename(columns={c: c.replace(' ', '') for c in chunk.columns})

    # YOU CAN EITHER:
    # 1) BUFFER THE CHUNKS IN ORDER TO LOAD YOUR WHOLE DATASET
    chunkTemp.append(chunk)

    # 2) DO YOUR PROCESSING OVER A CHUNK AND STORE ITS RESULT
    query = chunk[chunk['SomeColumn'].str.startswith('SomePrefix')]  # placeholder column/prefix
    # BUFFERING PROCESSED DATA
    queryTemp.append(query)

# ! NEVER CALL pd.concat OR pd.DataFrame() INSIDE A LOOP
print("Database: CONCATENATING CHUNKS INTO A SINGLE DATAFRAME")
df = pd.concat(chunkTemp)
print("Database: LOADED")

# CONCATENATING PROCESSED DATA
query = pd.concat(queryTemp)
print(query)
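One caveat: if the full 6 GB dataset does not fit in RAM, buffering every chunk (option 1) will still raise a MemoryError at the final pd.concat. Below is a minimal sketch of an incremental alternative that keeps only one chunk in memory at a time and appends the filtered rows to disk as it goes; the file paths, column name, and prefix are hypothetical placeholders for illustration.

import pandas as pd

# Hypothetical paths and filter values -- replace with your own
input_path = "big_file.csv"
output_path = "filtered.csv"

first = True
for chunk in pd.read_csv(input_path, header=0, chunksize=100000):
    # Same column-name cleanup as above
    chunk = chunk.rename(columns={c: c.replace(' ', '') for c in chunk.columns})
    # Keep only the rows you need; placeholder column/prefix
    matches = chunk[chunk['SomeColumn'].str.startswith('SomePrefix')]
    # Append the filtered rows to disk instead of buffering them in memory;
    # write the header only on the first chunk
    matches.to_csv(output_path, mode='w' if first else 'a', header=first, index=False)
    first = False

This keeps peak memory at roughly one chunk regardless of the total file size. If you do need the whole dataset in memory, passing usecols= and dtype= to pd.read_csv to drop unused columns and shrink numeric types can also reduce each chunk's footprint considerably.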