Concatenating Duplicated Large Dataframes: MemoryError

Submitted by 强颜欢笑 on 2019-12-13 05:12:41

Question


Follow-up to: How can I reference the key in the Pandas dataframes within that dictionary?

The goal is still to forecast revenue by fiscal year, breaking revenue into a new column according to how much will be garnered in each year. With some help, I put together code that duplicates a base dataframe once per fiscal year (identical except for the Fiscal Year column), stores the copies in a dictionary, and then concatenates them into a single dataframe.

I've simplified my code to the below:

import pandas as pd

ID = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Revenue = [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800]
df = pd.DataFrame({'ID': ID, 'Revenue': Revenue})

def df_dict_func(start, end, dataframe):
    # Copy the base frame once per fiscal year, stamping the year on each copy
    dataframe_dict = {}
    for n in range(start, end + 1):
        sub = dataframe.copy()
        sub['Fiscal Year'] = n
        dataframe_dict[n] = sub
    return dataframe_dict

df_dict = df_dict_func(2019, 2035, df)
df = pd.concat(df_dict)

The code works well for smaller datasets, but when I expand it to a large dataset I receive a MemoryError. Is there a more memory-efficient way to produce the same result?
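One alternative worth sketching (not from the original post; it assumes pandas and NumPy are available) is to build the expanded frame in a single step with `Index.repeat` and `numpy.tile`, so memory never has to hold both a dictionary of full copies and the concatenated result at the same time:

```python
import numpy as np
import pandas as pd

ID = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Revenue = [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800]
df = pd.DataFrame({'ID': ID, 'Revenue': Revenue})

years = np.arange(2019, 2036)  # the 17 fiscal years from the question

# Repeat each base row once per year, then tile the years across the
# repeated rows -- only one expanded frame is ever materialized, instead
# of N full copies plus the concatenated result.
expanded = df.loc[df.index.repeat(len(years))].reset_index(drop=True)
expanded['Fiscal Year'] = np.tile(years, len(df))
```

The peak footprint is roughly the size of the final frame, rather than about twice that with the dict-then-concat approach.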

The error that I am getting is specifically "MemoryError", and it occurs before pd.concat returns any result. Each of the dataframes within the dictionary is substantial in size (over 500 MB).
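Whatever expansion strategy is used, the per-frame footprint can also be reduced before duplication. A minimal sketch (an assumption, not part of the original post) using `pd.to_numeric` with its `downcast` parameter to shrink 64-bit integer columns to the smallest dtype that fits the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': list(range(1, 11)),
    'Revenue': [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800],
})

before = df.memory_usage(deep=True).sum()

# Downcast each integer column to the smallest signed dtype that holds
# its values; on a 500 MB frame this alone can cut the footprint by
# a large fraction before any copies are made.
for col in ['ID', 'Revenue']:
    df[col] = pd.to_numeric(df[col], downcast='integer')

after = df.memory_usage(deep=True).sum()
```

Since every duplicated copy inherits the smaller dtypes, the saving is multiplied by the number of fiscal years.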

Source: https://stackoverflow.com/questions/52896575/concatenating-duplicated-large-dataframes-memoryerror
