Why does concatenation of DataFrames get exponentially slower?
Question

I have a function which processes a DataFrame, largely to process data into buckets and create a binary matrix of features in a particular column using pd.get_dummies(df[col]). To avoid processing all of my data with this function at once (which runs out of memory and causes iPython to crash), I have broken the large DataFrame into chunks using:

    chunks = (len(df) // 10000) + 1   # floor division so chunks is an int (plain / yields a float in Python 3)
    df_list = np.array_split(df, chunks)

pd.get_dummies(df) will automatically create new columns based on the contents of df[col].
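Given that setup, here is a minimal, self-contained sketch of the chunked workflow the question describes. The toy DataFrame, the column name "category", and the single pd.concat at the end are illustrative assumptions; only the 10000-row chunk size, np.array_split, and the per-chunk pd.get_dummies come from the question itself:

    import numpy as np
    import pandas as pd

    # Hypothetical stand-in for the large DataFrame from the question.
    df = pd.DataFrame({"category": np.random.choice(list("abcde"), size=25_000)})
    col = "category"

    chunks = (len(df) // 10000) + 1        # number of ~10000-row chunks
    df_list = np.array_split(df, chunks)   # list of smaller DataFrames

    # Encode each chunk independently; get_dummies creates one indicator
    # column per distinct value it sees in that chunk's column, so the
    # resulting column sets can differ from chunk to chunk.
    processed = [pd.get_dummies(chunk[col]) for chunk in df_list]

    # Concatenate once at the end: pd.concat aligns the differing column
    # sets and fills missing entries with NaN. Calling concat repeatedly
    # inside a loop instead would recopy the accumulated data each time,
    # which is the slowdown the question's title asks about.
    result = pd.concat(processed)

Note that collecting the encoded chunks in a list and concatenating once keeps each row copied a constant number of times, whereas growing a DataFrame via concat inside the loop recopies everything accumulated so far on every iteration.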