I have a Pandas dataframe with thousands of rows. It has a Names column that holds the customer names, alongside their records. I want to create individual dataframes, one for each unique customer name.
To create a dataframe for all the unique values in a column, create a dict of dataframes, as follows.
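This creates a dict, where each key is a unique value from the column of choice and the value is a dataframe.
Access each dataframe as you would any dict entry (e.g. df_names['Name1']).
Iterating over .groupby yields (k, v) tuples, where k is a unique value in the column and v is the data associated with that k.
With a for-loop and .groupby:
df_names = dict()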
for k, v in df.groupby('customer name'):
    df_names[k] = v
With a Python comprehension and .groupby:
df_names = {k: v for (k, v) in df.groupby('customer name')}
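As a minimal sketch of the dict-of-dataframes approach on a small dataframe: the sample rows and the 'amount' column are made up for illustration; only the 'customer name' column name and the 'Name1' key come from the text above.

```python
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3', 'Name2'],
    'amount': [10, 20, 30, 40, 50],
})

# One dataframe per unique customer name, keyed by that name.
df_names = {k: v for k, v in df.groupby('customer name')}

print(sorted(df_names))   # the dict keys are the unique names
print(df_names['Name1'])  # only the rows for Name1, with their original index labels
```

Each value keeps the index labels of the original dataframe; call .reset_index(drop=True) on a value if a fresh 0-based index is wanted.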
.groupby is faster than .unique.
In the tests below on 1,000,000 rows, with 6 unique values in the column .groupby takes 104 ms compared to 392 ms for .unique; with 26 unique values it takes 147 ms compared to 1.53 s.
A for-loop is slightly faster than a comprehension, particularly for more unique column values or lots of rows (e.g. 10M).
With a Python comprehension and .unique:
df_names = {name: df[df['customer name'] == name] for name in df['customer name'].unique()}
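To check that the .groupby and .unique versions build the same dict, here is a small sketch on made-up data (the sample rows and the 'amount' column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3'],
    'amount': [10, 20, 30, 40],
})

by_groupby = {k: v for k, v in df.groupby('customer name')}
by_unique = {name: df[df['customer name'] == name]
             for name in df['customer name'].unique()}

# Compare the keys, then compare each pair of dataframes (values, index, dtypes).
same_keys = set(by_groupby) == set(by_unique)
same_frames = all(by_groupby[k].equals(by_unique[k]) for k in by_groupby)
print(same_keys, same_frames)
```

One difference to keep in mind: if the column contains NaN, .groupby drops those rows by default (dropna=True), while the .unique version produces a NaN key that maps to an empty dataframe, because NaN never compares equal to itself.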
import pandas as pd
import string
import random
random.seed(365)
# Test 1: 6 unique values in 'class' (the column grouped on in these tests)
data = {'class': [random.choice(['1-5', '6-25', '26-100', '100-500', '500-1000', '>1000']) for _ in range(1000000)],
        'treatment': [random.choice(['Yes', 'No']) for _ in range(1000000)]}
# Test 2: 26 unique values in 'class' (replaces the data above; build only one of the two per run)
data = {'class': [random.choice(list(string.ascii_lowercase)) for _ in range(1000000)],
        'treatment': [random.choice(['Yes', 'No']) for _ in range(1000000)]}
df = pd.DataFrame(data)
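The comparison can be reproduced outside a notebook with the standard-library timeit module. This is a rough sketch, not the original benchmark: the row count is reduced to 100,000 (an assumption, so it runs quickly), the function names are made up, and the absolute numbers will differ by machine and pandas version.

```python
import random
import string
import timeit

import pandas as pd

random.seed(365)
n = 100_000  # assumption: smaller than the 1,000,000 rows used above

# 26 unique values in 'class', as in the second test above.
df = pd.DataFrame({
    'class': [random.choice(string.ascii_lowercase) for _ in range(n)],
    'treatment': [random.choice(['Yes', 'No']) for _ in range(n)],
})

def with_groupby():
    return {k: v for k, v in df.groupby('class')}

def with_unique():
    return {name: df[df['class'] == name] for name in df['class'].unique()}

# Best of 3 repeats with 5 calls each, reported as seconds per call.
t_groupby = min(timeit.repeat(with_groupby, number=5, repeat=3)) / 5
t_unique = min(timeit.repeat(with_unique, number=5, repeat=3)) / 5
print(f'groupby: {t_groupby * 1e3:.1f} ms, unique: {t_unique * 1e3:.1f} ms')
```

Taking the minimum over repeats reduces noise from other processes; to approximate the figures quoted above, raise n to 1,000,000.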