Question
Update down below!
I am trying to merge and sort a list of IDs and their connected unique Name_ID, separated by semicolons. For example:
Input:
Name_ID  Adress_ID
Name1    5875383
Name1    5901847
Name2    5285200
Name3    2342345
Name3    6463736
Desired output:
Name_ID  Adress_ID
Name1    5875383; 5901847
Name2    5285200
Name3    2342345; 6463736
This is my current code:
from pathlib import Path
import pandas as pd

origin_file_path = Path("Folder/table.xlsx")
dest_file_path = Path("Folder/table_sorted.xlsx")
table = pd.read_excel(origin_file_path)
df1 = pd.DataFrame(table)
# Collect all Adress_ID values of each Name_ID into a Python list
df1 = df1.groupby('Name_ID').agg(lambda x: x.tolist())
df1.to_excel(dest_file_path, sheet_name="Adress_IDs")
But it exports it like this to the excel file:
Name_ID Adress_ID
Name1 [5875383, 5901847]
Can someone tell me what the best way would be to get rid of the list format and separate by semicolons instead of commas?
Update:
The user Jezrael linked me this thread, but I can't seem to combine ';'.join with lambda x: x.tolist().
df1 = df1.groupby('Kartenname').agg(';'.join, lambda x: x.tolist())
Produces TypeError: join() takes exactly one argument (2 given)
df1 = df1.groupby('Kartenname').agg(lambda x: x.tolist(), ';'.join)
Produces TypeError: <lambda>() takes 1 positional argument but 2 were given.
I also tried other combinations, but none of them even execute properly. Getting rid of the lambda function isn't an option, because then it just pastes Name_ID Adress_ID a thousand times instead of the correct name and the correct IDs.
Answer 1:
You can pass agg a list of tuples, each pairing a new column name with an aggregate function:
df['Adress_ID'] = df['Adress_ID'].astype(str)
df1 = df.groupby('Name_ID')['Adress_ID'].agg([('a', ';'.join),
('b', lambda x: x.tolist())])
print (df1)
                       a                   b
Name_ID
Name1    5875383;5901847  [5875383, 5901847]
Name2            5285200           [5285200]
Name3    2342345;6463736  [2342345, 6463736]
If you pass only the aggregate functions in a list (no tuples), you get the default column names:
df2 = df.groupby('Name_ID')['Adress_ID'].agg([ ';'.join,lambda x: x.tolist()])
print (df2)
                    join          <lambda_0>
Name_ID
Name1    5875383;5901847  [5875383, 5901847]
Name2            5285200           [5285200]
Name3    2342345;6463736  [2342345, 6463736]
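If only the semicolon-separated column is needed, the list aggregation can be dropped. The following is a minimal sketch, not the answerer's code: it rebuilds the sample data from the question in memory (an assumption; the real data would come from pd.read_excel) and uses "; " as the separator to match the desired output.

```python
import pandas as pd

# Sample data copied from the question; in practice this comes from pd.read_excel.
df = pd.DataFrame({
    "Name_ID": ["Name1", "Name1", "Name2", "Name3", "Name3"],
    "Adress_ID": [5875383, 5901847, 5285200, 2342345, 6463736],
})

# str.join only accepts strings, so cast the integer IDs first.
df["Adress_ID"] = df["Adress_ID"].astype(str)

# One joined string per Name_ID; reset_index turns Name_ID back into a column.
joined = df.groupby("Name_ID")["Adress_ID"].agg("; ".join).reset_index()
print(joined)
```

The resulting frame holds plain strings, so it can be handed to to_excel without the list brackets showing up in the sheet.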
Answer 2:
First, make sure the Adress_ID column is of string type. Then you can apply this function:
df.groupby('Name_ID').agg(lambda x: ';'.join(list(x.values)))
More about the 'str'.join method
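To see why the string conversion matters, here is a small sketch (the variable names are illustrative, not from the thread) that joins a Series of integer IDs before and after casting:

```python
import pandas as pd

ids = pd.Series([5875383, 5901847])

# Joining the raw integers fails, because str.join expects string items.
raised = False
try:
    ";".join(ids)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# After casting to str, the same join works.
joined = ";".join(ids.astype(str))
print(joined)
```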
Answer 3:
- The main issue: you can't join an int, so each ID has to be converted to str before joining.
Name_ID Adress_ID
Name1 5875383
Name1 5901847
Name2 5285200
Name3 2342345
Name3 6463736
def fix_my_stuff(x):
    x = x.tolist()
    x = '; '.join([str(y) for y in x])
    return x
df_updated = df.groupby('Name_ID').agg(lambda x: fix_my_stuff(x)).reset_index()
print(df_updated)
Name_ID Adress_ID
Name1 5875383; 5901847
Name2 5285200
Name3 2342345; 6463736
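The question also mentions sorting. If the IDs within each group should appear in ascending order, the helper can sort before joining. This is a sketch under assumptions: join_sorted is a hypothetical helper name, and the rows are deliberately shuffled here to show the effect.

```python
import pandas as pd

# Same data as in the question, but in shuffled row order.
df = pd.DataFrame({
    "Name_ID": ["Name3", "Name1", "Name3", "Name1", "Name2"],
    "Adress_ID": [6463736, 5901847, 2342345, 5875383, 5285200],
})

def join_sorted(ids):
    # Hypothetical helper: sort numerically first, then convert to str and join.
    return "; ".join(str(i) for i in sorted(ids))

df_sorted = df.groupby("Name_ID")["Adress_ID"].agg(join_sorted).reset_index()
print(df_sorted)
```

Sorting while the values are still integers avoids the lexicographic ordering that would apply once they are strings.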
Source: https://stackoverflow.com/questions/58179593/how-to-combine-join-and-lambda-x-x-tolist-inside-an-groupby-agg-functio