Question
I am trying to create a pivot table that joins multiple (more than 8) data frames into one.
The tables have multiple columns, but I'll keep it simple here:
Table1
week project
42 ABC
42 FGA
42 ZTR
44 HTZ
44 UZR
44 LOP
46 POL
46 ZTT
46 ART
46 ART
...
Some weeks may have no occurrences of any project. Tables 2, 3, 4 and so on will certainly have a different number of occurrences per week.
The only column common to all tables is the week column. Some tables have more columns, some fewer, and the column headers may vary. I assume the week column alone is sufficient here.
My goal is to count the number of occurrences across all tables per week. Ultimately, what I'd like to achieve is:
index table1 table2 table3 table4 table5
42 3 3 4 11 23
43 0 4 10 15 7
44 3 12 8 9 1
45 0 7 0 0 8
46 4 6 7 0 22
47 8 3 12 6 0
Such counting would be quite easy in Excel, simply by using a PivotTable that counts rows. How would I proceed in this scenario in Python?
Answer 1:
You can use concat with the keys argument, followed by a groupby and an unstack.
The thing to note here is that you're inferring the key manually; it would be better if each table had an id showing which source it came from.
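import pandas as pd

tables = [df1, df2]
# To build the keys dynamically instead of hard-coding them, use a dict
# and pass it to pd.concat in place of tables and keys:
# table_dict = {f"table{i}": df for i, df in enumerate(tables, start=1)}
df_new = (
    pd.concat(tables, axis=0, keys=["table1", "table2"])  # level 0 = source table
    .set_index("week", append=True)                       # level 2 = week
    .groupby(level=[0, 2])
    .count()                                              # rows per table and week
    .unstack(0)                                           # one column per table
)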
     project
      table1 table2
week
42         3      3
44         3      3
46         4      4
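
The result above only lists weeks that occur in at least one table, and a week missing from just one table comes out as NaN rather than 0. The following is a minimal sketch of one way to get the zero-filled layout from the question, not part of the original answer. It assumes the week values are integers; df1, df2 and the "task" column are hypothetical sample data. It counts rows per week with value_counts, aligns the counts side by side, and reindexes over the full range of weeks.

```python
import pandas as pd

# Hypothetical sample data; only the "week" column is shared between tables.
df1 = pd.DataFrame({
    "week": [42, 42, 42, 44, 44, 44, 46, 46, 46, 46],
    "project": ["ABC", "FGA", "ZTR", "HTZ", "UZR", "LOP", "POL", "ZTT", "ART", "ART"],
})
df2 = pd.DataFrame({"week": [42, 43, 43, 45], "task": ["a", "b", "c", "d"]})

tables = {"table1": df1, "table2": df2}

# Count rows per week in each table, then align the counts column by column.
counts = pd.concat(
    {name: df["week"].value_counts() for name, df in tables.items()},
    axis=1,
)

# Reindex over every week in the observed range so that empty weeks appear,
# then turn the resulting gaps into zeros.
all_weeks = range(counts.index.min(), counts.index.max() + 1)
counts = counts.reindex(all_weeks).fillna(0).astype(int).rename_axis("week")

print(counts)
```

Because only the week column is read, the other columns of each table can differ freely. If the week numbers wrap around a year boundary, the range used for reindexing would need to be built differently.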
Source: https://stackoverflow.com/questions/65322673/pandas-pivoting-multiple-tables-into-single-and-counting-occurences