There are a good number of questions about this error, but after looking around I'm still not able to find or wrap my mind around a solution yet. I'm trying to pivot a data fr
There are several ways.
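# 1) groupby + unstack; lambda x: x only works when each
#    (id, contact_id, Network_Name, question) combination occurs once
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()
df1.columns = df1.columns.tolist()  # plain list of labels, drops the 'question' columns name
print(df1)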
# 2) set_index + unstack; raises ValueError if a combination is duplicated
df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()
df1.columns = df1.columns.tolist()
print(df1)
# 3) groupby + 'first'; keeps the first non-null answer per combination
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()
df1.columns = df1.columns.tolist()
print(df1)
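The set_index and groupby variants behave differently when an (id, contact_id, Network_Name, question) combination repeats: unstack needs unique index entries, while groupby(...).aggregate('first') keeps the first non-null answer. A minimal sketch on a small frame whose data is made up for illustration:

```python
import pandas as pd

# Hypothetical data: contact 137519 answered "City" twice
df = pd.DataFrame({
    "id": [16, 16, 16],
    "contact_id": [137519, 137519, 137519],
    "Network_Name": [2206, 2206, 2206],
    "question": ["City", "City", "State"],
    "response_answer": ["Lancaster", "Reading", "Ca"],
})
keys = ["id", "contact_id", "Network_Name", "question"]

# set_index + unstack: the duplicated (..., "City") key cannot be reshaped
try:
    df.set_index(keys)["response_answer"].unstack()
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print("unstack failed:", err)

# groupby + first: duplicates collapse to the first non-null answer
df1 = df.groupby(keys)["response_answer"].aggregate("first").unstack().reset_index()
df1.columns = df1.columns.tolist()
print(df1)
```

So the groupby('first') and pivot_table(aggfunc='first') versions are the safer choices when the raw data may contain repeated answers.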
# 4) pivot_table with aggfunc='first'
df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')
df1.columns = df1.columns.droplevel()  # values=[...] gives (response_answer, question) columns; drop the top level
df1 = df1.reset_index()
df1.columns = df1.columns.tolist()
print(df1)
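The droplevel call is needed only because values is passed as a list, which gives the result a two-level column index (response_answer, question). Passing values as a plain string should give single-level columns directly. A sketch comparing the two, using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample in the shape of the question's data
df = pd.DataFrame({
    "id": [16, 16, 20],
    "contact_id": [137519, 137519, 137523],
    "Network_Name": [2206, 2206, 2208],
    "question": ["City", "State", "City"],
    "response_answer": ["Lancaster", "Ca", "Reading"],
})
index_cols = ["id", "contact_id", "Network_Name"]

# values as a list -> MultiIndex columns, so the top level has to be dropped
as_list = df.pivot_table(index=index_cols, columns="question",
                         values=["response_answer"], aggfunc="first")
as_list.columns = as_list.columns.droplevel()

# values as a string -> columns are already just the question labels
as_str = df.pivot_table(index=index_cols, columns="question",
                        values="response_answer", aggfunc="first")

print(as_list.equals(as_str))
```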
All four give the same result:
    id  contact_id  Network_Name       City  State  Trip_End_Location
0   16      137519          2206       None     Ca               None
1   17      137520          2206       None     Ca               None
2   18      137521          2206       None     Ca               None
3   19      137522          2206       None     Ca               None
4   20      137523          2208  Lancaster   None               None
5   21      137524          2208  Lancaster   None               None
6   22      137525          2208  Lancaster   None               None
7   23      137526          2208  Lancaster   None               None
8   24      137527          2208       None   None               Home
9   25      137528          2208       None   None               Home
10  26      137529          2208       None   None               Home
11  27      137530          2208       None   None               Home
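One way to check that the variants agree is to run the set_index, groupby('first') and pivot_table versions side by side on a small frame with unique keys. The data below is hypothetical, flatten is an illustrative helper (not part of pandas), and the lambda x: x variant is left out for brevity:

```python
import pandas as pd

# Hypothetical long-format answers: one row per (id, contact_id, Network_Name, question)
df = pd.DataFrame({
    "id": [16, 16, 20, 20],
    "contact_id": [137519, 137519, 137523, 137523],
    "Network_Name": [2206, 2206, 2208, 2208],
    "question": ["City", "State", "City", "State"],
    "response_answer": ["Lancaster", "Ca", "Reading", "Pa"],
})
keys = ["id", "contact_id", "Network_Name", "question"]

def flatten(frame):
    """Hypothetical helper: reset the row index and replace the columns with a plain list."""
    frame = frame.reset_index()
    frame.columns = frame.columns.tolist()
    return frame

via_set_index = flatten(df.set_index(keys)["response_answer"].unstack())
via_groupby = flatten(df.groupby(keys)["response_answer"].aggregate("first").unstack())
via_pivot = flatten(df.pivot_table(index=keys[:3], columns="question",
                                   values="response_answer", aggfunc="first"))

print(via_set_index)
print(via_set_index.equals(via_groupby) and via_set_index.equals(via_pivot))
```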
The default aggfunc in pivot_table is mean, which doesn't know what to do with strings, and the index isn't specified properly. Try something like:
pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                     columns='question',
                                     values='response_answer',
                                     aggfunc=lambda x: ' '.join(x))
This explicitly sets one row per (id, contact_id) pair and pivots the response_answer values on question. The aggfunc ensures that if the raw data holds multiple answers to the same question, they are concatenated, separated by spaces. The pivot_table syntax may vary with your pandas version (very old releases used rows= and cols= instead of index= and columns=).
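A small deterministic version of that call, assuming a frame named unified_df with the columns used above; the data is made up so that contact 137519 answers the same question twice:

```python
import pandas as pd

# Hypothetical data; the first two rows answer "City" for the same contact
unified_df = pd.DataFrame({
    "id": [16, 16, 16, 17],
    "contact_id": [137519, 137519, 137519, 137520],
    "question": ["City", "City", "State", "City"],
    "response_answer": ["Lancaster", "Reading", "Ca", "Lancaster"],
})

# One row per (id, contact_id); repeated answers are joined with spaces
pivoted = unified_df.pivot_table(index=["id", "contact_id"],
                                 columns="question",
                                 values="response_answer",
                                 aggfunc=lambda x: " ".join(x))
print(pivoted)
```

If response_answer may contain missing values, a variant such as lambda x: " ".join(x.dropna()) avoids handing a float NaN to str.join, which would raise a TypeError.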
Here's a quick example:
In [24]: import pandas as pd
In [25]: import random
In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})
In [27]: df.head()
Out[27]:
     id  question  response
0  3100         1       116
1  4500         2       113
2  5000         1       120
3  3900         2       103
4  4300         0       117
In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id        100 non-null int64
question  100 non-null object
response  100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB
In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
Out[29]:
question 0 1 2 3
id
1000 110 120 NaN 100 NaN
1100 NaN 106 108 104 NaN
1200 104 113 119 NaN 101
1300 102 NaN 116 108 120
1400 NaN NaN 116 NaN