问题
My original question was posted here. I have a dataframe as follows:
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
I wanted to transform it into this DataFrame:
ID START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
1 11 12 1 13 14 2 14 15 3
2 10 14 1 NA NA NA NA NA NA
3 11 15 1 16 17 2 NA NA NA
After pivot_table
transformations I received a DataFrame that has an additional blank row after the header:
test_2['SEQ1'] = test_2.SEQ
test_2 = test_2.pivot_table(index= ['ID','SEQ1']).unstack()
test_2 = test_2.sort_index(axis=1, level=1)
test_2.columns = ['_'.join((col[0], str(col[1]))) for col in test_2]
test_2
test_2
START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
ID
1 11 12 1 13 14 2 14 15 3
2 10 14 1 NA NA NA NA NA NA
3 11 15 1 16 17 2 NA NA NA
How can I delete these row and align all the headers? I tried to delte the row in a conventional way using test2[:2]
, but it didn't delete the blank row.
EDIT:
This is the more realistic dataset's extract:
ID INDEX START END SEQ NUM_PREV NUM_ACTUAL NUM_NEXT TIME PRE_TIME LOC_IND
079C 333334.0 2016-06-23 12:45:32 2016-06-23 12:51:05 1 1 23456 25456 29456 30 2 YES
079C 333334.0 2016-06-23 12:47:05 2016-06-23 12:51:05 2 2 29456 39458 39945 20 0 NO
回答1:
Consider resetting index after the pivot/unstack operation:
from io import StringIO
import pandas as pd
data='''
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
'''
test_2 = pd.read_table(StringIO(data), sep="\\s+")
seq = set(test_2['SEQ'].tolist())
test_2['SEQ1'] = test_2.SEQ
test_2 = test_2.pivot_table(index= ['ID','SEQ1']).unstack()
test_2 = test_2.sort_index(axis=1, level=1)
test_2.columns = ['_'.join((col[0], str(col[1]))) for col in test_2]
test_2 = test_2.reset_index()
# ID END_1 SEQ_1 START_1 END_2 SEQ_2 START_2 END_3 SEQ_3 START_3
# 0 1 12.0 1.0 11.0 14.0 2.0 13.0 15.0 3.0 14.0
# 1 2 14.0 1.0 10.0 NaN NaN NaN NaN NaN NaN
# 2 3 15.0 1.0 11.0 17.0 2.0 16.0 NaN NaN NaN
However, as you can see it changes column ordering, so consider a nested list comprehension with sum()
to flatten it, all for a suitable order:
seqmax = max(seq)+1
colorder = ['ID'] + sum([['START_'+str(i),'END_'+str(i),'SEQ_'+str(i)]
for i in range(1, seqmax) if i in seq],[])
test_2 = test_2[colorder]
# ID START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
# 0 1 11.0 12.0 1.0 13.0 14.0 2.0 14.0 15.0 3.0
# 1 2 10.0 14.0 1.0 NaN NaN NaN NaN NaN NaN
# 2 3 11.0 15.0 1.0 16.0 17.0 2.0 NaN NaN NaN
来源:https://stackoverflow.com/questions/40723561/transformation-of-pandas-dataframe-adds-a-blank-row