Sorting Excel column with Python

Question

Let's say I have a list like this:

time    type    value
80      1A      10
100     1A      20
60      18      56
80      18      7
80      2A      10
100     2A      10
80      28      10
100     28      20

and I need to change it to be like this:

            time        
type    60  80  100
1A          10  20
1B      56  7   
2A          10  10
2B          10  20

So far, all I have done is a basic sort on one column:

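from xlrd import open_workbook
import xlwt

target_column = 0  # index of the column to sort by ("time")
book = open_workbook('result.xls')
sheet = book.sheets()[0]
# Read every row of the first sheet as a list of cell values
data = [sheet.row_values(i) for i in range(sheet.nrows)]
labels = data[0]  # header row
data = data[1:]
data.sort(key=lambda x: x[target_column])

bk = xlwt.Workbook()
out_sheet = bk.add_sheet(sheet.name)
# Write the header first, then the sorted rows below it
for idx, label in enumerate(labels):
    out_sheet.write(0, idx, label)

for idx_r, row in enumerate(data):
    for idx_c, value in enumerate(row):
        out_sheet.write(idx_r + 1, idx_c, value)

bk.save('resul.xls')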

How can I do this with Python?

Answer 1:

You can use pandas.DataFrame.pivot() to do that:

Code:

df.pivot(index='type', columns='time', values='value')

Test Code:

import pandas as pd
from io import StringIO

df = pd.read_fwf(StringIO(u"""
    time    type    value
    80      1A      10
    100     1A      20
    60      18      56
    80      18      7
    80      2A      10
    100     2A      10
    80      28      10
    100     28      20"""), header=1)
print(df)

print(df.pivot(index='type', columns='time', values='value'))

Results:

   time type  value
0    80   1A     10
1   100   1A     20
2    60   18     56
3    80   18      7
4    80   2A     10
5   100   2A     10
6    80   28     10
7   100   28     20

time   60    80    100
type                  
18    56.0   7.0   NaN
1A     NaN  10.0  20.0
28     NaN  10.0  20.0
2A     NaN  10.0  10.0
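Because some type/time combinations are missing, pivot() fills them with NaN, which is why the values above come out as floats (56.0, 7.0). A minimal sketch, assuming pandas 1.0 or later, that converts the pivoted frame to pandas' nullable "Int64" dtype so the values stay whole numbers. The DataFrame is built inline instead of being read from a file, and it uses 1B and 2B as in the question's desired output.

```python
import pandas as pd

# Same rows as the question, built inline (types 1B/2B as in the desired output)
df = pd.DataFrame({
    "time": [80, 100, 60, 80, 80, 100, 80, 100],
    "type": ["1A", "1A", "1B", "1B", "2A", "2A", "2B", "2B"],
    "value": [10, 20, 56, 7, 10, 10, 10, 20],
})

wide = df.pivot(index="type", columns="time", values="value")
# NaN forces float64; the nullable "Int64" dtype keeps integers and shows <NA>
wide = wide.astype("Int64")
print(wide)
```

When this frame is written with to_excel(), missing cells come out empty, since na_rep defaults to an empty string.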

Answer 2:

This is just for educational purposes. The right answer is the pandas approach by @Stephen Rauch.

from xlrd import open_workbook
from openpyxl import Workbook


book = open_workbook('pivot.xls')
sheet = book.sheet_by_index(0)

# Build {type: {time: value}} from the data rows (row 0 is the header)
pivot = {}
for row_index in range(1, sheet.nrows):
    time = sheet.cell(row_index, 0).value
    type_ = sheet.cell(row_index, 1).value  # avoid shadowing the built-in type
    value = sheet.cell(row_index, 2).value
    # setdefault creates the inner dict the first time a type is seen
    pivot.setdefault(type_, {})[time] = value

wb = Workbook()
ws1 = wb.active
ws1.merge_cells('B1:D1')  # "time" spans the three time columns
ws1.append(("", "time"))
ws1.append(("type", "60", "80", "100"))
# xlrd returns numbers as floats (60.0), but 60.0 == 60, so .get(60) still matches
for type_, values in pivot.items():
    ws1.append((type_, values.get(60), values.get(80), values.get(100)))
wb.save('out.xlsx')
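The loop above hard-codes the time columns 60, 80 and 100. A minimal stdlib-only sketch of the same dictionary pivot that collects the time columns from the data instead; the function name pivot_rows and the inline records list are hypothetical, standing in for the rows read with xlrd.

```python
def pivot_rows(records):
    """Turn (time, type, value) records into rows for a wide table.

    Returns a list of tuples: a header row ("type", t1, t2, ...) followed by
    one row per type, sorted; missing (type, time) combinations become None.
    """
    pivot = {}
    for time, type_, value in records:
        # setdefault creates the inner dict the first time a type is seen
        pivot.setdefault(type_, {})[time] = value

    # Collect every time that occurs instead of hard-coding 60/80/100
    times = sorted({time for inner in pivot.values() for time in inner})

    rows = [("type", *times)]
    for type_ in sorted(pivot):
        rows.append((type_, *(pivot[type_].get(t) for t in times)))
    return rows


# Hypothetical input: the question's rows as (time, type, value) tuples
records = [
    (80, "1A", 10), (100, "1A", 20), (60, "1B", 56), (80, "1B", 7),
    (80, "2A", 10), (100, "2A", 10), (80, "2B", 10), (100, "2B", 20),
]
for row in pivot_rows(records):
    print(row)
```

Each returned tuple can be passed to ws1.append(); as in the answer above, None values end up as empty cells.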

Answer 3:

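import pandas as pd

# Read the first sheet (reading .xls files needs xlrd)
df = pd.read_excel('pivot.xls')
# One row per type, one column per time
df_pivot = df.pivot(index='type', columns='time', values='value')
df_pivot.to_excel('output.xlsx')  # writing .xlsx needs openpyxl or XlsxWriter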
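pivot() requires every (type, time) pair to be unique and raises ValueError otherwise. If a real sheet can contain repeated pairs, pivot_table() aggregates them instead. A minimal sketch with hypothetical inline data in which the (1A, 80) pair appears twice; aggfunc="sum" is an assumption, and "mean" (the default) or "first" may suit other data better.

```python
import pandas as pd

# Hypothetical data: the (1A, 80) pair appears twice
df = pd.DataFrame({
    "time": [80, 80, 100, 60],
    "type": ["1A", "1A", "1A", "1B"],
    "value": [10, 4, 20, 56],
})

pivot_failed = False
try:
    df.pivot(index="type", columns="time", values="value")
except ValueError as exc:
    # pivot() cannot reshape when an (index, columns) pair is duplicated
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table() combines duplicates with aggfunc; missing pairs become NaN
table = df.pivot_table(index="type", columns="time", values="value", aggfunc="sum")
print(table)
```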

Source: https://stackoverflow.com/questions/48982201/sorting-excel-column-with-python

Tags

python

pandas

sorting

pivot