Question
I want to reshape the data from an Excel sheet using Python. This is how my data looks:
AuditDate Fields ModifiedBy
1/1/2019 7:58 Status: Assigned (0)
Site Group: XXX
Region: xxx
Site: xxxxx
Summary: xxxx
Location Company: xxx
Support Organization: XXXX
Support Group Name: xxxxx
Last Name: xxxx
First Name: xxxx
Categorization Tier 1:
Categorization Tier 2:
Categorization Tier 3:
Company: xxxx
Priority: xxx
Work Order Type: xxx
Company3: xxxxx
Request Manager:
Product Cat Tier 1(2):
Product Cat Tier 2 (2):
Product Cat Tier 3 (2):
ASORG: IT Shoreside
ASCPY: xxxx
ASGRP: xxx
Request Assignee:
Status History: XXXX XXXX
1/1/2019 8:31 Request Assignee: XXXX XXXX
1/1/2019 15:02 Status: Pending (1) XXXX
1/3/2019 13:00 Status: Completed (5) XXXX
1/9/2019 2:46 Status: Closed (8) XXXX
As you can see above, the first row contains a multiline cell in which the text before each colon (:) should become a column name.
From FieldsChanged (the Fields column above) I am only concerned with Status, Priority, Request Assignee and ASGRP, which I want to convert into columns. The output should look like this:
AuditDate Status Priority RequestAssignee ASGRP ModifiedBy
1/1/2019 7:58 Assigned XX XXX XXX XXXX
1/1/2019 8:31 XXXX XXXX
1/1/2019 15:02 Pending XXXX
1/3/2019 13:00 Completed XXXX
1/9/2019 2:46 Closed XXXX
The same kind of data can be present in other rows as well. This is how the Excel sheet should look after reshaping.
I would greatly appreciate it if someone could help.
Answer 1:
I will assume that the sheet has been converted to a CSV file. You can then use the csv module twice: first to parse the rows, and then to parse the multiline Fields field of each row. The same module can also write the result CSV file directly.
Assuming that the input csv file is (note the quotes around the multiline field):
AuditDate,Fields,ModifiedBy
1/1/2019 7:58,"Status: Assigned (0)
Site Group: XXX
Region: xxx
Site: xxxxx
Summary: xxxx
Location Company: xxx
Support Organization: XXXX
Support Group Name: xxxxx
Last Name: xxxx
First Name: xxxx
Categorization Tier 1:
Categorization Tier 2:
Categorization Tier 3:
Company: xxxx
Priority: xxx
Work Order Type: xxx
Company3: xxxxx
Request Manager:
Product Cat Tier 1(2):
Product Cat Tier 2 (2):
Product Cat Tier 3 (2):
ASORG: IT Shoreside
ASCPY: xxxx
ASGRP: xxx
Request Assignee:
Status History: XXXX",XXXX
1/1/2019 8:31,Request Assignee: XXXX,XXXX
1/1/2019 15:02,Status: Pending (1),XXXX
1/3/2019 13:00,Status: Completed (5),XXXX
1/9/2019 2:46,Status: Closed (8),XXXX
You can process it this way:
import csv
import io

with open('input.csv', newline='') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.DictReader(fd)  # directly use a DictReader for reading
    # declare a DictWriter for the required fields, ignoring any additional field (extrasaction)
    wr = csv.DictWriter(fdout, ['AuditDate', 'Status', 'Priority', 'Request Assignee',
                                'ASGRP', 'ModifiedBy'], extrasaction='ignore')
    wr.writeheader()  # write the headers
    for row in rd:
        with io.StringIO(row['Fields']) as ffd:  # process Fields line by line
            # each line "Key: value" is split on the colon into a [key, value] pair
            frd = csv.reader(ffd, delimiter=':', skipinitialspace=True)
            row.update(dict(frd))  # update the row dictionary with the "sub-fields"
        wr.writerow(row)  # unwanted keys are dropped by extrasaction='ignore'
You should get the expected result:
AuditDate,Status,Priority,Request Assignee,ASGRP,ModifiedBy
1/1/2019 7:58,Assigned (0),xxx,,xxx,XXXX
1/1/2019 8:31,,,XXXX,,XXXX
1/1/2019 15:02,Pending (1),,,,XXXX
1/3/2019 13:00,Completed (5),,,,XXXX
1/9/2019 2:46,Closed (8),,,,XXXX
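Note that the output above keeps the numeric code in the status ("Assigned (0)"), while the question asks for "Assigned". Also, dict(frd) expects exactly one colon per line, so a value that itself contains a colon, or a blank line inside the cell, would raise an error. A minimal sketch of a more tolerant parser follows; the names parse_fields, strip_code and WANTED are hypothetical helpers, not part of the original answer, and the assumption is that the code always appears as a trailing number in parentheses.

```python
import re

# Columns the question cares about (assumption: exact spelling as in the sheet)
WANTED = ("Status", "Priority", "Request Assignee", "ASGRP")


def parse_fields(text, wanted=WANTED):
    """Turn a multiline 'Key: value' block into a dict of the wanted keys."""
    result = {}
    for line in text.splitlines():
        # split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a colon
        key = key.strip()
        if key in wanted:
            result[key] = value.strip()
    return result


def strip_code(status):
    """Drop a trailing numeric code such as ' (0)' from a status value."""
    return re.sub(r"\s*\(\d+\)$", "", status)
```

In the loop of the answer, row.update(parse_fields(row['Fields'])) could replace the io.StringIO block, followed by applying strip_code to row['Status'] when that key is present.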
Answer 2:
I would suggest using the pandas library. It works with an intuitive table-style format (similar to Excel).
import pandas as pd
df = pd.read_excel('tmp.xlsx', index_col=0)  # the first column (AuditDate) becomes the index
You can then filter and reshape the resulting DataFrame (table) as required, or drop rows with NA values (e.g. based on the audit date column).
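As a rough sketch of what that reshaping could look like, the example below splits the multiline Fields column into separate columns. It builds a small DataFrame inline instead of reading the Excel file, and it assumes the columns are named AuditDate, Fields and ModifiedBy as in the question; fields_to_dict and WANTED are hypothetical names introduced for illustration.

```python
import pandas as pd

# Columns to extract from the Fields text (assumption: names as in the question)
WANTED = ["Status", "Priority", "Request Assignee", "ASGRP"]


def fields_to_dict(text):
    """Parse a multiline 'Key: value' cell into a dict, splitting on the first colon."""
    pairs = (line.partition(":") for line in str(text).splitlines())
    return {key.strip(): value.strip() for key, sep, value in pairs if sep}


# Stand-in for the DataFrame returned by pd.read_excel (without index_col)
df = pd.DataFrame({
    "AuditDate": ["1/1/2019 7:58", "1/1/2019 8:31"],
    "Fields": [
        "Status: Assigned (0)\nPriority: xxx\nASGRP: xxx\nRequest Assignee:",
        "Request Assignee: XXXX",
    ],
    "ModifiedBy": ["XXXX", "XXXX"],
})

# One dict per row, expanded into columns; keys missing in a row become NaN
parsed = pd.DataFrame(df["Fields"].map(fields_to_dict).tolist(), index=df.index)
parsed = parsed.reindex(columns=WANTED)  # keep only the wanted columns, in order

result = pd.concat([df[["AuditDate"]], parsed, df[["ModifiedBy"]]], axis=1)
```

The result could then be written back with result.to_excel(...), with index=False to leave out the row index.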
Source: https://stackoverflow.com/questions/54789589/convert-multiline-excel-data-into-column-and-rows-using-python