Python - Convert bytes / unicode tab delimited data to csv file

Submitted by 一个人想着一个人 on 2020-01-24 18:59:06

Question


I'm pulling the following data from an API. It starts with a b prefix, which according to the Python 3.3 documentation indicates "a bytes literal", with the escape sequences \t and \n representing ASCII Horizontal Tab (TAB) and ASCII Linefeed (LF) respectively.

b'settlement-id\tsettlement-start-date\tsettlement-end-date\tdeposit-date\ttotal-amount\tcurrency\ttransaction-type\torder-id\tmerchant-order-id\tadjustment-id\tshipment-id\tmarketplace-name\tamount-type\tamount-description\tamount\tfulfillment-id\tposted-date\tposted-date-time\torder-item-code\tmerchant-order-item-id\tmerchant-adjustment-item-id\tsku\tquantity-purchased\n7293436482\t03.05.2018 09:10:07 UTC\t04.05.2018 20:30:23 UTC\t06.05.2018 20:30:23 UTC\t53,44\tEUR\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemPrice\tPrincipal\t179,99\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemFees\tCommission\t-32,40\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemPrice\tPrincipal\t-109,99\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tCommission\t19,80\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tRefundCommission\t-3,96\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n'

When I convert this data to a string using .decode("utf-8"), I get the corresponding tab-delimited data:

settlement-id   settlement-start-date   settlement-end-date deposit-date    total-amount    currency    transaction-type    order-id    merchant-order-id   adjustment-id   shipment-id marketplace-name    amount-type amount-description  amount  fulfillment-id  posted-date posted-date-time    order-item-code merchant-order-item-id  merchant-adjustment-item-id sku quantity-purchased
7293436482  03.05.2018 09:10:07 UTC 04.05.2018 20:30:23 UTC 06.05.2018 20:30:23 UTC 53,44   EUR                                                                 
7293436482                      Order   303-3746292-6119509         DRGC8lFbB   Amazon.de   ItemPrice   Principal   179,99  MFN 03.05.2018  03.05.2018 17:12:22 UTC 30407746733299          3700546702556-180412-chp-18c10347-1 1
7293436482                      Order   303-3746292-6119509         DRGC8lFbB   Amazon.de   ItemFees    Commission  -32,40  MFN 03.05.2018  03.05.2018 17:12:22 UTC 30407746733299          3700546702556-180412-chp-18c10347-1 1
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemPrice   Principal   -109,99 AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemFees    Commission  19,80   AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemFees    RefundCommission    -3,96   AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
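To make the decoding step concrete, here is a minimal sketch. The `sample` value is an abbreviated, hypothetical stand-in for the API response (three columns instead of the full report), and the variable names are illustrative only.

```python
# Abbreviated stand-in for the API response (hypothetical, three columns only)
sample = b'settlement-id\ttotal-amount\tcurrency\n7293436482\t53,44\tEUR\n7293436482\t\t\n'

# bytes -> str: the \t and \n escapes are already real tab and newline bytes,
# so decode() only maps the bytes to text
text = sample.decode("utf-8")

# Split into lines, then into fields on the tab character only,
# so empty fields and values containing spaces are preserved
rows = [line.split("\t") for line in text.splitlines()]

for row in rows:
    print(len(row), row)
```

Every row should come out with the same number of fields as the header, including rows that consist mostly of empty fields.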

However, I cannot seem to save this data to a tab-delimited CSV file. I have tried several methods, all of which have failed, including the following:

with open("folder_GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE_V2_/" + grl_id + ".csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    for row in csv_file:
        print(row)

Which gives me the following error:

    for row in csv_file:
io.UnsupportedOperation: not readable

Update: It turns out the problem lies elsewhere. I had actually managed to generate the same file as you during my various tests, but thought it wasn't working because the output looked wrong. When I opened the file in Excel, the data was split into two columns.

I have now figured out the reason: some numbers use the European decimal notation with a comma, e.g. 179,99. Excel therefore interprets the comma as a delimiter, whereas the file reads correctly when opened in Notepad.
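One possible way around the decimal-comma issue is to rewrite the amounts with a dot before saving. The sketch below assumes that only the `total-amount` and `amount` columns hold such values (taken from the header above); `normalize_amounts` and `AMOUNT_COLUMNS` are hypothetical names, and whether a dot is the right choice depends on the locale Excel is running in.

```python
import csv
import io

# Columns assumed to hold decimal-comma amounts (names taken from the report header)
AMOUNT_COLUMNS = {"total-amount", "amount"}

def normalize_amounts(text):
    """Return the parsed rows with decimal commas replaced by dots in the amount columns."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    amount_idx = [i for i, name in enumerate(header) if name in AMOUNT_COLUMNS]
    rows = [header]
    for row in reader:
        for i in amount_idx:
            if i < len(row):
                row[i] = row[i].replace(",", ".")
        rows.append(row)
    return rows

# Abbreviated, hypothetical sample using the same column names
sample = "settlement-id\ttotal-amount\tamount\n7293436482\t53,44\t\n7293436482\t\t-32,40\n"
rows = normalize_amounts(sample)
print(rows)
```

The other columns are left untouched, so identifiers and dates that happen to contain commas or dots are not altered.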


Answer 1:


You are getting the error because the file is opened for writing ("w"), but the for loop then tries to read from it. If I understand correctly, you want to take the bytes object and write it to a tab-separated CSV file. The following code does that:

import csv, re

orig = b'settlement-id\tsettlement-start-date\tsettlement-end-date\tdeposit-date\ttotal-amount\tcurrency\ttransaction-type\torder-id\tmerchant-order-id\tadjustment-id\tshipment-id\tmarketplace-name\tamount-type\tamount-description\tamount\tfulfillment-id\tposted-date\tposted-date-time\torder-item-code\tmerchant-order-item-id\tmerchant-adjustment-item-id\tsku\tquantity-purchased\n7293436482\t03.05.2018 09:10:07 UTC\t04.05.2018 20:30:23 UTC\t06.05.2018 20:30:23 UTC\t53,44\tEUR\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemPrice\tPrincipal\t179,99\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemFees\tCommission\t-32,40\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemPrice\tPrincipal\t-109,99\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tCommission\t19,80\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tRefundCommission\t-3,96\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n'

# Decode the bytes and split the text into a list of lines
data = orig.decode('utf-8').splitlines()

# Open the file for writing; newline='' lets the csv module control line endings
with open("tmp.csv", "w", newline='', encoding='utf-8') as csv_file:
    # Create the writer object with tab delimiter
    writer = csv.writer(csv_file, delimiter='\t')
    for line in data:
        # writerow() needs a list of fields, so split on tab characters only;
        # splitting on all whitespace would break up values such as
        # "03.05.2018 09:10:07 UTC" and collapse the empty fields
        writer.writerow(re.split(r'\t', line))
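The read/write distinction behind the "not readable" error can be checked with a small round trip. This is only a sketch: the `rows` data is an abbreviated, hypothetical sample and the scratch path is created just for the demonstration.

```python
import csv
import os
import tempfile

# Abbreviated, hypothetical sample rows (note the empty fields and the value with spaces)
rows = [
    ["settlement-id", "settlement-start-date", "total-amount"],
    ["7293436482", "03.05.2018 09:10:07 UTC", "53,44"],
    ["7293436482", "", ""],
]

path = os.path.join(tempfile.mkdtemp(), "check.csv")  # hypothetical scratch path

# Write mode: this handle can only be written to; iterating over it would raise
# io.UnsupportedOperation: not readable
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

# Read mode: reopen the file to iterate over its rows
with open(path, "r", newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f, delimiter="\t"))

print(read_back == rows)
```

If the rows read back match what was written, the tab delimiter has preserved the empty fields and the values containing spaces or commas.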


Source: https://stackoverflow.com/questions/51089194/python-convert-bytes-unicode-tab-delimited-data-to-csv-file
