awk two regex conditions - structure convoluted complex transactions list csv

巧了我就是萌 submitted on 2019-12-31 04:19:04

Question


My original input file is a booking transaction list. I am interested in the lines in two sections: a) transactions and b) refunds. These sections are always at the bottom of the CSV and are structured.

I can skip all lines above the transaction section via the regex condition /transaction/ {print}.

I would like to add a column containing the string "transaction" or "refund", depending on the section of the CSV, so that I know whether a row is a transaction or a refund. Something like:

IF ($2 = "transaction" || " "  != "refunds"){$7=="transaction"};
IF ($2 = "refunds" || " "  != "transaction"){$7=="refunds"}

I have shared the CSV and script.awk on my Google Drive and hope this is acceptable: convoluted transaction list to be structured

transaction date        via        Details     payment    fee         
             28-02-2015 invoice    txn1        44.1       0.19       
             28-02-2015 invoice    txn2        27.7       0.19       
             07-03-2015 invoice    txn3        43.1       0.19       
             09-03-2015 invoice    txn4        36.8       0.19       
             12-03-2015 invoice    txn5        26         0.19       
             13-03-2015 invoice    txn6        43.7       0.19       
             13-03-2015 invoice    txn7        25.6       0.19       
             15-03-2015 creditcard txn8        70.8       0.19       
                                  Sum         317.8       1.52       
refunds    Datum        via        Details     payment    1.52         
             18-12-2014 invoice    txn0          16           
                                  Sum            16

My intended outcome is this:

 date        via        Details        payment    fee     type 
 28-02-2015 invoice    txn1            44.1       0.19     transaction
 28-02-2015 invoice    txn2            27.7       0.19     transaction       
 07-03-2015 invoice    txn3            43.1       0.19     transaction       
 09-03-2015 invoice    txn4            36.8       0.19     transaction       
 12-03-2015 invoice    txn5            26         0.19     transaction       
 13-03-2015 invoice    txn6            43.7       0.19     transaction       
 13-03-2015 invoice    txn7            25.6       0.19     transaction       
 15-03-2015 creditcard txn8            70.8       0.19     transaction       

 18-12-2014 invoice    txn0            16                  refund         

My snippet at the moment:

BEGIN {
  OFS = FS = ";"
  print "date", "payment option", "details", "payment", "fee", "type"
}

/^transactions/,/^$/ {
  if ($3 == "via") {next};
  if ($6 == "Sum") {next};
  print $2 FS $3 FS $4 FS $5 FS $6 FS $7;
}

Answer 1:


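awk '
  # Header line: drop the section label and add a "type" heading.
  NR == 1 {
    $1 = ""
    print $0, "type"
    type = "transaction"
    next
  }
  # Start of the refunds section: the dash fills the missing fee column.
  $1 == "refunds" {
    print ""
    type = "- refund"
  }
  # Data rows are indented; the Sum rows have fewer than 4 fields.
  /^ / && NF > 3 {
    print $0, type
  }' input.txt | column -t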

Outputs:

date        via         Details  payment  fee   type
28-02-2015  invoice     txn1     44.1     0.19  transaction
28-02-2015  invoice     txn2     27.7     0.19  transaction
07-03-2015  invoice     txn3     43.1     0.19  transaction
09-03-2015  invoice     txn4     36.8     0.19  transaction
12-03-2015  invoice     txn5     26       0.19  transaction
13-03-2015  invoice     txn6     43.7     0.19  transaction
13-03-2015  invoice     txn7     25.6     0.19  transaction
15-03-2015  creditcard  txn8     70.8     0.19  transaction
18-12-2014  invoice     txn0     16       -     refund

I'm piping the output through column -t to line up the columns, though that removes the blank line added before the refund. Another difference from the intended outcome is the dash in the refund's "fee" column, which is necessary for column -t to put the type in the right column.
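If the blank line and an empty fee column matter, one possible alternative is to drop column -t and let awk pad the columns with printf. The following is only a sketch: the function name format_transactions and the column widths are assumptions chosen to fit the sample data, not part of the original answer.

```shell
# Sketch: fixed-width output via printf instead of column -t.
# The function name and the column widths are illustrative assumptions.
format_transactions() {
  awk '
    NR == 1 {
      # Header row: skip the section label in $1 and add a "type" heading.
      printf "%-11s %-11s %-8s %-8s %-5s %s\n", $2, $3, $4, $5, $6, "type"
      type = "transaction"
      next
    }
    $1 == "refunds" {
      print ""          # the blank separator survives without column -t
      type = "refund"
    }
    /^ / && NF > 3 {
      # Refund rows have no fee, so $5 is empty and is padded with spaces.
      printf "%-11s %-11s %-8s %-8s %-5s %s\n", $1, $2, $3, $4, $5, type
    }' "$@"
}
```

It would be called as format_transactions input.txt. Because the padding is done by printf, no dash placeholder is needed, but the widths have to be adjusted by hand if longer values appear.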

In the awk code, when the record number (NR, here the line number) is 1, we blank out the first field, print the rest plus "type", set the type to "transaction", and move on to the next line. If a line's first field is "refunds", we print a blank line and change the type to "- refund" (refund rows have no fee, so the dash stands in for the missing fee column). Finally, if a line has leading spaces and the number of fields (NF) is 4 or more, we print the line plus the type.
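The same idea (remember the current section in a variable and append it to each data row) could also be applied to the semicolon-delimited file that the question's script targets. The sketch below assumes a layout that the question does not confirm: the section label ("transactions" or "refunds") sits in the first field of each header row, data rows leave the first field empty, and Sum rows leave the date field empty. The function name tag_sections is hypothetical.

```shell
# Sketch for a semicolon-delimited file. ASSUMED layout (not confirmed by the question):
#   transactions;date;via;Details;payment;fee
#   ;28-02-2015;invoice;txn1;44.1;0.19
#   ;;;Sum;44.1;0.19
#   refunds;Datum;via;Details;payment;
#   ;18-12-2014;invoice;txn0;16;
tag_sections() {
  awk -F';' -v OFS=';' '
    # Section header rows set the state; only the first one prints a heading.
    $1 == "transactions" { type = "transaction"; print $2, $3, $4, $5, $6, "type"; next }
    $1 == "refunds"      { type = "refund"; next }
    # Inside a section, keep rows that have a date; this skips the Sum rows.
    type != "" && $2 != "" { print $2, $3, $4, $5, $6, type }
  ' "$@"
}
```

Lines above the first section header are dropped because type is still empty there, so no separate range pattern is needed.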

The awk code can be all on one line if you use semicolons between commands inside the actions.
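As a sketch of that one-line form, the program above can be collapsed as follows. The wrapper function tag_rows is only there for illustration and is not part of the original answer.

```shell
# One-line form of the same awk program; the wrapper name tag_rows is an illustrative assumption.
tag_rows() { awk 'NR == 1 { $1 = ""; print $0, "type"; type = "transaction"; next } $1 == "refunds" { print ""; type = "- refund" } /^ / && NF > 3 { print $0, type }' "$@"; }
```

It would be used as tag_rows input.txt | column -t. Semicolons are only needed between statements inside an action; the pattern-action pairs themselves can simply follow one another.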



Source: https://stackoverflow.com/questions/42977022/awk-two-regex-conditions-structure-convoluted-complex-transactions-list-csv
