AWK to Consolidate Files

て烟熏妆下的殇ゞ 提交于 2020-01-05 13:07:51

问题


I'm hacking some AWK. I'm a beginner with it. I have done my homework on the following problem, and just can't get it to work.

RAW DATA SAMPLE:

Start Date  12/3/17
End Date    12/30/17
Report Type Report1
Currency    ZAR
Country Identifier  MType   Quantity    Net Net Net Code    Title   Contrib I_Type  M_Type  Vendor Identifier   Offline Indicator   LSN
ZA  44057330    FMP 1   0.050666    0.050666    USYYYYYYYYYY    ABC Tom 1   1   USYYYYYYYYYY    0   SUT
ZA  1267456726  SIMT    1   0.03    0.03    USXXXXXXXXXX    DEF Frances 1   1   USXXXXXXXXXX    0   XYZ
Row Count   657
Storefront Name MType   Quantity    Net Net
ZA  FMP 601 30.45
ZA  IAP 13  0.68
ZA  IMP 1035    69.36
ZA  SIMP    54  1.4
ZA  FMT 70  0.53
ZA  IMT 92  1.68
ZA  SIMT    6   0.18

DESIRED OUTPUT:

(I left the special characters un-escaped here.)

"Filename"  "Start Date"    "End Date"  "Currency"  "Country"   "Identifier"    "MType" "Quantity"  "Net"   "NetNet"    "Code"  "Title" "Contrib"   "I_Type"    "M_Type"    "Vendor Identifier" "Offline Indicator" "LSN"
"rawfile.txt"   "12/3/17"   "12/30/17"  "ZAR"   "ZA"    "44057330"  "FMP"   "1" "0.050666"  "0.050666"  "USYYYYYYYYYY"  "ABC"   "Tom"   "1" "1" "USYYYYYYYYYY"  "0" "SUT"
"rawfile.txt"   "12/3/17"   "12/30/17"  "ZAR"   "ZA"    "1267456726"    "SIMT"  "1" "0.03"  "0.03"  "USXXXXXXXXXX"  "DEF"   "Frances"   "1" "1" "USXXXXXXXXXX"  "0" "XYZ"

Basically I just need to get most of the header from line 5, but three fields I need are in lines 1-4. Also, I don't need the data including and after the line that starts with "Row Count".

MY BEST "GUESS" SO FAR:

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
  }
  /^Row Count/ {nextfile}
  FNR == 1 { StartDate=$2; }
  FNR == 2 { EndDate=$2; }
  FNR == 4 { curr=$2; }
  NR == 5 {$0 = "StartDate" OFS "EndDate" OFS "Filename" OFS "curr" OFS $0; print} 
  FNR > 5 {$0 =  StartDate OFS EndDate OFS basename(FILENAME) OFS curr OFS $0; print}
' OFS='\t' path/to/sourcefiles/*.txt > path/to/outfile.txt

Thanks!

EDITS:

NEW TABLE

These are the lines before the field headers in every file. Content begins on line 4:

Provider ,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,

"MY" SCRIPT

It almost works. But it includes lines 1-3 for every file: gawk ' function basename(file) { sub(".*/", "", file) return file } BEGIN { FS=OFS="," } NR < 3 { if ( NR == 2 ) { hdr = "Report_Period" OFS val = val $1 OFS } next } FNR>3 { print "Filename", hdr $0 next } { print basename(FILENAME), val $0 } ' OFS="," /path/to/input/files > ~/path/to/output/file/file.csv

End of edit


回答1:


Your sample input format isn't clear but this might be what you're looking for or it might be doing more than necessary or something else entirely:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
/^Row Count/ { nextfile }
FNR==1 {
    fname = FILENAME
    sub(/.*[/]/,"",fname)
}
{
    gsub(/[\\]t/,FS)
    gsub(/[\\][/]/,"/")
    gsub(/[^\t]+/,"\"&\"")
}
FNR < 5 {
    if ( FNR != 3 ) {
        hdr = hdr $1 OFS
        val = val $2 OFS
    }
    next
}
FNR==5 {
    print "\"Filename\"", hdr $0
    next
}
{ print "\""fname"\"", val $0 }

$ awk -f tst.awk file
"Filename"      "Start Date"    "End Date"      "Currency"      "Country"       "Identifier"    "MType" "Quantity"   "Net"    "Net Net"       "Code"  "Title" "Contrib"       "I_Type"        "M_Type"        "Vendor Identifier"     "Offline Indicator"   "LSN"
"file"  "12/3/17"       "12/30/17"      "ZAR"   "ZA"    "44057330"      "FMP"   "1"     "0.050666"      "0.050666"   "USYYYYYYYYYY"   "ABC"   "Tom"   "1"     "1"     "USYYYYYYYYYY"  "0"     "SUT"
"file"  "12/3/17"       "12/30/17"      "ZAR"   "ZA"    "1267456726"    "SIMT"  "1"     "0.03"  "0.03"  "USXXXXXXXXXX""DEF"   "Frances"       "1"     "1"     "USXXXXXXXXXX"  "0"     "XYZ"

The above uses GNU awk for nextfile, which you were already using.



来源:https://stackoverflow.com/questions/51144323/awk-to-consolidate-files

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!