awk set elements in array

对着背影说爱祢 提交于 2019-12-23 02:52:18

问题


I have a large .csv file to to process and my elements are arranged randomly like this:

xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012 xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012 xxxxxx,xx,MLOCAL,341993,22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012

where fields LOCAL, REMOTE, MLOCAL or MREMOTE are displayed like:

  1. when they are displayed as pairs (LOCAL/REMOTE) if 3rd field is MLOCAL, and 4th field is MREMOTE, then 5th and 7th field represent the value and date of MLOCAL, and 6th and 8th represent the value and date of MREMOTE
  2. when they are displayed as single (only LOCAL or only REMOTE) then the 4th and 5th fields represent the value and date of field 3.

Now, I have split these rows using:

nawk 'BEGIN{

while (getline < "'"$filedata"'")
split($0,ft,",");
name=ft[1];
ID=ft[2]
 ?=ft[3]
 ?=ft[4]
....................

but because I can't find a pattern for the 3rd and 4th field I'm pretty stuck to continue to assign var names for each of the array elements in order to use them for further processing.

Now, I tried to use "case" statement but isn't working for awk or nawk (only in gawk is working as expected). I also tried this:

if ( ft[3] == "MLOCAL" && ft[4]!= "MREMOTE" )
{
        MLOCAL=ft[3];
        MLOCAL_qty=ft[4];
        MLOCAL_TIMESTAMP=ft[5];
}
else if ( ft[3] == MLOCAL && ft[4] == MREMOTE )
{
        MLOCAL=ft[3];
        MREMOTE=ft[4];
        MOCAL_qty=ft[5];
        MREMOTE_qty=ft[6];
        MOCAL_TIMESTAMP=ft[7];
        MREMOTE_TIMESTAMP=ft[8];
}
else if ( ft[3] == MREMOTE && ft[4] != MOCAL )
{
        MREMOTE=ft[3];
        MREMOTE_qty=ft[4];
        MREMOTE_TIMESTAMP=ft[5];
..........................................

but it's not working as well.

So, if you have any idea how to handle this, I would be grateful to give me a hint in order to be able to find a pattern in order to cover all the possible situations from above.

EDIT

I don't know how to thank you for all this help. Now, what I have to do is more complex than I wrote above, I'll try to describe as simple as I can otherwise I'll make you guys pretty confused. My output should be like following:

NAME,UNIQUE_ID,VOLUME_ALOCATED,MLOCAL_VALUE,MLOCAL_TIMESTMP,MLOCAL_limit,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,MREMOTE_VALUE,MREMOTE_TIMESTAMP,REMOTE_VALUE,REMOTE_TIMESTAMP

(where MLOCAL_limit and LOCAL_limit are a subtract result between VOLUME_ALOCATED and MLOCAL_VALUE or LOCAL_VALUE)

So, in my output file, fields position should be arranged like: 4th field =MLOCAL_VALUE,5th field =MLOCAL_TIMESTMP,7th field=LOCAL_VALUE, 8th field=LOCAL_TIMESTAMP,10th field=MREMOTE_VALUE,11th field=MREMOTE_TIMESTAMP,12th field=REMOTE_VALUE,13th field=REMOTE_TIMESTAMP

Now, an example would be this: for the following input: name,ID,VOLUME_ALLOCATED,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012

name,ID,VOLUME_ALLOCATED,REMOTE,234455,19/12/2012

I should process this line and the output should be this:

name,ID,VOLUME_ALLOCATED,33222,22/10/2012,MLOCAL_LIMIT, ,,,56,18/10/2012,,

7th ,8th, 9th,12th, and 13th fields are empty because there is no info related to: LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,REMOTE_VALUE, and REMOTE_TIMESTAMP

OR

name,ID,VOLUME_ALLOCATED,,,,,,,,,234455,9/12/2012

4th,5th,6th,7th,8th,9th,10thand ,11th, fields should be empty values because there is no info about: MLOCAL_VALUE,MLOCAL_TIMESTAMP,MLOCAL_LIMIT,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_LIMIT,MREMOTE_VALUE,MREMOTE_TIMESTAMP

VOLUME_ALLOCATED is retrieved from other csv file (called "info.csv") based on the ID field which is processed earlier in the script like:

info.csv

VOLUME_ALLOCATED,ID,CLIENT 5242881,64,subscriber 567743,24,visitor

data.csv

NAME,64,MLOCAL,341993,23/10/2012 NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012

Now, my code is this:

    #! /usr/bin/bash

input="info.csv"
filedata="data.csv"
outfile="out"

nawk 'BEGIN{
while (getline < "'"$input"'")
{
split($0,ft,",");
volume=ft[1];
id=ft[2];
client=ft[3];

key=id;
volumeArr[key]=volume;
clientArr[key]=client;
}
close("'"$input"'");

while (getline < "'"$filedata"'")
{
gsub(/\$/,","); # substitute the $ separator with comma
split($0,ft,",");
volume=volumeArr[id]; # Get the volume from the volumeArr, using "id" as key
segment=clientArr[id]; # Get the client mode from the clientArr, using "id" as key
NAME=ft[1];
id=ft[2];

here I'm stuck, I can't find the right way to set the rest of the fields since I don't know how to handle the 3rd and 4th fields.

? =ft[3];
? =ft[4];

Sorry, if I make you pretty confused but this is my current situation right now. Thanks


回答1:


You didn't provide the expected output from your sample input but here's a start to show how to get the values for the 2 different formats of input line:

$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
   delete value       # or use split("",value) if your awk cant delete arrays
   if ($4 ~ /LOCAL|REMOTE/) {
      value[$3] = $5
      date[$3]  = $7
      value[$4] = $6
      date[$4]  = $8
   }
   else {
      value[$3] = $4
      date[$3]  = $5
   }

   print
   for (type in value) {
      printf "%15s%15s%15s\n", type, value[type], date[type]
   }
}
$ awk -f tst.awk file
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
        MREMOTE             56     18/10/2012
         MLOCAL          33222     22/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
        MREMOTE          33222     22/10/2012
         MLOCAL             56     18/10/2012
xxxxxx,xx,MLOCAL,*341993,22/10/2012*
         MLOCAL        *341993    22/10/2012*
xxxxxx,xx,MREMOTE,9356828,08/10/2012
        MREMOTE        9356828     08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
         REMOTE          15253     22/10/2012
          LOCAL          19316     22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
         REMOTE        1865871     22/10/2012
          LOCAL         383666     22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
         REMOTE     1180306134     19/10/2012

and if you post the expected output we could help you more.



来源:https://stackoverflow.com/questions/14391738/awk-set-elements-in-array

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!