Deleting selected lines from a data file

做~自己de王妃 submitted on 2020-01-06 10:53:55

Question


This question is a continuation of my earlier post titled "selecting digits from regular expression".

Below is the sample data as posted in the earlier post.

          DONOR         ACCEPTORH      ACCEPTOR           
    atom#  res@atom   atom#  res@atom atom#  res@atom %occupied  distance       angle        
  |  4726   59@O12 |  1487    19@H12  1486    19@O12 |  85.66  2.819 ( 0.18)  21.85 (12.11)        
  |  1499   19@O15 |  1730    22@H12  1729    22@O12 |  83.15  3.190 ( 0.31)  22.36 (12.73)        
  |  1216   16@O22 |  1460    19@H22  1459    19@O22 |  75.74  2.757 ( 0.14)  24.55 (13.66)        
  |  4232   53@O25 |  4143    52@H24  4142    52@O24 |  74.35  2.916 ( 0.25)  28.27 (13.26)        
  |  3683   46@O16 |  4163    52@H13  4162    52@O13 |  73.78  2.963 ( 0.29)  23.65 (14.14)        
  |  4162   52@O13 |  4079    51@H12  4078    51@O12 |  73.68  2.841 ( 0.19)  21.25 (11.87)        
  |  3764   47@O16 |  3825    48@H26  3824    48@O26 |  70.52  2.973 ( 0.28)  26.88 (13.14)        
  .
  .
The file continues for a few thousand lines.

I tried Fredirk's code and it works fine for selecting the lines. Now I would like to extend the idea to my real problem.

The 3rd field ($3) and 6th field ($6) of each line in my data file represent a "number-molecule" (the number before the @), and the molecules are arranged as below:

   1    2   3   4   5   6   7       8

   9    10  11  12  13  14  15      16
  17    18  19  20  21  22  23      24
  25    26  27  28  29  30  31      32
  33    34  35  36  37  38  39      40
  41    42  43  44  45  46  47      48
  49    50  51  52  53  54  55      56

  57    58  59  60  61  62  63      64 

Each pair formed from the numbers above corresponds to the pair of values in the 3rd and 6th fields of a line in the data file.
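To see exactly what awk puts in $3 and $6, here is a minimal sketch run on two lines copied from the sample data. It assumes awk's default whitespace field splitting, under which the leading | is field 1, so the res@atom columns land in $3 and $6. Because a value such as 59@O12 starts with digits, adding 0 to it yields just the molecule number. The function name show_fields is hypothetical.

```shell
#!/bin/sh
# Two lines copied from the sample data above.
sample='  |  4726   59@O12 |  1487    19@H12  1486    19@O12 |  85.66  2.819 ( 0.18)  21.85 (12.11)
  |  1499   19@O15 |  1730    22@H12  1729    22@O12 |  83.15  3.190 ( 0.31)  22.36 (12.73)'

# Print $3 and $6 as awk sees them, then the numbers obtained with "+ 0".
# With the default FS the leading "|" is $1, so the res@atom columns are $3 and $6.
show_fields() {
  awk '{ print $3, $6, "->", $3 + 0, $6 + 0 }'
}

printf '%s\n' "$sample" | show_fields
# 59@O12 19@H12 -> 59 19
# 19@O15 22@H12 -> 19 22
```

The "+ 0" trick relies on awk converting a string to the number formed by its leading digits, which avoids having to change the field separator.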

What I want is to pick out the pairs made up only of numbers on the outermost rows and columns of the arrangement above.

In short, ANY line whose pair is made up only of the numbers (1 2 3 4 5 6 7 8   57 58 59 60 61 62 63 64   1 9 17 25 33 41 49 57   8 16 24 32 40 48 56 64) needs to be deleted.
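Instead of typing the border numbers by hand, they can be computed from the 8 x 8 layout. A minimal sketch, assuming the numbers always run left to right, top to bottom, eight per row: number n sits in row int((n-1)/8)+1 and column (n-1)%8+1, and it is on the border when either index is 1 or 8. The names border_numbers and on_border are hypothetical.

```shell
#!/bin/sh
# Print every number of the 8 x 8 grid that lies on the outermost border.
# Assumes the numbering 1..64 runs left to right, top to bottom, 8 per row.
border_numbers() {
  awk 'function on_border(n,   r, c) {         # r and c are local variables
         r = int((n - 1) / 8) + 1              # row 1..8
         c = (n - 1) % 8 + 1                   # column 1..8
         return (r == 1 || r == 8 || c == 1 || c == 8)
       }
       BEGIN {
         out = ""
         for (n = 1; n <= 64; n++)
           if (on_border(n)) out = out (out == "" ? "" : " ") n
         print out
       }'
}

border_numbers
# 1 2 3 4 5 6 7 8 9 16 17 24 25 32 33 40 41 48 49 56 57 58 59 60 61 62 63 64
```

This gives the same 28 numbers as the list above (the corners 1, 8, 57 and 64 appear only once), and the same on_border test could be used directly inside a filtering script.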

I have no idea how to write a loop in awk that selects those pairs and deletes the lines straight away.

Many thanks in advance.


Answer 1:


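Use an array to hold the set of numbers, storing each number as an array key (awk's "in" operator tests keys, not values). Define it in the BEGIN block:

BEGIN {
  for (n=1; n<=8; n++) set[n] = 1                     # top row: 1-8
  for (n=57; n<=64; n++) set[n] = 1                   # bottom row: 57-64
  for (n=9; n<=49; n+=8) {set[n] = 1; set[n+7] = 1}   # left (9..49) and right (16..56) columns
}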

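Then skip every line whose $3 and $6 are both in the set, and print all other lines. Since those fields look like 59@O12, add 0 to each one to get its leading molecule number before the lookup:

(($3+0) in set) && (($6+0) in set) {next}   # both numbers on the border: delete the line
{print}                                     # keep everything else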
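Putting the two parts together, a minimal runnable sketch might look like the following. It assumes awk's default whitespace field splitting (so $3 and $6 hold values like 59@O12, hence the +0), stores the border numbers as array keys so that "in" works, and adds an explicit print rule so the remaining lines are written out. The wrapper name filter_border_pairs is hypothetical, and the last input line is invented purely to show a deletion: none of the real sample lines has both numbers on the border.

```shell
#!/bin/sh
# Sketch: drop lines whose $3 and $6 molecule numbers both lie on the grid border.
# Assumes default whitespace splitting, so $3 and $6 look like "59@O12".
filter_border_pairs() {
  awk '
    BEGIN {
      # Border numbers are stored as array KEYS so that "x in set" works.
      for (n = 1;  n <= 8;  n++) set[n] = 1                        # top row
      for (n = 57; n <= 64; n++) set[n] = 1                        # bottom row
      for (n = 9;  n <= 49; n += 8) { set[n] = 1; set[n + 7] = 1 } # left/right columns
    }
    # "$3 + 0" turns "59@O12" into 59 before the lookup.
    (($3 + 0) in set) && (($6 + 0) in set) { next }   # both on the border: delete
    { print }                                          # everything else is kept
  ' "$@"
}

# Two lines from the sample data plus one HYPOTHETICAL line (molecules 8 and 64)
# added only to demonstrate a deletion.
filter_border_pairs <<'EOF'
  |  4726   59@O12 |  1487    19@H12  1486    19@O12 |  85.66  2.819 ( 0.18)  21.85 (12.11)
  |  3764   47@O16 |  3825    48@H26  3824    48@O26 |  70.52  2.973 ( 0.28)  26.88 (13.14)
  |     1    8@O12 |     2    64@H12     3    64@O12 |
EOF
# Prints the first two lines unchanged; the hypothetical third line is dropped.
```

With a real file, something like filter_border_pairs data.txt > filtered.txt (file names are placeholders) should write the kept lines to a new file; header lines pass through because their $3 and $6 convert to 0, which is not in the set.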


Source: https://stackoverflow.com/questions/7343411/deleting-selected-lines-from-data-file
