How to compare and substitute strings in different lines in unix

I want to compare and substitute strings that appear on different lines of a file in Unix.

For example, I have a file with two words on each line


                      
3 answers

Answer by 感情败类, 2020-12-11 00:04

This is VERY clearly a case for a recursive descent solution:

$ cat tst.awk
function descend(node) {return (map[node] in map ? descend(map[node]) : map[node])}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }

$ awk -f tst.awk file
<a> <e>
<b> <e>
<c> <e>
<d> <e>


If your input might contain cycles that would cause infinite recursion, here's an approach that prints as the 2nd field the last node reached before the recursion starts, with a "*" appended so you know it happened:

$ cat tst.awk
function descend(node,  child, descendant) {
    stack[node]
    child = map[node]
    if (child in map) {
        if (child in stack) {
            descendant = node "*"
        }
        else {
            descendant = descend(child)
        }
    }
    else {
        descendant = child
    }
    delete stack[node]
    return descendant
}
{ map[$1] = $2 }
END { for (key in map) print key, descend(key) }



$ cat file
<w> <w>
<x> <y>
<y> <z>
<z> <x>
<a> <b>
<d> <e>
<b> <c>
<c> <e>

$ awk -f tst.awk file
<w> <w>*
<x> <z>*
<y> <x>*
<z> <y>*
<a> <e>
<b> <e>
<c> <e>
<d> <e>


If you need the output order to match the input order and/or to print duplicate lines twice, change the bottom 2 lines of the script to:

{ keys[++numKeys] = $1; map[$1] = $2 }
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        print key, descend(key)
    }
}
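
For convenience, here is a sketch of how the pieces above might be assembled into a single command: the cycle-aware descend() together with the input-order END block. The wrapper function name resolve_in_order is hypothetical and not part of the original answer; the awk code itself is taken from the snippets above.

```shell
# Hypothetical wrapper: cycle-aware descend() plus input-order output.
resolve_in_order() {
    awk '
    function descend(node,  child, descendant) {
        stack[node]                      # mark node as currently being visited
        child = map[node]
        if (child in map) {
            if (child in stack) {
                descendant = node "*"    # cycle detected: flag the last node
            }
            else {
                descendant = descend(child)
            }
        }
        else {
            descendant = child           # child has no mapping: end of chain
        }
        delete stack[node]
        return descendant
    }
    { keys[++numKeys] = $1; map[$1] = $2 }
    END {
        for (keyNr = 1; keyNr <= numKeys; keyNr++) {
            key = keys[keyNr]
            print key, descend(key)
        }
    }' "$1"
}
```

Calling `resolve_in_order file` on the sample file above should print the same pairs as before, but in the order the lines appear in the input.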

                                                                        
                                                        
            
            
              
                
Answer by 礼貌的吻别, 2020-12-11 00:08

Perl to the rescue:

#!/usr/bin/perl
use warnings;
use strict;

my (@buff);
sub output {
    my $last = pop @buff;
    print map "$_ $last\n", @buff;
    @buff = ();
}

while (<>) {
    my @F = split;
    output() if @buff and $F[0] ne $buff[-1]; # End of a group.
    push @buff, $F[0] unless @buff;           # Start a new group.
    push @buff, $F[1];
}

output();                                     # Don't forget to print the last buffer.


Explanation: Read the input line by line. Keep a buffer of words that will all be printed with the same second word. If the first word of a line differs from the second word of the previous line, print the buffered output and start a new group.
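
To make the buffering idea easier to compare with the awk answers, here is a rough awk rendition of the same logic. It is a sketch, not the author's code: the names collapse_adjacent, emit_group, buf and n are made up for illustration. Like the Perl version, it only follows a chain whose links sit on consecutive lines, so an input such as `<a> <b>`, `<d> <e>`, `<b> <c>`, `<c> <e>` leaves `<a> <b>` unresolved.

```shell
# Hypothetical awk port of the buffering approach described above.
collapse_adjacent() {
    awk '
    function emit_group(   k) {
        # the last buffered word is the final target for the whole group
        for (k = 1; k < n; k++) print buf[k], buf[n]
        n = 0
    }
    {
        if (n && $1 != buf[n]) emit_group()   # chain broken: end of a group
        if (!n) buf[++n] = $1                 # start a new group
        buf[++n] = $2
    }
    END { if (n) emit_group() }' "$1"
}
```

Usage would be `collapse_adjacent file`. If links can be separated by unrelated lines, one of the map-based awk answers is probably the better fit.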
                                                                        
                                                        
            
            
              
                
Answer by 温柔的废话, 2020-12-11 00:20

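awk '{i++;a[i]=$1;b[i]=$2;next}   # store both words of every line
      END{
            for(i=1;i in a;i++)
            {
              f=1;
              while (f==1)              # repeat until b[i] stops changing
              {
                f=0;
                for(j=i+1;j in a;j++)   # only later lines are searched
                {
                  if(b[i]==a[j])
                  {
                    b[i]=b[j];          # follow the link one step
                    f=1;
                  }
                }
              }
            }
            for(i=1;i in a;i++)
            {
              print a[i],b[i];
            }
          }' input.txt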


Input:

<a> <b>
<d> <e>
<b> <c>
<c> <e>


Output:

<a> <e>
<d> <e>
<b> <e>
<c> <e>


Input:

<a> <b>
<e> <z>
<b> <e>


Output:

<a> <z>
<e> <z>
<b> <e>



EDIT

If you need to get

<a> <z>
<e> <z>
<b> <z>


as the output for the second input, you can change this line:

if(b[i]==a[j])


to:

if(j!=i&&b[i]==a[j])


and this:

for(j=i+1;j in a;j++)


to:

for(j=1;j in a;j++)


Also note that this code assumes the input never contains a line whose first and second words are identical while another line's second word refers to it, for example:

<a> <b>
<e> <z>
<b> <b>


In that case the code will never terminate.
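
One possible way to guard against that is sketched below: the loop with both EDIT changes applied, plus a hop counter that gives up on a line once it has followed more links than there are input lines. The hop limit is an assumption added here, not part of the original answer, and the names resolve_all, n and hops are hypothetical. The reasoning is that a chain without a cycle can use each line at most once, so exceeding n hops can only happen when the input is cyclic.

```shell
# Hypothetical guarded variant: EDIT changes applied, plus a hop limit.
resolve_all() {
    awk '{ n++; a[n] = $1; b[n] = $2 }
    END {
        for (i = 1; i <= n; i++) {
            hops = 0
            f = 1
            while (f == 1 && hops <= n) {      # stop once a cycle is certain
                f = 0
                for (j = 1; j <= n; j++) {
                    if (j != i && b[i] == a[j]) {
                        b[i] = b[j]            # follow the link one step
                        hops++
                        f = 1
                    }
                }
            }
        }
        for (i = 1; i <= n; i++) print a[i], b[i]
    }' "$1"
}
```

With this guard, `resolve_all input.txt` terminates on the problematic input above. For cyclic input, the second word that gets printed is simply whatever value was current when the limit was hit, so it should not be read as meaningful.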
                                                                        
                                                        
            
            
              
                