Delete columns from space delimited file where file header matches


爱一瞬间的悲伤 2021-01-25 13:58

I have a space-delimited input text file. Using sed or awk, I would like to delete every column whose header is size.

Input File:

id quantity colour


      
      
        
5 answers

清歌不尽 (original poster) 2021-01-25 14:04
                        
A general solution using awk (GNU awk is needed, because the script relies on FIELDWIDTHS and on the four-argument form of split). A hard-coded variable (columns_to_delete) in the BEGIN block lists the positions of the fields to delete. The script then calculates the width of each field from the header line and deletes the fields whose positions appear in that variable.

Assuming infile has the content shown in the question and script.awk has the following content:

BEGIN {
    ## Hard-coded positions of fields to delete. Separate them with spaces.
    columns_to_delete = "5 8 11"

    ## Save positions in an array to handle it better.
    split( columns_to_delete, arr_columns )
}


## Process header.
FNR == 1 { 

    ## Split header with a space followed by any non-space character.
    split( $0, h, /([[:space:]])([^[:space:]])/, seps )

    ## Use FIELDWIDTHS to handle fixed format of data. Set that variable with
    ## length of each field, taking into account spaces.
    for ( i = 1; i <= length( h ); i++ ) { 
        len = length( h[i] seps[i] )
        FIELDWIDTHS = FIELDWIDTHS " " (i == 1 ? --len : i == length( h ) ? ++len : len)
    }   

    ## Re-calculate fields with new FIELDWIDTHS variable.
    $0 = $0
}

## Process header too, and every line with data.
{
    ## Flag to know if 'p'rint to output a field.
    p = 1 

    ## Go through all fields; if a field is found in the array of columns to
    ## delete, reset the 'print' flag.
    for ( i = 1; i <= NF; i++ ) { 
        for ( j = 1; j <= length( arr_columns ); j++ ) { 
            if ( i == arr_columns[j] ) { 
                p = 0 
                break
            }   
        }   

        ## Check 'print' flag and print if set.
        if ( p ) { 
            printf "%s", $i
        }
        else {
            printf " " 
        }
        p = 1 
    }   
    printf "\n"
}


Run it like:

awk -f script.awk infile


With the following output:

id  quantity colour shape    colour shape      colour  shape    
1   10       blue   square   red    triangle   pink    circle   
2   12       yellow pentagon orange rectangle  purple   oval
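
The width calculation that the script performs on the header can be tried in isolation. The following is only a minimal sketch of the same idea, not the script's own code: it swaps the gawk-only four-argument split for a portable match() loop, treating each column as a run of non-space characters plus the spaces that follow it. The function name column_widths and the sample header are hypothetical, since the input file in the question is not shown in full.

```shell
# Print the width of every column found in a header line read from stdin.
# column_widths is a hypothetical helper name used only for this sketch.
column_widths() {
    awk '
    FNR == 1 {
        line = $0
        widths = ""
        # A column is a run of non-spaces plus its trailing spaces.
        while ( match( line, /[^ ]+ */ ) ) {
            widths = ( widths == "" ) ? RLENGTH : widths " " RLENGTH
            # Drop the column just measured and continue with the rest.
            line = substr( line, RSTART + RLENGTH )
        }
        print widths
    }'
}

# Hypothetical header line, made up for illustration.
printf '%s\n' 'id  size quantity colour' | column_widths
```

The printed list has the same shape as the value the script assigns to FIELDWIDTHS: one width per column, separated by spaces.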




EDIT: I just realised that the script's output is not right, because two fields get joined together. Fixing that would be too much work, because the maximum column size would have to be checked on every line before processing anything. Still, I hope the script gets the idea across. I have no time now; perhaps I can try to fix it later on, but I am not sure.

EDIT 2: Fixed by adding an additional space for each deleted field. It was easier than expected :-)



EDIT 3: See comments.

I've modified the BEGIN block to check that an extra variable is provided as an argument.

BEGIN {
    ## Check if a variable 'delete_col' has been provided as argument.
    if ( ! delete_col ) { 
        printf "%s\n", "Usage: awk -v delete_col=\"column_name\" -f script.awk " ARGV[1]
        exit 1
    }   

}


And I added to the FNR == 1 rule the calculation of the positions of the columns to delete:

## Process header.
FNR == 1 { 

    ## Find column position to delete given the name provided as argument.
    for ( i = 1; i <= NF; i++ ) { 
        if ( $i == delete_col ) { 
            columns_to_delete = columns_to_delete " " i
        }   
    }   

    ## Save positions in an array to handle it better.
    split( columns_to_delete, arr_columns )

    ## ...
    ## No modifications from here until the end. Same code as in the original script.
    ## ...
}


Now run it like:

awk -v delete_col="size" -f script.awk infile


And the result will be the same.
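
If the column alignment does not need to be preserved, the lookup by header name can be combined with plain whitespace splitting. This is a different and simpler technique than the fixed-width script above, shown only as a rough sketch: it assumes that no cell is empty and that values contain no spaces, and it rejoins the remaining fields with single spaces. The function name drop_columns and the sample rows are hypothetical.

```shell
# Remove every column whose header equals the name given as $1.
# Input is read from stdin; drop_columns is a hypothetical helper name.
drop_columns() {
    awk -v delete_col="$1" '
    # On the header line, remember the positions of the matching columns.
    FNR == 1 {
        for ( i = 1; i <= NF; i++ )
            if ( $i == delete_col )
                drop[i] = 1
    }
    # On every line (header included), rebuild the record without them.
    {
        out = ""
        sep = ""
        for ( i = 1; i <= NF; i++ ) {
            if ( i in drop )
                continue
            out = out sep $i
            sep = " "
        }
        print out
    }'
}

# Hypothetical sample data, made up for illustration.
printf '%s\n' 'id size colour size' '1 big blue small' '2 tiny red huge' |
    drop_columns size
```

Because this sketch uses only standard awk features, it should not depend on GNU awk, at the cost of losing the original column alignment.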
    
             
                                                        
            
            
              
                