how to trim file - remove the columns with the same value

前端未结

关注

 8  2050

天涯浪人 2021-01-05 09:15

I would like your help on trimming a file by removing the columns with the same value.

# the file I have (tab-delimited, millions of columns)
jack 1 5 9
joh


      
      
        
          8条回答        

        
                    
            
            
                         
                
              
              
                
                   盖世英雄少女心
                                             
                
                
                (楼主)
            
              
              
                2021-01-05 09:52
              

            
            
                        
You can select the column to cut out like

# using bash/awk
# I had used 1000000 here, as you had written millions of columns but you should adjust it
for cols in `seq 2 1000000` ; do
    cut -d DELIMITER -f $cols FILE | awk -v c=$cols '{s+=$0} END {if (s/NR==$0) {printf("%i,",c)}}'
done | sed 's/,$//' > tmplist
cut --complement -d DELIMITER -f `cat tmplist` FILE


But it can be REALLY slow, because it's not optimized, and reads the file several times... so be careful with huge files.  

Or you can read the whole file once with awk and select the dumpable columns, then use cut.

cut --complement -d DELIMITER -f `awk '{for (i=1;i<=NF;i++) {sums[i]+=$i}} END {for (i=1;i<=NF; i++) {if (sums[i]/NR==$i) {printf("%i,",c)}}}' FILE | sed 's/,$//'` FILE


HTH
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它8个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复