awk

Parse the large test files using awk

Submitted by 梦想与她 on 2020-01-15 12:22:28
Question: I am looking to parse a space-delimited input text file using awk. The column code can have more than one row for each group. I would greatly appreciate any help with this. Input file:

TR 1
Action Success/Failure
8.1.1.1 RunOne 80 48
8.1.1.2 RunTwo 80 49
8.1.1.3 RunThree 100 100
8.1.1.4 RunFour 20 19
8.1.1.5 RunFive 20 20
Action Time 16:47:42
Action2 Success/Failure
8.1.2.1 RunSix 80 49
8.1.2.2 RunSeven 80 80
8.1.2.3 RunEight 80 80
Action2 Time 03:26:31
TR 2
Action Success/Failure
8.1.1.1
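
The question above does not show the desired output, so the following is only a minimal sketch, assuming the goal is to label every result row with the TR number and action name it falls under; the file name tests.txt and the output layout are assumptions:

    awk '
        $1 == "TR"              { tr = $2; next }         # remember the current TR group number
        $2 == "Success/Failure" { action = $1; next }     # remember the current action header
        $2 == "Time"            { next }                  # skip the per-action timing lines
        NF == 4                 { print tr, action, $0 }  # result rows: code, name, expected, actual
    ' tests.txt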

Parsing GenBank file

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-01-15 10:58:05
Question: Basically, a GenBank file consists of gene entries (announced by 'gene'), each followed by its corresponding 'CDS' entry (only one per gene), like the two I show here below. I would like to get locus_tag vs product in a tab-delimited two-column file. 'gene' and 'CDS' are always preceded and followed by spaces. If this task can be easily performed using an already available tool, please let me know. Input file:

gene        complement(8972..9094)
            /locus_tag="HAPS_0004"
            /db_xref="GeneID:7278619"
CDS         complement
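
A minimal awk sketch for the stated goal, assuming each /locus_tag="..." and /product="..." value sits on a single line (long product descriptions that wrap across lines are not handled); the file name sequence.gb is hypothetical:

    awk -F'"' '
        /\/locus_tag=/ { tag = $2 }            # remember the most recent locus_tag value
        /\/product=/   { print tag "\t" $2 }   # pair it with the product line of the CDS entry
    ' sequence.gb > locus_product.tsv

Splitting on double quotes with -F'"' makes the quoted value land in field $2, so no regex capture is needed.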

Extract data between two tags

Submitted by 佐手、 on 2020-01-15 10:34:21
Question: After searching and reading extensively, I managed to get half of the work done. Here is the string:

<td class='bold vmiddle'> Owner CIDR: </td><td><span class='jtruncate-text'><a href="http://3.abcdef.com/ip-3/encoded/czovL215aXAubXMvdmlldy9pcF9hZGRyZXNzZXMvNDIuMjI0LjAuMA%3D%3D">42.224.0.0</a>/12</span></td>

I need to extract the 42.224.0.0 and the /12 to make 42.224.0.0/12. So far I have managed to get 42.224.0.0 by using: sed -n 's/^.*<a.href="[^"]*">\([^<]*\).*/\1/p' but I'm at a loss how to
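
One way to capture the prefix length as well, sketched under the assumption that the input always has the same <a ...>address</a>/nn shape as the sample above (the file name page.html is hypothetical; for anything less regular, an HTML-aware parser would be safer):

    sed -n 's|.*<a[^>]*>\([^<]*\)</a>\(/[0-9][0-9]*\).*|\1\2|p' page.html

With the sample string this prints 42.224.0.0/12: the first group keeps the link text and the second keeps the /12 that follows the closing </a>.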

How to sort out duplicates from a massive list using sort, uniq or awk?

Submitted by 依然范特西╮ on 2020-01-15 10:33:28
Question: I have a 12 GB file of combined hash lists. I need to find the duplicates in it, but I've been having some issues. Some 920 (already uniq'd) lists were merged using cat *.txt > _uniq_combined.txt, resulting in a huge list of hashes. Once merged, the final list WILL contain duplicates. I thought I had it figured out with awk '!seen[$0]++' _uniq_combined.txt > _AWK_duplicates.txt && say finished ya jabroni. awk '!seen[$0]++' _uniq_combined.txt > _AWK_duplicates.txt results in a file with a size of
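
Worth noting: awk '!seen[$0]++' prints the first occurrence of every line, i.e. it de-duplicates the list rather than reporting the duplicates, and its seen array for a 12 GB input can easily outgrow available memory. A sketch of a sort-based alternative that prints only the lines occurring more than once (the output file name and the use of /tmp for sort's temporary files are assumptions):

    LC_ALL=C sort -T /tmp _uniq_combined.txt | uniq -d > duplicates.txt

uniq -d keeps one copy of each repeated line; uniq -dc would also show how many times each hash occurs. LC_ALL=C makes sort compare raw bytes, which is usually faster for hash strings.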

Is it possible to have different behavior for first and second input files to awk?

Submitted by 风格不统一 on 2020-01-15 07:40:10
Question: For example, suppose I run the following command:

gawk -f AppendMapping.awk Reference.tsv TrueInput.tsv

Assume the names of the files WILL change. While iterating through the first file, I want to create a mapping: map[$16]=$18. While iterating through the second file, I want to use the mapping: print $1, map[$2]. What's the best way to achieve this behavior (i.e., different behavior for each input file)?

Answer 1: As you probably know, NR stores the current line number; as you may or may not know, it's
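
Answer 1 is pointing at the standard NR==FNR idiom: NR counts records across all inputs while FNR restarts at 1 for each file, so the two are equal only while the first file is being read. A sketch of what AppendMapping.awk could look like, built from the two statements given in the question (it assumes the first file is non-empty and that whitespace-separated fields are acceptable; pass -F'\t' if the TSV fields may contain spaces):

    # AppendMapping.awk -- sketch of the two-file idiom
    NR == FNR { map[$16] = $18; next }   # true only for the first file: build the mapping
              { print $1, map[$2] }      # second file: FNR has reset but NR has not

Run it exactly as in the question: gawk -f AppendMapping.awk Reference.tsv TrueInput.tsv. In gawk specifically, ARGIND == 1 is an alternative test that stays correct even if the first file happens to be empty.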

Combining two very large files ignoring the first sentence

Submitted by 耗尽温柔 on 2020-01-15 07:16:40
Question: I want to combine two giant files, each a few hundred megabytes, into a single file while ignoring the first line. I wanted to use awk as I thought it would be the most efficient way. The way I'm doing it only ignores the first line of the second file. Any idea how to make this work, or is there a faster way to do it?

awk 'FNR!=NR && FNR==1 {next} 1' 'FNR!=NR && FNR==1 {next} 2' s_mep_{1,2}.out >> s_mep.out

Answer 1: $ awk 'FNR>1' file{1,2} > file_12

Answer 2: With sed: (sed '1d' file_1 ; sed '1d' file_2) > new
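
A note on why the attempt behaves as observed: awk takes only the first quoted string as its program, so the second quoted string is treated as an input file name rather than code; and even the intended single-program form 'FNR!=NR && FNR==1 {next} 1' would skip only the second file's header, because FNR equals NR for the whole first file. Answer 1's FNR>1 drops the first line of every file, since FNR restarts per file. If the intent is instead to keep the first file's header and drop only the second file's (an assumption about the desired result), a small variant would be:

    awk 'NR == 1 || FNR > 1' s_mep_1.out s_mep_2.out > s_mep.out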