awk

Which AWK program can do this manipulation?

一笑奈何 · submitted 2021-02-08 12:12:41
Question: Given a file containing a structure arranged like the following (with fields separated by SP or HT):

4 5 6 2 9 8 4 8
m d 6 7 9 5 4 g
t 7 4 2 4 2 5 3
h 5 6 2 5 s 3 4
r 5 7 1 2 2 4 1
4 1 9 0 5 6 d f
x c a 2 3 4 5 9
0 0 3 2 1 4 q w

which AWK program do I need to get the following output?

4 5
m d
t 7
h 5
r 5
4 1
x c
0 0
6 2
6 7
4 2
6 2
7 1
9 0
a 2
3 2
9 8
9 5
4 2
5 s
2 2
5 6
3 4
1 4
4 8
4 g
5 3
3 4
4 1
d f
5 9
q w

Thanks in advance for any and all help. Postscript: please bear in mind, my input
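Judging from the example, the task is to take the columns two at a time and stack each two-column block vertically. A minimal sketch (the file name grid.txt and the two-row sample are stand-ins, not from the question): buffer the whole grid in memory, then print the column pairs row by row.

```shell
# A small stand-in sample with the same shape as the question's grid
printf '%s\n' '4 5 6 2' 'm d 6 7' > grid.txt

awk '
{ for (i = 1; i <= NF; i++) cell[NR, i] = $i; nf = NF }   # buffer the grid
END {
    for (c = 1; c <= nf; c += 2)       # walk the columns two at a time
        for (r = 1; r <= NR; r++)      # emit every row of that column pair
            print cell[r, c], cell[r, c + 1]
}' grid.txt
# -> 4 5
#    m d
#    6 2
#    6 7
```

On the 8x8 example this produces one pair per line in the order shown above. Note it holds the entire file in memory, which is fine for small grids.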

How to conditionally filter rows in awk

夙愿已清 · submitted 2021-02-08 12:01:25
Question: I am new to awk on Linux. I have a large text file with 17 million rows. The first column is subject ID and the second column is Age. Each subject may have multiple ages, and I just want to filter the minimum age for each subject and print them to a separate text file. I am not sure whether the subjects in the first column are sorted from low to high. These are the first few rows:

ID Age
16214497 36.000
16214497 63.000
16214727 63.000
16214781 71.000
16214781 79.000
16214792 67.000
16214860 79.000
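One hedged sketch (assuming a one-line header, as in the sample): track the minimum age per ID in an array and print it at the end, so the input does not need to be sorted. The file names are placeholders, and the order of `for (id in min)` is unspecified, so pipe through sort if order matters.

```shell
# Stand-in sample with one duplicated ID
printf '%s\n' 'ID Age' '16214497 36.000' '16214497 63.000' '16214727 63.000' > ages.txt

awk 'NR == 1 { next }                                  # skip the "ID Age" header
     !($1 in min) || $2 + 0 < min[$1] { min[$1] = $2 } # keep the smallest age seen
     END { for (id in min) print id, min[id] }' ages.txt > min_age.txt
```

For 17 million rows this keeps only one array entry per distinct ID in memory, which is usually far smaller than the file itself.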

How to close file in awk while generating a list of?

笑着哭i · submitted 2021-02-08 10:18:37
Question: I'm trying to find a way to avoid the awk error "too many open files". Here's my situation. INPUT: an ASCII file with many lines, following this scheme:

NODE_212_lenght.._1
NODE_212_lenght.._2
NODE_213_lenght.._1
NODE_213_lenght.._2

In order to split this file so that every record with the same NODE number goes to its own file, I've used this awk one-liner:

awk -F "_" '{print >("orfs_for_node_" $2 "")}' <file

With a file composed of many lines, this command keeps saying "too many open files". I've tried
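A sketch of the usual fix: call close() after each write so awk never holds more than one output descriptor open. Appending with >> matters, because after close() a plain > would truncate the file on reopen. The input sample below is a stand-in.

```shell
# Stand-in input grouped by NODE number
printf '%s\n' 'NODE_212_len_1' 'NODE_212_len_2' 'NODE_213_len_1' > nodes.txt

awk -F '_' '{
    f = "orfs_for_node_" $2
    print >> f       # append, so reopening after close() does not truncate
    close(f)         # release the descriptor before the next record
}' nodes.txt
```

Opening and closing on every record is slower; if the input is grouped by NODE number (as the sample suggests), a faster variant closes the previous file only when $2 changes.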

awk: add new column, including header

拥有回忆 · submitted 2021-02-08 09:45:07
Question: I have a file that looks like this:

name measurement gender duration
a 1 m 55
b 1 f 54
c 2 m 53
... etc

I want to use awk to add a column which has the same value for every row except the first (the header). Let's say I want to add the column new_column with the value 99 for every row, so the output file looks like this:

name measurement gender duration new_column
a 1 m 55 99
b 1 f 54 99
c 2 m 53 99
... etc

This sounds like a job for awk... but I haven't been able to figure out how. Any
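A minimal sketch (the file name is a placeholder): treat the first row specially, appending the column name there and the constant value everywhere else. The comma in print inserts OFS, a space by default.

```shell
# Stand-in sample matching the question's layout
printf '%s\n' 'name measurement gender duration' 'a 1 m 55' 'b 1 f 54' > data.txt

awk 'NR == 1 { print $0, "new_column"; next }   # header row gets the column name
     { print $0, 99 }' data.txt                 # every other row gets the value
# -> name measurement gender duration new_column
#    a 1 m 55 99
#    b 1 f 54 99
```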

In awk, why are “” and “\n\n” treated the same for the RS parameter?

99封情书 · submitted 2021-02-08 09:32:18
Question: Here are the contents of the file:

Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754

And these two commands give the same result:

$ awk 'BEGIN{FS="\n"; RS="\n\n"} {print $1, $3}' file_contents
$ awk 'BEGIN{FS="\n"; RS=""} {print $1, $3}' file_contents

The result in both cases is:

Person Name (222) 466-1234
Another person (523) 643-8754

RS="\n\n" actually makes sense, but why is RS="" also treated the same way?

Answer 1: They aren't treated the same. RS=""
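A small demo of one difference (a sketch; multi-character RS is a gawk/mawk extension, not POSIX): paragraph mode (RS="") skips leading blank lines and collapses runs of blank lines into one separator, whereas RS="\n\n" splits literally, so a leading blank line leaves a stray newline at the start of the first record.

```shell
printf '\na\nb\n\nc\nd\n' > recs.txt     # note the leading blank line

awk 'BEGIN { RS = "\n\n"; FS = "\n" } { print NR ": " $1 }' recs.txt
# record 1 is "\na\nb", so $1 is empty (the text before its first newline)

awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 }' recs.txt
# paragraph mode skips the leading blank line, so $1 of record 1 is "a"
```

Paragraph mode also always treats "\n" as a field separator in addition to FS, which RS="\n\n" does not.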

awk: Split on “\n”

柔情痞子 · submitted 2021-02-08 08:32:30
Question: I'm trying to process a log file in which entries are compressed into one line, with newlines encoded as "\n". I want to keep everything up to the first "\n" and discard the rest. awk -F"\n" '{print $1}' file doesn't work, and neither does awk -F"\\n" '{print $1}' file. What's the correct form of this command?

Answer 1:

$ echo 'a\nb'
a\nb
$ echo 'a\nb' | awk -F'\\\\n' '{print $1}'
a

Here's why: consider these uses of the above characters in regexp comparisons: n = the literal character n ( $0 ~
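The answer's trick in a self-contained form (the sample log line is a stand-in): the four backslashes survive the shell as `\\\\n`, awk's string processing halves them to `\\n`, and as a regexp that matches a literal backslash followed by n.

```shell
# The line contains literal backslash-n sequences, not real newlines
printf '%s\n' 'first\nsecond\nthird' > log.txt

awk -F '\\\\n' '{ print $1 }' log.txt
# -> first
```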

sed: remove whole words containing a character class

蓝咒 · submitted 2021-02-08 08:20:18
Question: I'd like to remove any word containing a non-alpha character from a text file, e.g. "ok 0bad ba1d bad3 4bad4 5bad5bad5" should become "ok". I've tried using:

echo "ok 0bad ba1d bad3 4bad4 5bad5bad5" | sed 's/\b[a-zA-Z]*[^a-zA-Z]\+[a-zA-Z]*\b/ /g'

Answer 1: Using awk:

s="ok 0bad ba1d bad3 4bad4 5bad5bad5"
awk '{ofs=""; for (i=1; i<=NF; i++) if ($i ~ /^[[:alpha:]]+$/) {printf "%s%s", ofs, $i; ofs=OFS} print ""}' <<< "$s"
ok

This awk command loops through all words, and if a word matches the regex /^[[
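The answer's loop, written out as a runnable sketch: keep only the fields that are entirely alphabetic, joining the survivors with OFS.

```shell
echo 'ok 0bad ba1d bad3 4bad4 5bad5bad5' | awk '{
    out = sep = ""
    for (i = 1; i <= NF; i++)               # keep purely alphabetic words
        if ($i ~ /^[[:alpha:]]+$/) { out = out sep $i; sep = OFS }
    print out
}'
# -> ok
```

Building the output in a variable instead of printf-ing field by field means a trailing print emits exactly one newline, whatever the match count.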

Create bins with awk histogram-like

浪子不回头ぞ · submitted 2021-02-08 07:51:17
Question: Here's my input file:

1.37987
1.21448
0.624999
1.28966
1.77084
1.088
1.41667

I would like to create bins of a size of my choice to get histogram-like output, e.g. something like this for 0.1 bins, starting from 0:

0 0.1 0
...
0.5 0.6 0
0.6 0.7 1
...
1.0 1.1 1
1.1 1.2 0
1.2 1.3 2
1.3 1.4 1
...

My file is too big for R, so I'm looking for an awk solution (I'm also open to anything else that I can understand, as I'm still a Linux beginner). This was sort of already answered in this post: awk
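One possible sketch, with the bin width passed in via -v: map each value to a bin index with int($1 / w), count per bin, and print every bin from 0 up to the largest seen so empty bins appear too. Caveat: values sitting exactly on a bin edge can land one bin off because of floating-point division.

```shell
printf '%s\n' 1.37987 1.21448 0.624999 1.28966 1.77084 1.088 1.41667 > values.txt

awk -v w=0.1 '
{ b = int($1 / w); n[b]++; if (b > max) max = b }    # count values per bin
END {
    for (i = 0; i <= max; i++)                        # print empty bins too
        printf "%.1f %.1f %d\n", i * w, (i + 1) * w, n[i] + 0
}' values.txt
```

For the sample this prints 18 lines, from "0.0 0.1 0" up to "1.7 1.8 1", including "1.2 1.3 2" as in the desired output; memory use is one counter per non-empty bin, so a file too big for R is not a problem.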

AWK set multiple delimiters for comma and quotes with commas

こ雲淡風輕ζ · submitted 2021-02-08 06:43:33
Question: I have a CSV file where columns are comma-separated, and columns with textual data that contain commas are quoted. Sometimes quotes also appear within quoted text, to mean things like inches, resulting in doubled quotes. Textual data without embedded commas is not quoted. For example:

A,B,C
1,"hello, how are you",hello
2,car,bike
3,13.3 inch tv,"tv 13.3"""

How do I use awk to print the number of columns for each row, of which I should get:

3
3
3

I thought of using $awk -F'[,"]' but I'm getting
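GNU awk's FPAT is the usual answer for quoted CSV, but a portable sketch is to scan each line character by character and count only the commas that fall outside quotes; doubled quotes toggle the in-quote state twice, so they cancel out and need no special case.

```shell
printf '%s\n' 'A,B,C' '1,"hello, how are you",hello' '2,car,bike' '3,13.3 inch tv,"tv 13.3"""' > data.csv

awk '{
    commas = 0; inq = 0
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "\"") inq = !inq                # toggle quoted state
        else if (c == "," && !inq) commas++      # count unquoted commas
    }
    print commas + 1                             # fields = separators + 1
}' data.csv
# -> 3 on each of the four lines (the header row also has 3 columns)
```

This handles commas inside quotes and doubled quotes, but not multi-line quoted fields; for full CSV, a dedicated parser is safer.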

Replace blank fields with zeros in AWK

跟風遠走 · submitted 2021-02-08 05:59:00
Question: I wish to replace blank fields with zeros using awk, but when I use sed 's/ /0/' file I seem to replace all whitespace, when I only wish to fill in the missing data. Using awk '{print NF}' file returns different field counts (e.g. 9, 4) because of the empty fields.

input

590073920 20120523 0 M $480746499 CM C 500081532 SP
501298333 0 M *BB
501666604 0 M *OO
90007162 7 M +178852
90007568 3 M +189182

output

590073920 20120523 0 M $480746499 CM C 500081532 SP
501298333 0 0 M *BB 0 0 0 0
501666604 0 0
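If the data really is fixed-width, which the aligned columns suggest, gawk's FIELDWIDTHS is the usual tool. A portable sketch with hypothetical widths (the widths value and the toy sample below are assumptions, not taken from the question) slices each line by position and substitutes 0 for all-blank slices:

```shell
printf '%s\n' 'ab    cd' 'ab  x   ' > rows.txt      # toy fixed-width sample

awk -v widths='3 3 3' '{                            # hypothetical column widths
    n = split(widths, w, " "); pos = 1
    for (i = 1; i <= n; i++) {
        f = substr($0, pos, w[i]); pos += w[i]      # cut the next column
        gsub(/^ +| +$/, "", f)                      # trim the slice
        printf "%s%s", (f == "" ? 0 : f), (i < n ? OFS : ORS)
    }
}' rows.txt
# -> ab 0 cd
#    ab x 0
```

With default whitespace splitting, awk collapses runs of blanks, so it cannot tell which positional field was empty; that is why the widths (or FIELDWIDTHS) are needed at all.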