awk

Keep FASTA records which have 2 matches of OX values

烂漫一生 submitted on 2020-05-16 22:05:02
Question: I have a file that looks as follows:

    >sp|rin-1 ghsfdhjkuesl OX=10116 GN=Cdh1 PE=1 SV=1|sp|P10287|ghsfdjdeosd gdhkhs OX=10090 GN=Cdh3 PE=1 SV=2
    WRDTANWLEINPETGVISTRAEMDREDSEHVKNSTYTALIIATDDGSPIATGTGTLLLVLSDVNDNAPIPEPRNMQFCQRNPKPHVITILDPDLPP
    >sp|erin-1 ghsfdshkd OX=10116 GN=Cdh1 PE=1 SV=1|sp|P22223|CADH3_HUMAN Cadherin-3 OX=9606 GN=CDH3 PE=1 SV=2
    ESYPTYTLVVQAADLQGEGLSTTAKAVITVKDINDNAPIFNPSTYLQCAASEPCRAVFREAEVTLEAGGAEQEPGQALGKVFMGCPGQEPALFSTD
    >sp|n-1 ghsfd OX=10116 GN=Cdh1 PE=1 SV=1|tr|F1LMI3
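A minimal awk sketch of one way to do this, under the reading in the title (keep records whose header has exactly two OX= matches); input.fasta and filtered.fasta are placeholder names. gsub() returns the number of substitutions it made, so replacing OX= with itself counts the occurrences without changing the header, and sequence lines inherit the keep flag set on their header:

    # a sketch, assuming each record is one ">" header line followed by sequence lines
    awk '/^>/ { keep = (gsub(/OX=/, "OX=") == 2) } keep' input.fasta > filtered.fasta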

Split a CSV file thousands of times based on groupby

风格不统一 submitted on 2020-05-16 16:12:41
Question: (An adaptation of David Erickson's question here.) Given a CSV file with columns A, B, and C and some values:

    echo 'a,b,c' > file.csv
    head -c 10000000 /dev/urandom | od -d | awk 'BEGIN{OFS = ","}{print $2, $3, $4}' | head -n 10000 >> file.csv

We would like to sort by columns a and b:

    sort -t ',' -k1,1n -k2,2n file.csv > file_.csv
    head -n 3 file_.csv
    >a,b,c
    3,50240,18792
    7,54871,39438

And then for every unique pair (a, b) create a new CSV titled '{a}_Invoice_{b}.csv'. The main challenge seems
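Because file_.csv is already sorted on the first two columns, each (a, b) group is contiguous, so a single awk pass can stream rows into one output file at a time. A minimal sketch (the filename pattern comes from the question; closing the previous file is needed to avoid hitting the open-file-descriptor limit when there are thousands of groups):

    # a sketch, assuming file_.csv is sorted on columns 1 and 2 and has a header row
    awk -F',' 'NR > 1 {
        out = $1 "_Invoice_" $2 ".csv"
        if (out != prev) { if (prev != "") close(prev); prev = out }
        print > out
    }' file_.csv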

Replace the value of the third column if the first two columns are the same in two variables

≯℡__Kan透↙ submitted on 2020-05-15 18:38:25
Question: I need to replace the value of the third column if the first two columns are the same in two variables. I tried to store the first and second columns of the first variable using NR==FNR. Then, if the first and second columns are the same, replace column three of variable "b" with the third column of variable "s". However, doing $3=$3 does not make any sense.

    awk 'NR==FNR{a[$1FS$2]=$1FS$2;next} $1FS$2 in a {$3=$3}1' <(echo "$s") <(echo "$b")
    NODE AREA-29 1 UP ENABLED PINGABLE ASIA ACTIVE
    NODE AREA-21
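The fix is to store the third column of "$s" in the array (not the key itself), then assign it into $3 when the key matches. A minimal sketch under that reading of the question, with "$s" and "$b" being the shell variables shown above:

    # key = first two fields; value = third field of "$s";
    # the trailing 1 prints every line of "$b", modified or not
    awk 'NR==FNR { a[$1 FS $2] = $3; next }
         ($1 FS $2) in a { $3 = a[$1 FS $2] } 1' <(echo "$s") <(echo "$b")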

Negative floating point numbers are not sorted correctly using awk or sort

南笙酒味 submitted on 2020-05-11 12:54:26
Question: For some reason, comparisons of negative floating point numbers with awk and sort seem to be broken on my machine. It seems that -0.1 < -0.2. When I try to sort

    0.2
    -0.1
    -0.2
    0.1
    0

using sort -n test.dat, I get

    -0.1
    -0.2
    0
    0.1
    0.2

instead of

    -0.2
    -0.1
    0
    0.1
    0.2

What is wrong with me?

Answer 1: You are French! In French, the decimal point is a comma (,) and not a dot (.). You need to either replace the dots with commas or change your locale. Try LC_NUMERIC=en_US.UTF-8 sort -n test.dat
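A quick way to reproduce and fix the effect, sketched below; it assumes the en_US.UTF-8 and fr_FR.UTF-8 locales are installed, whereas LC_ALL=C is available everywhere:

    printf '0.2\n-0.1\n-0.2\n0.1\n0\n' > test.dat
    LC_NUMERIC=fr_FR.UTF-8 sort -n test.dat   # '.' is not a decimal point here, so -0.1 and -0.2 both compare as "-0"
    LC_ALL=C sort -n test.dat                 # -0.2 -0.1 0 0.1 0.2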

Compare two text files and print the difference against a key in a bash shell script

微笑、不失礼 submitted on 2020-05-09 17:28:30
Question: Shell script, bash: I have 2 large files of around 1.2 GB of data, with keys and values. I need to compare both files based on the key and store the difference in the values in a third file. File 2 will always be a subset of File 1; I just need to find the values (against each key) which are not present in File 2, i.e. the ones unique to File 1.

File 1:

    test1 marco;polo;angus
    test2 mike;zen;liza
    test3 tom;harry;alan
    test4 bob;june;janet
    1332239_44557576_CONTI Lased & Micro kjd $353.50_30062020_lsdf3_no-rule
    343323H
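A minimal awk sketch of one way to do it, assuming (as in the testN lines) the key is the first whitespace-separated field and the value is a ;-separated list in the second field; the names file1, file2, file3 follow the question. Keys containing spaces, like the CONTI line, would need a different field split:

    awk '
    NR == FNR {                       # first file read (file2): record its values per key
        n = split($2, v, ";")
        for (i = 1; i <= n; i++) seen[$1, v[i]] = 1
        next
    }
    {                                 # second file (file1): keep the values file2 lacks
        n = split($2, v, ";")
        out = ""
        for (i = 1; i <= n; i++)
            if (!(($1, v[i]) in seen))
                out = out (out == "" ? "" : ";") v[i]
        if (out != "") print $1, out
    }' file2 file1 > file3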

Merge two files based on common column values

梦想的初衷 submitted on 2020-05-09 11:38:31
Question: I have file1 like:

    1 A aa
    2 A bb
    3 A cc
    4 A dd
    5 B xx
    6 C yy
    7 C zz

And a file2:

    1 A 11
    2 B 22
    3 C 33

And I would like to merge file1 and file2 into a file3 based on the 2nd column, such that:

    1 A aa 11
    2 A bb 11
    3 A cc 11
    4 A dd 11
    5 B xx 22
    6 C yy 33
    7 C zz 33

Which way is the simplest? Thank you.

Answer 1: Using pandas will save you a lot of time if you use Python. So if your DataFrames are df1:

       1   2
    0
    1  A  aa
    2  A  bb
    3  A  cc
    4  A  dd
    5  B  xx
    6  C  yy
    7  C  zz

and df2:

       1   2
    0
    1  A  11
    2  B  22
    3  C  33

then
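Since this page is about awk, here is an alternative one-line sketch of the same merge (the pandas answer above is truncated): remember file2's third field keyed by its second field, then append it to each matching line of file1.

    # NR==FNR is true only while reading the first file listed (file2)
    awk 'NR == FNR { v[$2] = $3; next } $2 in v { print $0, v[$2] }' file2 file1 > file3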

Formatting date strings in a file with the Linux bash shell

强颜欢笑 submitted on 2020-05-09 09:45:06
Question: When I cat the file, an example of the output format is:

    ok: servername Mon May 23 00:00:00 EDT 2018
    ok: servername Thu Jul 16 00:00:00 EDT 2019

I would like the format to be something like:

    ok: servername 05/23/2018
    ok: servername 07/16/2019

I need to use the Linux bash shell to do it. If anyone could help me I would be very grateful.

Answer 1: When performance matters, put this in script.awk:

    BEGIN{ m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04"; m["May"]="05"; m["Jun"]="06"; m["Jul"]="07"
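The snippet above is cut off mid-script. A sketch of a complete script.awk in the same spirit: the remaining month entries follow obviously from context, but the print rule is an assumption about what the original answer did.

    BEGIN {
        m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04";
        m["May"]="05"; m["Jun"]="06"; m["Jul"]="07"; m["Aug"]="08";
        m["Sep"]="09"; m["Oct"]="10"; m["Nov"]="11"; m["Dec"]="12";
    }
    {
        # input:  ok: servername Mon May 23 00:00:00 EDT 2018
        # fields: $1  $2         $3  $4  $5 $6       $7  $8
        printf "%s %s %s/%02d/%s\n", $1, $2, m[$4], $5, $8
    }

Run it as awk -f script.awk file > newfile.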
