awk

Keep FASTA records which have 2 matches of OX values

烂漫一生 submitted on 2020-05-16 22:05:02
Question: I have a file that looks as follows:

    >sp|rin-1 ghsfdhjkuesl OX=10116 GN=Cdh1 PE=1 SV=1|sp|P10287|ghsfdjdeosd gdhkhs OX=10090 GN=Cdh3 PE=1 SV=2
    WRDTANWLEINPETGVISTRAEMDREDSEHVKNSTYTALIIATDDGSPIATGTGTLLLVLSDVNDNAPIPEPRNMQFCQRNPKPHVITILDPDLPP
    >sp|erin-1 ghsfdshkd OX=10116 GN=Cdh1 PE=1 SV=1|sp|P22223|CADH3_HUMAN Cadherin-3 OX=9606 GN=CDH3 PE=1 SV=2
    ESYPTYTLVVQAADLQGEGLSTTAKAVITVKDINDNAPIFNPSTYLQCAASEPCRAVFREAEVTLEAGGAEQEPGQALGKVFMGCPGQEPALFSTD
    >sp|n-1 ghsfd OX=10116 GN=Cdh1 PE=1 SV=1|tr|F1LMI3
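A minimal awk sketch of one way to do this, under the reading in the title (keep records whose header has exactly two OX= matches); input.fasta and filtered.fasta are placeholder names. gsub() returns the number of substitutions it made, so replacing OX= with itself counts the occurrences without changing the header, and sequence lines inherit the keep flag set on their header:

    # a sketch, assuming each record is one ">" header line followed by sequence lines
    awk '/^>/ { keep = (gsub(/OX=/, "OX=") == 2) } keep' input.fasta > filtered.fasta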

Split a CSV file thousands of times based on groupby

风格不统一 submitted on 2020-05-16 16:12:41
Question: (An adaptation of David Erickson's question here.) Given a CSV file with columns A, B, and C and some values:

    echo 'a,b,c' > file.csv
    head -c 10000000 /dev/urandom | od -d | awk 'BEGIN{OFS = ","}{print $2, $3, $4}' | head -n 10000 >> file.csv

We would like to sort by columns a and b:

    sort -t ',' -k1,1n -k2,2n file.csv > file_.csv
    head -n 3 file_.csv
    >a,b,c
    3,50240,18792
    7,54871,39438

And then for every unique pair (a, b) create a new CSV titled '{a}_Invoice_{b}.csv'. The main challenge seems
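Because file_.csv is already sorted on the first two columns, each (a, b) group is contiguous, so a single awk pass can stream rows into one output file at a time. A minimal sketch (the filename pattern comes from the question; closing the previous file is needed to avoid hitting the open-file-descriptor limit when there are thousands of groups):

    # a sketch, assuming file_.csv is sorted on columns 1 and 2 and has a header row
    awk -F',' 'NR > 1 {
        out = $1 "_Invoice_" $2 ".csv"
        if (out != prev) { if (prev != "") close(prev); prev = out }
        print > out
    }' file_.csv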

Replace the value of the third column if the first two columns are the same in two variables

≯℡__Kan透↙ submitted on 2020-05-15 18:38:25
Question: I need to replace the value of the third column if the first two columns are the same in two variables. I tried to store the first and second columns of the first variable using NR==FNR. Then, if the first and second columns are the same, replace column three of variable "b" with the third column of variable "s". However, doing $3=$3 does not make any sense.

    awk 'NR==FNR{a[$1FS$2]=$1FS$2;next} $1FS$2 in a {$3=$3}1' <(echo "$s") <(echo "$b")
    NODE AREA-29 1 UP ENABLED PINGABLE ASIA ACTIVE
    NODE AREA-21
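The fix is to store the third column of "$s" in the array (not the key itself), then assign it into $3 when the key matches. A minimal sketch under that reading of the question, with "$s" and "$b" being the shell variables shown above:

    # key = first two fields; value = third field of "$s";
    # the trailing 1 prints every line of "$b", modified or not
    awk 'NR==FNR { a[$1 FS $2] = $3; next }
         ($1 FS $2) in a { $3 = a[$1 FS $2] } 1' <(echo "$s") <(echo "$b")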

Negative floating point numbers are not sorted correctly using awk or sort

南笙酒味 submitted on 2020-05-11 12:54:26
Question: For some reason, comparisons of negative floating point numbers with awk and sort seem to be broken on my machine. It seems that -0.1 < -0.2. When I try to sort

    0.2
    -0.1
    -0.2
    0.1
    0

using sort -n test.dat, I get

    -0.1
    -0.2
    0
    0.1
    0.2

instead of

    -0.2
    -0.1
    0
    0.1
    0.2

What is wrong with me?

Answer 1: You are French! In French, the decimal point is a comma (,) and not a dot (.). You need to either replace the dots with commas or change your locale. Try LC_NUMERIC=en_US.UTF-8 sort -n test.dat
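A quick way to reproduce and fix the effect, sketched below; it assumes the en_US.UTF-8 and fr_FR.UTF-8 locales are installed, whereas LC_ALL=C is available everywhere:

    printf '0.2\n-0.1\n-0.2\n0.1\n0\n' > test.dat
    LC_NUMERIC=fr_FR.UTF-8 sort -n test.dat   # '.' is not a decimal point here, so -0.1 and -0.2 both compare as "-0"
    LC_ALL=C sort -n test.dat                 # -0.2 -0.1 0 0.1 0.2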

Compare two text files and print the difference against a key in a bash shell script

微笑、不失礼 submitted on 2020-05-09 17:28:30
Question: Shell script, bash: I have 2 large files of around 1.2 GB of data, with keys and values. I need to compare both files based on the key and store the difference in the values in a third file. File 2 will always be a subset of File 1; I just need to find the values (against each key) which are not present in File 2, i.e. the ones unique to File 1.

File 1:

    test1 marco;polo;angus
    test2 mike;zen;liza
    test3 tom;harry;alan
    test4 bob;june;janet
    1332239_44557576_CONTI Lased & Micro kjd $353.50_30062020_lsdf3_no-rule
    343323H
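A minimal awk sketch of one way to do it, assuming (as in the testN lines) the key is the first whitespace-separated field and the value is a ;-separated list in the second field; the names file1, file2, file3 follow the question. Keys containing spaces, like the CONTI line, would need a different field split:

    awk '
    NR == FNR {                       # first file read (file2): record its values per key
        n = split($2, v, ";")
        for (i = 1; i <= n; i++) seen[$1, v[i]] = 1
        next
    }
    {                                 # second file (file1): keep the values file2 lacks
        n = split($2, v, ";")
        out = ""
        for (i = 1; i <= n; i++)
            if (!(($1, v[i]) in seen))
                out = out (out == "" ? "" : ";") v[i]
        if (out != "") print $1, out
    }' file2 file1 > file3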

Merge two files based on common column values

梦想的初衷 submitted on 2020-05-09 11:38:31
Question: I have file1 like:

    1 A aa
    2 A bb
    3 A cc
    4 A dd
    5 B xx
    6 C yy
    7 C zz

And a file2:

    1 A 11
    2 B 22
    3 C 33

And I would like to merge file1 and file2 into a file3 based on the 2nd column, such that:

    1 A aa 11
    2 A bb 11
    3 A cc 11
    4 A dd 11
    5 B xx 22
    6 C yy 33
    7 C zz 33

Which way is the simplest? Thank you.

Answer 1: Using pandas will save you a lot of time if you use Python. So if your DataFrames are df1:

       1   2
    0
    1  A  aa
    2  A  bb
    3  A  cc
    4  A  dd
    5  B  xx
    6  C  yy
    7  C  zz

and df2:

       1   2
    0
    1  A  11
    2  B  22
    3  C  33

then
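Since this page is about awk, here is an alternative one-line sketch of the same merge (the pandas answer above is truncated): remember file2's third field keyed by its second field, then append it to each matching line of file1.

    # NR==FNR is true only while reading the first file listed (file2)
    awk 'NR == FNR { v[$2] = $3; next } $2 in v { print $0, v[$2] }' file2 file1 > file3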

Formatting date strings in a file with the Linux bash shell

强颜欢笑 submitted on 2020-05-09 09:45:06
Question: When I cat the file, an example of the output format is:

    ok: servername Mon May 23 00:00:00 EDT 2018
    ok: servername Thu Jul 16 00:00:00 EDT 2019

I would like the format to be something like:

    ok: servername 05/23/2018
    ok: servername 07/16/2019

I need to use the Linux bash shell to do it. If anyone could help me I would be very grateful.

Answer 1: When performance matters, put this in script.awk:

    BEGIN{ m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04"; m["May"]="05"; m["Jun"]="06"; m["Jul"]="07"
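The snippet above is cut off mid-script. A sketch of a complete script.awk in the same spirit: the remaining month entries follow obviously from context, but the print rule is an assumption about what the original answer did.

    BEGIN {
        m["Jan"]="01"; m["Feb"]="02"; m["Mar"]="03"; m["Apr"]="04";
        m["May"]="05"; m["Jun"]="06"; m["Jul"]="07"; m["Aug"]="08";
        m["Sep"]="09"; m["Oct"]="10"; m["Nov"]="11"; m["Dec"]="12";
    }
    {
        # input:  ok: servername Mon May 23 00:00:00 EDT 2018
        # fields: $1  $2         $3  $4  $5 $6       $7  $8
        printf "%s %s %s/%02d/%s\n", $1, $2, m[$4], $5, $8
    }

Run it as awk -f script.awk file > newfile.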
