awk

How to pick multiple fasta sequences from a genes list

Submitted by ぐ巨炮叔叔 on 2020-05-09 07:55:32
Question: I have two files. The gene list file looks like this:

LOC_Os06g12230.1
Pavir.Ab03005
Pavir.J14065
ChrUn.fgenesh
Sevir.1G325700
LOC_Os02g51280.1
Bradi3g59320
Brast04G017400

The FASTA sequence file looks like this:

>LOC_Os03g57190.1 pacid=33130570 polypeptide=LOC_Os03g57190.1 locus=LOC_Os03g57190 ID=LOC_Os03g57190.1.MSUv7.0 annot-version=v7.0
ATGGAGGCGGCGGTGGGGGACGGGGAAGGCGGTGGCGGCGGCGGCGGGCGGGGGAAGCGTGGGCGGGGAGGAGGAGGAGG
GGAGATGGTGGAGGCGGTGTGGGGGCAGACGGGGAGTACGGCGTCGCGGATCTACAGGGTGAGGGCGACGGGGGGGAAGG …
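A minimal awk sketch of one common approach: load the wanted IDs from the list file, then toggle printing on each ">" header. The file names and the shortened records below are invented for illustration; it assumes each ID matches the first whitespace-delimited token of the header.

```shell
cat > genes.txt <<'EOF'
LOC_Os03g57190.1
EOF

cat > seqs.fasta <<'EOF'
>LOC_Os03g57190.1 pacid=33130570 locus=LOC_Os03g57190
ATGGAGGCG
>LOC_Os99g00000.1 pacid=1
TTTT
EOF

awk 'NR==FNR { want[$1]; next }                  # first file: record wanted IDs
     /^>/    { keep = (substr($1,2) in want) }   # header: decide whether to print
     keep' genes.txt seqs.fasta
```

Because `keep` stays set until the next header, multi-line sequences are printed in full.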

jq, split a huge json of array and save into file named with a value

Submitted by ╄→尐↘猪︶ㄣ on 2020-04-28 20:40:51
Question: I have a JSON file containing an array of objects; every object contains a unique value in "id":"value". Following another answer, I can split the whole document into multiple files using jq and awk:

jq -c ".[]" big.json | gawk '{print > "doc00" NR ".json";}'

This names the output files sequentially. How can I name the files using the id value?

Answer 1: For each element in the array, print the id and the element itself on two separate lines, so you can grab the id from the odd-numbered …
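A sketch of the odd/even pairing the answer describes. With `jq -cr '.[] | .id, .' big.json`, jq emits each id (raw, via -r) and each object (compact, via -c) on alternating lines; the `printf` below stands in for that stream so the awk half is shown on its own, with invented sample data.

```shell
cat > big.json <<'EOF'
[{"id":"a1","v":1},{"id":"b2","v":2}]
EOF

# Equivalent stream to: jq -cr '.[] | .id, .' big.json
printf '%s\n' 'a1' '{"id":"a1","v":1}' 'b2' '{"id":"b2","v":2}' |
  awk 'NR%2 { fname = $0 ".json"; next }   # odd line: remember the id as filename
       { print > fname; close(fname) }'    # even line: write the object there
```

`close()` matters when there are many ids, since awk otherwise keeps every output file open at once.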

How to process and save data in chunks using awk?

Submitted by 感情迁移 on 2020-04-21 04:49:24
Question: Let's say I'm opening a large (several GB) file that I cannot read in all at once. If it's a CSV file, we would use:

for chunk in pd.read_csv('path/filename', chunksize=10**7):
    # save chunk to disk

Or we could do something similar with pandas:

import pandas as pd
with open(fn) as file:
    for line in file:
        # save line to disk, e.g. df = pd.concat([df, line_data]), then save the df

How does one "chunk" data with an awk script? Awk will parse/process text into a format you desire, but I …
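One way to sketch chunking in awk: start a new output file every N records, closing the previous handle as you go. The chunk size of 3 and the file names are illustrative only.

```shell
seq 1 7 > input.txt

awk -v size=3 '
  (NR-1) % size == 0 {                       # starting a new chunk
      if (out) close(out)                    # release the previous file handle
      out = sprintf("chunk_%03d.txt", ++n)
  }
  { print > out }
' input.txt
```

Since awk reads the input line by line, memory use stays flat regardless of file size.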

How to merge two files based on data in multiple columns?

Submitted by 大兔子大兔子 on 2020-04-16 03:47:06
Question: I have two separate files, each containing a different number of columns, which I want to merge based on the data in multiple columns.

file1:
VMNF01000015.1 1769465 1769675 . . - Focub_II5_mimp_1
VMNF01000014.1 3225875 3226081 . . + Focub_II5_mimp_1
VMNF01000014.1 3226046 3226081 . . - Focub_II5_mimp_1
VMNF01000014.1 3585246 3585281 . . - Focub_II5_mimp_1
VMNF01000014.1 3692468 3692503 . . - Focub_II5_mimp_1
VMNF01000014.1 3715380 3715415 . . + Focub_II5_mimp_1
VMNF01000014.1 2872478 2872511 . . - …
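A hedged sketch of a multi-column join in awk: build a composite key from the shared columns while reading one file, then look it up while reading the other. Since file2 is not shown above, its layout here (same three key columns plus an annotation) and the value "geneA" are pure assumptions.

```shell
cat > file1 <<'EOF'
VMNF01000015.1 1769465 1769675 . . - Focub_II5_mimp_1
VMNF01000014.1 3225875 3226081 . . + Focub_II5_mimp_1
EOF

cat > file2 <<'EOF'
VMNF01000015.1 1769465 1769675 geneA
EOF

# Key on contig + start + end (columns 1-3 of both files).
awk 'NR==FNR { extra[$1 FS $2 FS $3] = $4; next }
     ($1 FS $2 FS $3) in extra { print $0, extra[$1 FS $2 FS $3] }
' file2 file1
```

Which columns form the key depends on the real data; swap the `$1 FS $2 FS $3` expressions accordingly.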

Assistance with awk/bash to capture memory difference

Submitted by 落花浮王杯 on 2020-04-15 03:48:28
Question: I am trying to extract the following output from the following file:

xr_lab# show clock
Thu Sep 19 14:38:02.812 WIB
14:38:02.893 WIB Thu Sep 19 2019
xr_lab#
xr_lab#
xr_lab#show memory compare report
Thu Sep 19 14:41:08.084 WIB
PID   NAME            MEM BEFORE  MEM AFTER  DIFFERENCE  MALLOCS-NEW
-------------------------------------------------------------------------------
6777  ospf            24292985    24293753   768         272634
7582  mibd_interface  8670334     8484152    -186182     267657
xr_lab#show clock
Thu Sep 19 14:42:42.425 WIB …
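One possible awk sketch for pulling the table rows out of that log: treat the dashed rule as the start of data and the next device prompt as the end. The field positions (PID, NAME, DIFFERENCE as $1, $2, $5) are read off the sample above; a trimmed copy of the log stands in for the real file.

```shell
cat > log.txt <<'EOF'
xr_lab#show memory compare report
Thu Sep 19 14:41:08.084 WIB
PID NAME MEM BEFORE MEM AFTER DIFFERENCE MALLOCS-NEW
-------------------------------------------------------------------------------
6777 ospf 24292985 24293753 768 272634
7582 mibd_interface 8670334 8484152 -186182 267657
xr_lab#show clock
EOF

awk '/^-+$/ { data=1; next }          # rows start after the dashed rule
     /^xr_lab#/ { data=0 }            # the next prompt ends the table
     data { print $1, $2, $5 }        # PID, NAME, DIFFERENCE
' log.txt
```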

Awk command to Powershell equivalent

Submitted by ╄→гoц情女王★ on 2020-04-11 12:27:50
Question: I hope you can help me. Essentially, I'm looking for the PowerShell equivalent of the awk command:

awk '/"Box11"/ { print $0 }' test.txt | awk '{ SUM += $4 } END { print SUM }'

Answer 1: There are multiple ways of doing it, but this would do the trick:

Get-Content c:\temp\test.txt |
    Where-Object { $_ -match '"Box11"' } |
    ForEach-Object { ($_ -split "\s+")[3] } |
    Measure-Object -Sum |
    Select-Object -ExpandProperty Sum

Get a string array of the file. For each line that contains the string "Box11", we split the line on …
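As an aside, the original two-stage awk pipeline can be collapsed into a single awk program, since a pattern and a running sum fit in one pass. The sample data below is invented to match the command's assumptions ("Box11" somewhere on the line, a number in field 4).

```shell
cat > test.txt <<'EOF'
x "Box11" y 10
x "Box12" y 5
x "Box11" y 7
EOF

# Filter and sum in one pass instead of two piped awk invocations.
awk '/"Box11"/ { sum += $4 } END { print sum }' test.txt
```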

Why uniq -c output with space instead of \t?

Submitted by 主宰稳场 on 2020-04-08 10:09:11
Question: I use uniq -c on a text file. Its output looks like this:

123(space)first word(tab)other things
2(space)second word(tab)other things
...

I need to extract the total count (like 123 and 2 above), but I can't figure out how, because if I split a line by spaces, it comes out like ['123', 'first', 'word(tab)other', 'things']. Why doesn't uniq output a tab after the count? And how can I extract the count in the shell? (I finally extracted it with Python.)

Update: Sorry, I didn't describe my question …
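A small sketch of the usual shell answer: uniq -c left-pads the count and separates it with a space rather than a tab, but awk's default field splitting treats any run of whitespace as one separator, so the count is simply $1. The input lines here are invented.

```shell
# Counts per distinct line, then just the counts.
printf 'a\na\na\nb\n' | sort | uniq -c
printf 'a\na\na\nb\n' | sort | uniq -c | awk '{ print $1 }'
```

This sidesteps the padding problem entirely; no tab is needed.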