bioinformatics | 易学教程

How to compare and merge multiple files?


Question: reference file

    chr1 288598 288656
    chr1 779518 779576
    chr2 2569592 2569660
    chr3 5018399 5018464
    chr4 5182842 5182882

file1

    chr1 288598 288656 12
    chr1 779518 779576 14
    chr2 2569592 2569660 26
    chr3 5018399 5018464 27
    chr4 5182842 5182882 37

file2

    chr1 288598 288656 35
    chr2 2569592 2569660 348
    chr3 5018399 5018464 4326
    chr4 5182842 5182882 68

I have six similar files, excluding the reference file. In each of them the first three fields match the reference file. Therefore, I would like to export only the 4th
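A minimal Python sketch of one way to do such a merge. The excerpt is cut off after "export only the 4th", so this assumes the goal is to append each file's fourth field to the matching reference row, and that a row missing from a file (file2 above has no `chr1 779518 779576` line) gets a placeholder. The function name `merge_fourth_column` and the `NA` placeholder are illustrative choices, not part of the original question.

```python
def merge_fourth_column(reference_lines, files, missing="NA"):
    """Append each file's 4th field to the matching reference row.

    reference_lines: iterable of 'chrom start end' strings.
    files: list of iterables of 'chrom start end value' strings.
    """
    # Build one lookup table per file, keyed by the first three fields.
    tables = []
    for lines in files:
        table = {}
        for line in lines:
            fields = line.split()
            if len(fields) >= 4:
                table[tuple(fields[:3])] = fields[3]
        tables.append(table)

    merged = []
    for line in reference_lines:
        key = tuple(line.split()[:3])
        if len(key) < 3:
            continue  # skip blank or malformed reference lines
        # Assumption: a row absent from a file is reported as `missing`.
        merged.append(list(key) + [t.get(key, missing) for t in tables])
    return merged


reference = ["chr1 288598 288656", "chr1 779518 779576", "chr2 2569592 2569660"]
file1 = ["chr1 288598 288656 12", "chr1 779518 779576 14", "chr2 2569592 2569660 26"]
file2 = ["chr1 288598 288656 35", "chr2 2569592 2569660 348"]

for row in merge_fourth_column(reference, [file1, file2]):
    print("\t".join(row))
```

Keying on the first three fields rather than on line position keeps the rows aligned even when a file skips an interval.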

How do I read strings into a hash in Perl [closed]


Question: Closed. This question is off-topic. It is not currently accepting answers. Closed 3 years ago.

I have a file with a series of random A's, G's, C's and T's in it that looks like this:

    >Mary
    ACGTACGTACGTAC
    >Jane
    CCCGGCCCCTA
    >Arthur
    AAAAAAAAAAT

I took those letters and concatenated them to end up with ACGTACGTACGTACCCCGGCCCCTAAAAAAAAAAT. I now have a series of positions within that concatenated sequence
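The excerpt ends before saying what should happen with those positions. Assuming the aim is to map a position in the concatenated sequence back to the record it came from, a rough Python sketch (the question itself asks about Perl hashes) could look like this. The helper names `parse_fasta`, `build_index`, and `locate`, and the choice of 0-based positions, are assumptions made for illustration.

```python
from bisect import bisect_right


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-like text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def build_index(records):
    """Record where each sequence starts inside the concatenation."""
    starts, names, offset = [], [], 0
    for name, seq in records:
        starts.append(offset)
        names.append(name)
        offset += len(seq)
    return starts, names, offset


def locate(position, starts, names, total):
    """Map a 0-based concatenated position to (name, offset in record)."""
    if not 0 <= position < total:
        raise IndexError("position outside the concatenated sequence")
    i = bisect_right(starts, position) - 1
    return names[i], position - starts[i]


fasta = ">Mary\nACGTACGTACGTAC\n>Jane\nCCCGGCCCCTA\n>Arthur\nAAAAAAAAAAT\n"
starts, names, total = build_index(parse_fasta(fasta))
print(locate(14, starts, names, total))
```

Storing only the start offsets keeps the lookup logarithmic in the number of records, which matters little for three names but helps for large files.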

How can I merge CSV rows that have the same value in the first cell?


Question: This is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharing As you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no data would get overwritten in the other columns. Is there any way I can combine the rows with duplicate values in the first column? For example, turn "1,A,A,," and "1,,,T,T" into "1,A,A,T,T".

Answer 1: Plain Python:

    import csv
    reader = csv.reader(open('combined.csv'))
    result = {}
    for row in reader:
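The answer above is truncated after the loop header, so the following is not a reconstruction of it, only a small self-contained sketch of the same idea: group rows by their first cell and let non-empty cells fill the gaps. It relies on the question's statement that duplicates never conflict; the name `merge_rows` and the in-memory sample are illustrative.

```python
import csv
import io


def merge_rows(rows):
    """Merge rows that share the same first cell.

    Non-empty cells fill in empty ones. Assumption (from the question):
    duplicate rows never hold different values in the same column; if
    they did, the later row would win here.
    """
    merged = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[0]
        current = merged.setdefault(key, [key])
        # Pad the stored row so it is at least as wide as the new one.
        if len(current) < len(row):
            current.extend([""] * (len(row) - len(current)))
        for i, cell in enumerate(row[1:], start=1):
            if cell:
                current[i] = cell
    return list(merged.values())


sample = "1,A,A,,\n1,,,T,T\n2,X,,,\n"
for row in merge_rows(csv.reader(io.StringIO(sample))):
    print(",".join(row))
```

To run it against a real file, pass `csv.reader(open('combined.csv', newline=''))` instead of the in-memory sample.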

Extract and paste together multiple columns of a data frame like object using a vector of column names


Question: I have an object (variable rld) which looks a bit like a "data.frame" (see further down the post for details) in that it has columns that can be accessed using $ or [[]]. I have a vector groups containing the names of some of its columns (3 in the example below). I generate strings based on combinations of elements in the columns as follows: paste(rld[[groups[1]]], rld[[groups[2]]], rld[[groups[3]]], sep="-") I would like to generalize this so that I don't need to know how many elements are in

Converting comma separated file to nested objects json in jq


Question: I have a CSV file which I would like to parse to obtain nested JSON using jq. I started using jq recently and I really like the tool. I understand the basic functionality, but parsing a CSV file seems a little difficult, especially printing nested objects.

Sample Input

    Gene, Exon,Total,Exon Bases, Total Bases, Fraction of Exon bases
    PIK3CA,PIK3CA_Exon10;chr1;1000;1500,PIK3CA_Exon13;chr1;1000;1500,PIK3CA_Exon14;chr1;1000;1500,1927879,12993042,0.15
    NRAS,NRAS_Exon4;chr1;1000;1500,NRAS_Amp

Curiously behaving IF block in Perl run on Windows


Question: Background: I have a Perl script that I wrote to go through two files. The basic point of the script is to identify overlaps between one list of coordinates, defining the beginnings and ends of randomly selected chromosomal segments, and a second list of coordinates, defining the beginnings and ends of actual gene transcripts. The first input file contains three columns. The first is for the chromosome number, and the second and third are the proximal and distal coordinates, in base pairs,
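The overlap test the background describes can be sketched in a few lines. This is Python rather than the asker's Perl, and it does not address the IF-block problem itself, which the excerpt cuts off before describing. It assumes both lists hold (chromosome, start, end) tuples with inclusive coordinates; the second file's layout is not shown in the excerpt, so that shape and the name `find_overlaps` are assumptions.

```python
def find_overlaps(segments, transcripts):
    """Return (segment, transcript) pairs that share at least one base.

    Both inputs are lists of (chromosome, start, end) tuples with
    inclusive coordinates (an assumption; the excerpt does not say).
    """
    # Group transcripts by chromosome so each segment is only compared
    # against transcripts on its own chromosome.
    by_chrom = {}
    for transcript in transcripts:
        by_chrom.setdefault(transcript[0], []).append(transcript)

    hits = []
    for segment in segments:
        _, seg_start, seg_end = segment
        for transcript in by_chrom.get(segment[0], []):
            _, tx_start, tx_end = transcript
            # Two closed intervals overlap when each starts no later
            # than the other ends.
            if seg_start <= tx_end and tx_start <= seg_end:
                hits.append((segment, transcript))
    return hits


segments = [("1", 100, 200), ("2", 50, 60)]
transcripts = [("1", 150, 300), ("1", 201, 250), ("2", 10, 49)]
print(find_overlaps(segments, transcripts))
```

Chromosome labels are compared as strings here, so "1" and "chr1" would not match; normalizing labels first is left to the caller.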

Using regex to search until desired pattern


Question: I am using the following regex:

    orfre = '^(?:...)*?((ATG)(...){%d,}?(?=(TAG|TAA|TGA)))' % (aa)

I basically want to find all sequences that start with ATG followed by triplets (e.g. TTA, TTC, GTC, etc.) until it finds a stop codon in frame. However, as my regex is written, it won't actually stop at a stop codon if aa is large. Instead, it will keep searching until it finds one such that the condition of aa is met. I would rather have it search the entire string until a stop codon is found. If
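The behaviour described follows from `(...)` being allowed to match a stop codon, so the minimum count `{%d,}` can push the match past one. One possible workaround, which is a different technique from the asker's pattern, is to forbid stop codons inside the repeated group with a negative lookahead and apply the length filter afterwards. The sketch below assumes uppercase A/C/G/T input and, since the question is truncated, assumes that ORFs shorter than the minimum should simply be dropped. The name `find_orfs` and the `min_codons` parameter are illustrative.

```python
import re

STOP = "TAA|TAG|TGA"

# The outer lookahead makes every match zero-width, so finditer tries
# each position and overlapping ORFs are all reported. Inside it, every
# codon after ATG is checked with (?!...) so it can never be a stop.
ORF_RE = re.compile(r"(?=(ATG(?:(?!%s)[ACGT]{3})*)(?:%s))" % (STOP, STOP))


def find_orfs(seq, min_codons=0):
    """Return (start, orf) pairs, each ending at the first in-frame stop.

    min_codons counts codons after ATG, mirroring the {%d,} quantifier
    in the question. ORFs that are too short are discarded rather than
    extended past the stop codon.
    """
    orfs = []
    for match in ORF_RE.finditer(seq):
        orf = match.group(1)
        if (len(orf) - 3) // 3 >= min_codons:
            orfs.append((match.start(), orf))
    return orfs


print(find_orfs("CCATGTTTGGGTAAATGTGA", min_codons=1))
```

Separating "where does the ORF end" from "is it long enough" keeps the pattern simple and avoids the quantifier skipping over a stop codon.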

Pressure to Images and Images to pressure conversion


Question: EDIT: I am working on the Pressure Map Dataset, where the pressure sensor data is "in-bed posture pressure data". Dataset: https://physionet.org/content/pmd/1.0.0/ Using the code below I was able to convert the pressure data to images.

    import numpy as np
    import matplotlib.pyplot as plt
    # f is assumed to be a file handle opened earlier on one of the dataset files
    line = f.readlines()[3]
    lst1 = line.strip().split()
    lst = [int(x) for x in lst1]
    # Convert into a 64*32 array
    arr = np.asarray(lst).reshape(64, 32)
    plt.imshow(arr, cmap='hot', interpolation='nearest')

The images formed are as below: Now, my main goal is to

Perl: Search a pattern across array elements


Question: I am a Perl newbie, stuck with another bioinformatics problem that requires some help and input. The problem in brief: I have a file which has over 40,000 unique DNA sequences. By unique, I mean unique sequence ID. I am attaching a portion of it at the end of my post to show you what it looks like. I need to divide each of the 40,000 sequences into 3 parts. So if a particular sequence is 999 characters long, each of the 3 parts would have 333 characters. I need to look for the following
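The splitting step on its own can be sketched as follows, in Python rather than Perl. The excerpt does not say what should happen when the length is not a multiple of 3, so this sketch assumes the leftover characters go into the last part; the patterns to search for are cut off in the excerpt and are not covered. The name `split_into_thirds` is illustrative.

```python
def split_into_thirds(seq):
    """Split a sequence into three consecutive parts.

    The first two parts have len(seq) // 3 characters each. Assumption:
    any remainder (1 or 2 characters) is kept in the last part.
    """
    size = len(seq) // 3
    return seq[:size], seq[size:2 * size], seq[2 * size:]


first, middle, last = split_into_thirds("A" * 999)
print(len(first), len(middle), len(last))
```

Each part can then be searched separately, for example with `pattern in part` or `re.search`, once the patterns of interest are known.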

How to returns correctly matched array?


Question: I have a function that takes a string of DNA. How do I return the correctly matched DNA array? The code that I have tried:

    function checkDNA(dna) {
      var dnaarr = [];
      for(var i = 0; i < dna.length; i++) {
        var str = [];
        str.push(dna[i]); //pushing current str[i]
        if(dna[i].indexOf('') === 0) { var a = str.push('sd'); }
        if(dna[i].indexOf('GGC') === 0) { var b = str.push("GC", "GC", "CG"); }
        if(dna[i].indexOf('gat') === 0) { var c = str.push("GC", "AT", "TA"); }
        if(dna[i].indexOf('PGYYYHVB') === 0) {