bioinformatics | 易学教程

trace patterns such that each node is visited only once(eulerian path) using opencv

Question: I have been trying to solve this problem for a full year without success, so I am seeking help and a concrete solution from the Stack Overflow experts. My problem statement: I have been working with some design patterns that I want to trace programmatically if an Eulerian path exists (as shown in the GIFs below). Below are the patterns and the way I want to draw them (GIFs). What I want to achieve: given the design pattern images as input, I want to trace the design pattern
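The existence test mentioned in the question can be sketched independently of the image processing. The snippet below assumes the pattern has already been reduced to an undirected edge list (that extraction step is not shown, and the name `has_eulerian_path` is hypothetical). It relies on the standard result that an undirected graph has an Eulerian path, one that uses every edge exactly once, when all vertices with edges are connected and either zero or two of them have odd degree.

```python
from collections import defaultdict


def has_eulerian_path(edges):
    """Return True if the undirected multigraph given as (u, v) pairs
    has a path that uses every edge exactly once."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    if not adjacency:
        return True  # no edges: the empty path qualifies

    # Depth-first search to check that all vertices with edges are connected.
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != len(adjacency):
        return False

    # Count vertices of odd degree; only 0 or 2 are allowed.
    odd = sum(1 for node in adjacency if len(adjacency[node]) % 2 == 1)
    return odd in (0, 2)
```

If the check passes, the drawing order itself could then be produced with a path-construction method such as Hierholzer's algorithm, which is not shown here.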

How to plot positions along a chromosome graphic

Question: I would like to generate a plot depicting 14 linear chromosomes for the organism I work on, to scale, with coloured bars at specified locations along each chromosome. Ideally I'd like to use R, as this is the only programming language I have experience with. I have explored various ways of doing this, e.g. with GenomeGraphs, but I have found that they are all more complicated than what I want, display a lot more data than I have (e.g. cytogenetic bands), and are often specific for human

Remove rows from dataframe that contains only 0 or just a single 0

I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a zero in any single column. Furthermore, sometimes I only want to remove rows that are zero in all columns. Also, and this is where it gets fun: not all columns contain numbers, and the number of columns can vary. I have tried to paste some of my data here with the results I want to obtain.
unfiltered:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0
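The two filtering modes described above can be sketched as follows. The question asks for R; this is a Python illustration of the logic only, using the three complete rows from the excerpt. The function name `drop_zero_rows`, the `mode` switch, and the choice to ignore the `ID` column are all assumptions made for the example.

```python
from numbers import Number


def drop_zero_rows(rows, mode="any", ignore=("ID",)):
    """Filter a list of dict rows by zeros in their numeric cells.

    mode="any": drop a row if at least one numeric cell is 0.
    mode="all": drop a row only if every numeric cell is 0.
    Non-numeric cells and the columns named in `ignore` are skipped.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    check = any if mode == "any" else all
    kept = []
    for row in rows:
        values = [
            value
            for key, value in row.items()
            if key not in ignore
            and isinstance(value, Number)
            and not isinstance(value, bool)
        ]
        # Rows without numeric cells are always kept.
        if values and check(value == 0 for value in values):
            continue
        kept.append(row)
    return kept


rows = [
    {"ID": 1, "GeneName": "MIR22HG", "DU145small": 33221.5,
     "DU145total": 1224.55, "PC3small": 2156.43, "PC3total": 573.315},
    {"ID": 2, "GeneName": "MIRLET7E", "DU145small": 87566.1,
     "DU145total": 7737.99, "PC3small": 25039.3, "PC3total": 16415.6},
    {"ID": 3, "GeneName": "MIR612", "DU145small": 0,
     "DU145total": 0, "PC3small": 530.068, "PC3total": 0},
]

print([row["GeneName"] for row in drop_zero_rows(rows, mode="any")])
print([row["GeneName"] for row in drop_zero_rows(rows, mode="all")])
```

With `mode="any"` the MIR612 row is dropped because it contains zeros; with `mode="all"` it is kept because PC3small is non-zero.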

Perl: How to join two columns of a text file, in which values of the first column should match in order with the values of the second column

Question: I am a beginner with Perl programming. The problem I am working on right now is how to get the gene length from a text file. The text file contains the gene name (column 10), start site (column 6), and end site (column 7). The length can be derived from the difference between columns 6 and 7. My problem is how to match the gene name (from column 10) with the corresponding length derived from columns 6 and 7. Thank you very much! open (IN, "Alu.txt"); open (OUT, ">Alu
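The pairing the question asks about falls out naturally if the name and the length are computed from the same line. The question uses Perl; the sketch below shows the same idea in Python and assumes whitespace-delimited columns (the excerpt does not state the delimiter). The function name `gene_lengths` is hypothetical.

```python
def gene_lengths(lines):
    """Yield (gene_name, length) pairs, one per data line.

    Columns are 1-based in the question: start = column 6,
    end = column 7, gene name = column 10.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # too short to contain all three columns
        try:
            start = int(fields[5])
            end = int(fields[6])
        except ValueError:
            continue  # e.g. a header line with non-numeric coordinates
        yield fields[9], end - start
```

Because each name is emitted together with the length computed from its own row, the two columns cannot fall out of order.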

Scanf_s warning? Skips User Inputs (topics: Runge-Kutta, Epidemic Simulation)

Question: This is my first post and I have to admit, I am terrible at programming. I am that guy in the class who works his tail off but can never seem to grasp programming as well as the rest of my classmates. So please be nice; I will try to explain my problem below. I have the following code (comments removed), but when I run it I get a warning similar to the one listed below. Also, when I run the program, the first value the user enters is accepted, but then, all of a sudden, it jumps to the end of

Regex Protein Digestion

So, I'm digesting a protein sequence with an enzyme (for your curiosity, Asp-N), which cleaves before the residues coded by B or D in a single-letter coded sequence. My actual analysis uses String#scan for the captures. I'm trying to figure out why the following regular expression doesn't digest it correctly: (\w*?)(?=[BD])|(.*\b) where the second alternative, (.*\b), exists to capture the end of the sequence. For: MTMDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN this should give something like: [MTM, DKPSQY, DKIEAELQ, DICN, DVLELL, DSKG, ... ] but instead it misses each D in the sequence. I've been using
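One way to express "cleave before every B or D" is to match each fragment as an optional leading cleavage residue followed by everything up to the next one. The question uses Ruby's String#scan; the sketch below uses Python's `re.findall`, which plays a similar role. The pattern and the name `digest_asp_n` are illustrative assumptions, not the asker's code.

```python
import re

# A fragment is either a B/D followed by non-B/D residues,
# or a leading run of non-B/D residues at the start of the sequence.
FRAGMENT = re.compile(r"[BD][^BD]*|[^BD]+")


def digest_asp_n(sequence):
    """Split a single-letter protein sequence before every B or D."""
    return FRAGMENT.findall(sequence)


peptides = digest_asp_n("MTMDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN")
print(peptides)
```

Each D stays attached to the fragment it starts, so the first fragments are MTM, DKPSQY, DKIEAELQ, DICN, DVLELL, and DSKG, as in the expected output quoted above.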

Reading in file block by block using specified delimiter in python

Question: I have an input_file.fa file like this (FASTA format): > header1 description data data data >header2 description more data data data I want to read in the file one chunk at a time, so that each chunk contains one header and the corresponding data, e.g. block 1: > header1 description data data data Of course I could just read in the file like this and split: with open("1.fa") as f: for block in f.read().split(">"): pass But I want to avoid reading the whole file into memory, because the
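A generator that reads line by line keeps only one block in memory at a time. This is a minimal sketch, assuming each record starts on a line beginning with ">" (the line breaks in the sample above were lost in the excerpt, so the layout used here is a guess). The name `read_fasta_blocks` is hypothetical.

```python
import io


def read_fasta_blocks(handle):
    """Yield one FASTA record (header line plus its data lines) at a time."""
    block = []
    for line in handle:
        # A new header closes the previous block, if there is one.
        if line.startswith(">") and block:
            yield "".join(block)
            block = []
        block.append(line)
    if block:
        yield "".join(block)


# Demonstration on an in-memory file; a real file handle works the same way.
example = io.StringIO(
    "> header1 description\ndata\ndata\ndata\n"
    ">header2 description\nmore data\ndata\ndata\n"
)
for record in read_fasta_blocks(example):
    print(repr(record))
```

Any text before the first header is emitted as its own block, so callers that need strict FASTA input may want to check that each block starts with ">".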

Filtering a CSV file in python

Question: I have downloaded this CSV file, which creates a spreadsheet of gene information. What is important is that the HLA-* columns contain gene information. If the gene is at too low a resolution, e.g. DQB1*03, then the row should be deleted. If the data is at too high a resolution, e.g. DQB1*03:02:01, then the :01 tag at the end needs to be removed. So, ideally I want the proteins to be in the format DQB1*03:02, so that there are two levels of resolution after DQB1*. How can I tell Python to look for
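The per-value rule described above can be isolated in a small helper before worrying about the CSV itself. In this sketch the function name `normalize_allele` and the convention of returning `None` for values that should cause the row to be dropped are assumptions; the CSV layout is not shown in the excerpt, so reading and writing the file is left out.

```python
def normalize_allele(allele):
    """Trim an allele name such as 'DQB1*03:02:01' to two fields.

    Returns the name with exactly two colon-separated fields after '*',
    or None if fewer than two fields are present (resolution too low).
    """
    gene, star, fields = allele.strip().partition("*")
    if not gene or not star:
        return None  # not in GENE*fields form
    parts = fields.split(":")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None  # e.g. 'DQB1*03'
    return f"{gene}*{parts[0]}:{parts[1]}"


for value in ("DQB1*03", "DQB1*03:02", "DQB1*03:02:01"):
    print(value, "->", normalize_allele(value))
```

A row filter could then apply this helper to every HLA-* column and discard the row whenever any of them returns `None`.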

Remove item from list based on the next item in same list

I just started learning Python, and I have a sorted list of protein sequences (59,000 sequences in total), some of which overlap. I have made a toy list here as an example: ABCDE ABCDEFG ABCDEFGH ABCDEFGHIJKLMNO CEST DBTSFDE DBTSFDEO EOEUDNBNUW EOEUDNBNUWD EAEUDNBNUW FEOEUDNBNUW FG FGH I would like to remove the shorter overlapping sequences and keep only the longest one, so the desired output would look like this: ABCDEFGHIJKLMNO CEST DBTSFDEO EAEUDNBNUW FEOEUDNBNUWD FGH How can I do it? My code looks like this: with open('toy.txt' ,'r') as f: pattern = f.read().splitlines() print pattern for i in range
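One reading of the title is that an item should be dropped when the next item in the sorted list starts with it. The sketch below implements only that prefix rule, which is an assumption: it does not detect overlaps where the shorter sequence sits in the middle or at the end of a longer one, and the function name `drop_prefixes` is hypothetical.

```python
def drop_prefixes(sorted_items):
    """Keep an item only if the next item does not start with it.

    Expects a list that is already sorted, so that every item is
    immediately followed by its extensions.
    """
    kept = []
    following = sorted_items[1:] + [None]  # the last item has no successor
    for current, nxt in zip(sorted_items, following):
        if nxt is not None and nxt.startswith(current):
            continue  # a longer sequence with the same start follows
        kept.append(current)
    return kept


toy = ["ABCDE", "ABCDEFG", "ABCDEFGH", "ABCDEFGHIJKLMNO",
       "CEST", "DBTSFDE", "DBTSFDEO", "FG", "FGH"]
print(drop_prefixes(toy))
```

The toy list here is a subset of the one in the question, chosen because the prefix rule is unambiguous for these entries.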

AWK: extract lines if column in file 1 falls within a range declared in two columns in other file

Currently I'm struggling with an AWK problem that I haven't been able to solve yet. I have one huge file (30 GB) with genomic data that holds a list of positions (declared in col 1 and 2) and a second list that holds a number of ranges (declared in col 3, 4 and 5). I want to extract all lines in the first file where the position falls within a range declared in the second file. As the position is only unique within a certain chromosome (chr), it first has to be tested whether the chr's are identical (i.e. col1 in file 1 matches col3 in file 2). file 1 chromosome position another....hundred....