bioinformatics

Trace patterns such that each node is visited only once (Eulerian path) using OpenCV

荒凉一梦 submitted on 2019-12-04 14:38:36
Question: Here is a problem I have been trying to solve for a full year, without success, so I am asking the Stack Overflow experts for help and a concrete solution. My problem statement: I have been working with some design patterns that I want to trace programmatically if an Eulerian path exists (as shown in the GIFs below). Below are the patterns and the way I want to draw them (GIFs). What I want to achieve: Give the design pattern images as input. I want to trace the design pattern

How to plot positions along a chromosome graphic

血红的双手。 submitted on 2019-12-04 09:28:52
Question: I would like to generate a plot depicting 14 linear chromosomes for the organism I work on, to scale, with coloured bars at specified locations along each chromosome. Ideally I'd like to use R, as this is the only programming language I have experience with. I have explored various ways of doing this, e.g. with GenomeGraphs, but I have found that they are all more complicated than what I want, display a lot more data than I have (e.g. cytogenetic bands), and are often specific for human

Remove rows from a dataframe that contain only 0 or just a single 0

孤街醉人 submitted on 2019-12-04 05:31:34
I am trying to create a function in R that will let me filter my data set based on whether a row contains a zero in any single column. Furthermore, sometimes I only want to remove rows that are zero in all columns. Also, and this is where it gets fun: not all columns contain numbers, and the number of columns can vary. I have tried to paste some of my data here with the results I want to obtain. unfiltered:
ID  GeneName  DU145small  DU145total  PC3small  PC3total
1   MIR22HG   33221.5     1224.55     2156.43   573.315
2   MIRLET7E  87566.1     7737.99     25039.3   16415.6
3   MIR612    0           0           530.068   0
4   MIR218-1  0           0

Perl: How to join two columns of a text file, in which values of the first column should match in order with the values of the second column

*爱你&永不变心* submitted on 2019-12-04 04:58:05
Question: I am a beginner with Perl programming. The problem I am working on right now is how to get the gene length from a text file. The text file contains the gene name (column 10), start site (column 6), and end site (column 7). The length can be derived from the difference between columns 6 and 7. But my problem is how to match the gene name (from column 10) with the corresponding length derived from columns 6 and 7. Thank you very much! open (IN, "Alu.txt"); open (OUT, ">Alu
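The pairing described in the question (gene name from column 10, length as column 7 minus column 6) can be sketched as follows. The question asks about Perl; this is an illustrative Python sketch only. It assumes the file is tab-delimited, and the function name `gene_lengths` is hypothetical.

```python
def gene_lengths(lines):
    """Yield (gene_name, length) pairs from delimited text lines.

    Assumption: columns are tab-separated; column 6 is the start site,
    column 7 the end site, and column 10 the gene name (1-based).
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # skip short or empty lines
        try:
            start, end = int(cols[5]), int(cols[6])
        except ValueError:
            continue  # skip header lines or non-numeric coordinates
        yield cols[9], end - start
```

Because the name and the length are computed from the same line, they stay matched without any separate lookup step. An open file object can be passed directly as `lines`.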

Scanf_s warning? Skips User Inputs (topics: Runge-Kutta, Epidemic Simulation)

感情迁移 submitted on 2019-12-04 04:52:25
Question: This is my first post and I have to admit, I am terrible at programming. I am that guy in the class who works his tail off but can never seem to grasp programming as well as the rest of my classmates. So please be nice; I will try to explain my problem below. I have the following code (comments removed), but when I run it I get a warning similar to the one listed below. Also, when I run the program, the first user-entered value is accepted, but then, all of a sudden, it jumps to the end of

Regex Protein Digestion

牧云@^-^@ submitted on 2019-12-04 03:46:00
So, I'm digesting a protein sequence with an enzyme (for your curiosity, Asp-N) which cleaves before the residues coded by B or D in a single-letter coded sequence. My actual analysis uses String#scan for the captures. I'm trying to figure out why the following regular expression doesn't digest it correctly... (\w*?)(?=[BD])|(.*\b) where the second alternative (.*\b) exists to capture the end of the sequence. For: MTMDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN This should give something like: [MTM, DKPSQY, DKIEAELQ, DICN, DVLELL, DSKG, ... ] but instead misses each D in the sequence. I've been using
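One way to get the fragments listed above is to use a pattern that consumes the cleavage residue as the first character of each fragment, instead of only looking ahead to it. The question uses Ruby's String#scan; the sketch below is an illustrative Python equivalent with `re.findall`, and the function name `digest_asp_n` is hypothetical.

```python
import re

# Either a fragment that starts with B or D and runs up to the next B or D,
# or the leading fragment before the first cleavage site.
FRAGMENT = re.compile(r"[BD][^BD]*|^[^BD]+")

def digest_asp_n(sequence):
    """Split a one-letter protein sequence before every B or D residue."""
    return FRAGMENT.findall(sequence)

fragments = digest_asp_n("MTMDKPSQYDKIEAELQDICNDVLELLDSKGDYFRYLSEVASGDN")
```

Since every character is consumed by exactly one match, joining the fragments reproduces the input sequence, which makes a convenient sanity check.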

Reading in file block by block using specified delimiter in Python

冷暖自知 submitted on 2019-12-04 03:42:44
Question: I have an input_file.fa file like this (FASTA format): > header1 description data data data >header2 description more data data data I want to read in the file one chunk at a time, so that each chunk contains one header and the corresponding data, e.g. block 1: > header1 description data data data Of course I could just read in the file like this and split: with open("1.fa") as f: for block in f.read().split(">"): pass But I want to avoid reading the whole file into memory, because the
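A minimal sketch of reading one block at a time without loading the whole file: iterate over the file line by line and start a new block whenever a line begins with ">". The function name `read_fasta_blocks` is hypothetical, and the sketch assumes every record's header line starts with ">".

```python
import io

def read_fasta_blocks(handle):
    """Yield one header-plus-data block at a time from a FASTA-like file.

    Only the current block is held in memory, never the whole file.
    """
    block = []
    for line in handle:
        # A new header closes the previous block, if there is one.
        if line.startswith(">") and block:
            yield "".join(block)
            block = []
        block.append(line)
    if block:
        yield "".join(block)  # emit the final block

# Demo on an in-memory file; an object from open("1.fa") works the same way.
sample = io.StringIO("> header1 description\ndata\n>header2 description\nmore data\n")
blocks = list(read_fasta_blocks(sample))
```

Unlike `f.read().split(">")`, each yielded block keeps its leading ">", and memory use is bounded by the largest single record.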

Filtering a CSV file in Python

六月ゝ 毕业季﹏ submitted on 2019-12-04 02:17:57
Question: I have downloaded this CSV file, which creates a spreadsheet of gene information. What is important is that the HLA-* columns contain gene information. If the resolution of a gene is too low, e.g. DQB1*03, then the row should be deleted. If the resolution is too high, e.g. DQB1*03:02:01, then the :01 tag at the end needs to be removed. So, ideally I want the proteins to be in the format DQB1*03:02, so that there are two levels of resolution after DQB1*. How can I tell Python to look for
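The two rules in the question (drop rows whose value has fewer than two fields after the `*`, trim values with more than two) can be sketched like this. The function names `normalize_allele` and `filter_rows` are hypothetical, and the sketch assumes each row is a dict keyed by column name, with the relevant column names starting with "HLA-".

```python
def normalize_allele(allele):
    """Return the allele trimmed to two fields, or None if it has fewer."""
    gene, star, fields = allele.partition("*")
    parts = fields.split(":") if star else []
    if len(parts) < 2 or not all(parts[:2]):
        return None  # resolution too low, e.g. DQB1*03
    return f"{gene}*{parts[0]}:{parts[1]}"

def filter_rows(rows):
    """Yield cleaned copies of rows whose HLA-* values all have enough resolution."""
    for row in rows:
        cleaned = dict(row)
        keep = True
        for name, value in row.items():
            if not name.startswith("HLA-"):
                continue  # assumption: only HLA-* columns hold alleles
            trimmed = normalize_allele(value)
            if trimmed is None:
                keep = False
                break
            cleaned[name] = trimmed
        if keep:
            yield cleaned
```

Rows produced by `csv.DictReader` have this dict shape, so they could be passed to `filter_rows` directly.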

Remove item from list based on the next item in same list

*爱你&永不变心* submitted on 2019-12-03 22:23:38
I just started learning Python, and I have a sorted list of protein sequences (59,000 sequences in total), some of which overlap. I have made a toy list here as an example: ABCDE ABCDEFG ABCDEFGH ABCDEFGHIJKLMNO CEST DBTSFDE DBTSFDEO EOEUDNBNUW EOEUDNBNUWD EAEUDNBNUW FEOEUDNBNUW FG FGH I would like to remove the shorter overlapping entries and keep only the longest one, so the desired output would look like this: ABCDEFGHIJKLMNO CEST DBTSFDEO EAEUDNBNUW FEOEUDNBNUWD FGH How can I do it? My code looks like this: with open('toy.txt', 'r') as f: pattern = f.read().splitlines() print pattern for i in range
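A minimal sketch of one reading of this task, assuming "overlap" means that an entry is a prefix of the entry immediately after it: each entry is compared only with its successor, and dropped if the successor starts with it. The function name `drop_prefixes` is hypothetical.

```python
def drop_prefixes(seqs):
    """Drop every entry that is a prefix of the entry right after it."""
    kept = []
    for current, following in zip(seqs, seqs[1:] + [None]):
        if following is not None and following.startswith(current):
            continue  # a longer entry follows, so skip the shorter one
        kept.append(current)
    return kept

toy = ["ABCDE", "ABCDEFG", "ABCDEFGH", "ABCDEFGHIJKLMNO", "CEST",
       "DBTSFDE", "DBTSFDEO", "EOEUDNBNUW", "EOEUDNBNUWD", "EAEUDNBNUW",
       "FEOEUDNBNUW", "FG", "FGH"]
longest = drop_prefixes(toy)
```

On the toy list exactly as printed, this keeps EOEUDNBNUWD and FEOEUDNBNUW, which differs slightly from the desired output quoted in the question; the two lists in the excerpt do not fully agree with each other. The approach makes a single pass over the list, so it should scale to 59,000 sequences.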

AWK: extract lines if column in file 1 falls within a range declared in two columns in other file

旧街凉风 submitted on 2019-12-03 21:53:21
Currently I'm struggling with an AWK problem that I haven't been able to solve yet. I have one huge file (30 GB) of genomic data that holds a list of positions (declared in col 1 and 2) and a second list that holds a number of ranges (declared in col 3, 4 and 5). I want to extract all lines in the first file where the position falls within a range declared in the second file. As a position is only unique within a given chromosome (chr), it first has to be tested whether the chromosomes are identical (i.e. col 1 in file 1 matches col 3 in file 2). file 1 chromosome position another....hundred....
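The matching logic can be sketched as: load the ranges into a per-chromosome lookup, then stream the large file one line at a time. The question asks about AWK; this is an illustrative Python sketch only. It assumes whitespace-delimited columns, inclusive range bounds, and a range file small enough to hold in memory. The function names `load_ranges` and `filter_positions` are hypothetical.

```python
from collections import defaultdict

def load_ranges(lines):
    """Map chromosome (col 3) to a list of (start, end) pairs (cols 4 and 5)."""
    ranges = defaultdict(list)
    for line in lines:
        cols = line.split()
        if len(cols) < 5 or not (cols[3].isdigit() and cols[4].isdigit()):
            continue  # skip headers and malformed lines
        ranges[cols[2]].append((int(cols[3]), int(cols[4])))
    return ranges

def filter_positions(lines, ranges):
    """Yield lines whose (chromosome, position) falls inside any loaded range."""
    for line in lines:
        cols = line.split()
        if len(cols) < 2 or not cols[1].isdigit():
            continue  # skip headers and malformed lines
        position = int(cols[1])
        # Only ranges on the same chromosome are compared.
        if any(start <= position <= end for start, end in ranges.get(cols[0], ())):
            yield line
```

Passing the 30 GB file as an open file object keeps memory use low, because only the range table is held in memory. With many ranges per chromosome, sorting them and using a binary search would likely be faster than the linear scan shown here.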