genetics

How to plot positions along a chromosome graphic

六眼飞鱼酱① submitted on 2019-12-03 02:20:32
I would like to generate a plot depicting 14 linear chromosomes for the organism I work on, to scale, with coloured bars at specified locations along each chromosome. Ideally I'd like to use R, as it is the only programming language I have experience with. I have explored various ways of doing this, e.g. with GenomeGraphs, but I have found them all more complicated than what I want: they display far more data than I have (e.g. cytogenetic bands) and are often specific to human chromosomes. Essentially, all I want is 14 grey bars of the following sizes: chromosome size 1 640851 2

R: How to plot correct pie charts in haploNet haplotype networks {pegas} {ape} {adegenet}

不想你离开。 submitted on 2019-12-01 11:28:22
When using the haploNet function to plot a haplotype network, I used a script available on the internet. However, I think there is something wrong. The script is available in the form of the woodmouse example. The code I used is:

    x <- read.dna(file="Masto.fasta",format="fasta")
    h <- haplotype(x)
    net <- haploNet(h)
    plot(net)
    plot(net, size = attr(net, "freq"), fast = TRUE)
    plot(net, size = attr(net, "freq"))
    plot(net, size=attr(net, "freq"), scale.ratio = 2, cex = 0.8)
    table(rownames(x))
    ind.hap<-with( stack(setNames(attr(h, "index"), rownames(h))), table(hap=ind, pop=rownames(x

R: How to plot correct pie charts in haploNet haplotype networks {pegas} {ape} {adegenet}

做~自己de王妃 submitted on 2019-12-01 07:36:28
Question: When using the haploNet function to plot a haplotype network, I used a script available on the internet. However, I think there is something wrong. The script is available in the form of the woodmouse example. The code I used is:

    x <- read.dna(file="Masto.fasta",format="fasta")
    h <- haplotype(x)
    net <- haploNet(h)
    plot(net)
    plot(net, size = attr(net, "freq"), fast = TRUE)
    plot(net, size = attr(net, "freq"))
    plot(net, size=attr(net, "freq"), scale.ratio = 2, cex = 0.8)
    table

Complement a DNA sequence

空扰寡人 submitted on 2019-11-30 13:39:45
Suppose I have a DNA sequence and I want to get its complement. I used the following code, but I am not getting the expected result. What am I doing wrong?

    s=readline()
    ATCTCGGCGCGCATCGCGTACGCTACTAGC
    p=unlist(strsplit(s,""))
    h=rep("N",nchar(s))
    unlist(lapply(p,function(d){
    for b in (1:nchar(s)) {
    if (p[b]=="A") h[b]="T"
    if (p[b]=="T") h[b]="A"
    if (p[b]=="G") h[b]="C"
    if (p[b]=="C") h[b]="G"
    }

Use chartr, which is built for this purpose:

    > s
    [1] "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
    > chartr("ATGC","TACG",s)
    [1] "TAGAGCCGCGCGTAGCGCATGCGATGATCG"

Just give it two equal-length character strings and your string. Also

How to edit 300 GB text file (genomics data)?

一个人想着一个人 submitted on 2019-11-30 08:43:23
Question: I have a 300 GB text file that contains genomics data with over 250k records. Some records contain bad data, and our genomics program 'PoPoolation' allows us to comment out the "bad" records with an asterisk. Our problem is that we cannot find a text editor that will load the data so that we can comment out the bad records. Any suggestions? We have both Windows and Linux boxes. UPDATE: More information. The program PoPoolation (https://code.google.com/p/popoolation/) crashes when it

How to compare 2 lists of ranges in bash?

末鹿安然 submitted on 2019-11-29 11:33:21
Using a bash script (Ubuntu 16.04), I'm trying to compare 2 lists of ranges: does any number in any of the ranges in file1 coincide with any number in any of the ranges in file2? If so, print the row in the second file. Here I have each range as 2 tab-delimited columns (in file1, row 1 represents the range 1-4, i.e. 1, 2, 3, 4). The real files are quite big.

file1:
    1	4
    5	7
    8	11
    12	15

file2:
    3	4
    8	13
    20	24

Desired output:
    3	4
    8	13

My best attempt has been:

    awk 'NR=FNR { x[$1] = $1+0; y[$2] = $2+0; next}; {for (i in x) {if (x[i] > $1+0); then {for (i in y) {if (y[i] <$2+0); then {print $1, $2}}}}}
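One possible way to get the desired output is sketched below. It is not taken from the original thread; it assumes closed integer ranges, tab-delimited input, and a file1 small enough to hold in memory, and the function name overlapping_ranges is hypothetical. It relies on the fact that two closed ranges [s1, e1] and [s2, e2] share at least one number exactly when s1 <= e2 and s2 <= e1. Note also that in awk `NR=FNR` is an assignment (the comparison is `NR == FNR`) and that awk has no `then` keyword.

```shell
# Print every row of file2 whose range overlaps at least one range in file1.
overlapping_ranges() {
    # $1 = file1 (reference ranges), $2 = file2 (ranges to report)
    awk -F'\t' '
        # First file: remember the bounds of every reference range.
        NR == FNR { lo[NR] = $1 + 0; hi[NR] = $2 + 0; n = NR; next }
        # Second file: print the row on the first overlapping reference range.
        {
            for (i = 1; i <= n; i++) {
                if (lo[i] <= $2 + 0 && $1 + 0 <= hi[i]) { print; break }
            }
        }
    ' "$1" "$2"
}

# Sample data from the question, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '1\t4\n5\t7\n8\t11\n12\t15\n' > "$tmpdir/file1"
printf '3\t4\n8\t13\n20\t24\n' > "$tmpdir/file2"
result=$(overlapping_ranges "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

This sketch checks every file2 row against every file1 range, so its cost grows with the product of the two file lengths. For very large inputs, sorting both files by range start and sweeping through them once would likely scale better.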

How to edit 300 GB text file (genomics data)?

不打扰是莪最后的温柔 submitted on 2019-11-29 07:26:13
I have a 300 GB text file that contains genomics data with over 250k records. Some records contain bad data, and our genomics program 'PoPoolation' allows us to comment out the "bad" records with an asterisk. Our problem is that we cannot find a text editor that will load the data so that we can comment out the bad records. Any suggestions? We have both Windows and Linux boxes. UPDATE: More information. The program PoPoolation (https://code.google.com/p/popoolation/) crashes when it reaches a "bad" record, giving us the line number that we can then comment out. Specifically, we get a
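Since the program reports the line number of each bad record, one option that avoids a text editor altogether is to stream the file through sed. The following is only a minimal sketch under stated assumptions: the bad records are known by 1-based line number, there is enough disk space for a second copy of the file, and the function name comment_out_lines and the file names are hypothetical.

```shell
# Prefix the given line numbers with an asterisk while streaming the file,
# so the whole file is never loaded into memory.
comment_out_lines() {
    # $1 = input file, $2 = output file, remaining args = 1-based line numbers
    input=$1
    output=$2
    shift 2
    script=""
    for n in "$@"; do
        # Build one sed command per line number, e.g. "2s/^/*/;"
        script="${script}${n}s/^/*/;"
    done
    sed "$script" "$input" > "$output"
}

# Tiny stand-in for the real file (hypothetical records).
tmpdir=$(mktemp -d)
printf 'rec1\nrec2\nrec3\nrec4\n' > "$tmpdir/in.txt"
comment_out_lines "$tmpdir/in.txt" "$tmpdir/out.txt" 2 4
commented=$(cat "$tmpdir/out.txt")
printf '%s\n' "$commented"
rm -rf "$tmpdir"
```

Writing to a new file rather than editing in place keeps the original intact if something goes wrong, at the cost of temporarily needing roughly twice the disk space.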

How to plot Pie charts in haploNet Haplotype Networks {pegas}

怎甘沉沦 submitted on 2019-11-28 09:32:26
I'm trying to use the haploNet function of {pegas} to plot a haplotype network, but I'm having trouble putting identical haplotypes from different populations in the same pie chart. I can build a haplotype network with the following script:

    x <- read.dna(file="x.fas",format="fasta")
    h <- haplotype(x)
    net <- haploNet(h)
    plot(net)

I'd like to set, in the DNAbin data, the label of the original population of each taxon, so I could have pie charts of different colors (for haplotypes from different populations) in the resulting network. I'd also like to remove overlapping circles in the resulting haplotype network.

How to plot Pie charts in haploNet Haplotype Networks {pegas}

霸气de小男生 submitted on 2019-11-27 02:59:58
Question: I'm trying to use the haploNet function of {pegas} to plot a haplotype network, but I'm having trouble putting identical haplotypes from different populations in the same pie chart. I can build a haplotype network with the following script:

    x <- read.dna(file="x.fas",format="fasta")
    h <- haplotype(x)
    net <- haploNet(h)
    plot(net)

I'd like to set, in the DNAbin data, the label of the original population of each taxon, so I could have pie charts of different colors (for haplotypes from different populations) in the

Merge by Range in R - Applying Loops

柔情痞子 submitted on 2019-11-27 01:48:12
I posted a question here ("Matched Range Merge in R") about merging two files based on a number in one file falling into a range in the second file. Thus far, I have been unsuccessful in piecing together code to accomplish this. The issue is that the code I'm using compares the files line by line. This is a problem because 1) one file is much longer than the other, and 2) I need each line in the shorter file to be checked against every range pair in the longer file, not just the range in the same row. I have been working with the functions posted in the original question, and