genetics

What is an idiomatic way of representing enums in Go?

旧城冷巷雨未停 submitted on 2021-02-17 05:49:31
Question: I'm trying to represent a simplified chromosome, which consists of N bases, each of which can only be one of {A, C, T, G}. I'd like to formalize the constraints with an enum, but I'm wondering what the most idiomatic way of emulating an enum in Go is.
Answer 1: Quoting from the language spec, under "Iota": Within a constant declaration, the predeclared identifier iota represents successive untyped integer constants. It is reset to 0 whenever the reserved word const appears in the source and increments

Transposing a data frame

℡╲_俬逩灬. submitted on 2021-02-16 13:46:44
Question: I am trying to transpose a data frame in R, but I'm having very little luck. The data frame contains an epigenetic data set, with 300,000+ CpG sites in the first column. The 74 additional columns are split between the control and experimental groups (cancer = 69, normal = 5). I have converted the data frame into a matrix so I can transpose the data and convert it back to numerical values. However, every time I try to convert the data back to a data frame, it ends up as a list. I'm attempting to

How to compute p-values from z-scores in R when the Z score is large (pvalue much below zero)?

丶灬走出姿态 submitted on 2021-02-07 08:17:51
Question: In genetics, very small p-values are common (for example 10^-400), and I am looking for a way to get very small two-tailed p-values in R when the z-score is large, for example:
    z=40
    pvalue = 2*pnorm(abs(z), lower.tail = F)
This gives me zero instead of a very small, highly significant value.
Answer 1: The inability to handle p-values less than about 10^(-308) (.Machine$double.xmin) is not really R's fault, but is rather a generic limitation of any computational system that uses double
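A common workaround for this limit is to stay in log space and report log10(p) instead of p itself. The sketch below is in Python rather than R and is not taken from the truncated answer above; the function name log10_two_tailed_p, the switch-over threshold of 30, and the four-term asymptotic series are illustrative assumptions.

```python
import math


def log10_two_tailed_p(z: float) -> float:
    """Return log10 of the two-tailed normal p-value for a z-score.

    Hypothetical helper: for moderate |z| the value is computed directly,
    for large |z| an asymptotic expansion of the normal tail is used so
    that nothing underflows to zero.
    """
    a = abs(z)
    if a < 30:
        # erfc(a / sqrt(2)) equals 2 * (1 - Phi(a)), the two-tailed p-value,
        # and is still representable as a double in this range.
        return math.log10(math.erfc(a / math.sqrt(2)))

    # Upper tail Q(a) ~ phi(a) / a * (1 - 1/a^2 + 3/a^4 - 15/a^6),
    # evaluated entirely in natural-log space.
    series = 1 - 1 / a**2 + 3 / a**4 - 15 / a**6
    log_q = (-0.5 * a * a
             - 0.5 * math.log(2 * math.pi)
             - math.log(a)
             + math.log(series))

    # Two-tailed: add log(2), then convert from natural log to log10.
    return (log_q + math.log(2)) / math.log(10)
```

Under these assumptions, log10_two_tailed_p(40) comes out near -349.1, that is, a p-value on the order of 10^-349, which a double cannot hold directly. In R, pnorm accepts log.p = TRUE for the same purpose.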

How to compute p-values from z-scores in R when the Z score is large (pvalue much below zero)?

六眼飞鱼酱① submitted on 2021-02-07 08:14:15
Question: In genetics, very small p-values are common (for example 10^-400), and I am looking for a way to get very small two-tailed p-values in R when the z-score is large, for example:
    z=40
    pvalue = 2*pnorm(abs(z), lower.tail = F)
This gives me zero instead of a very small, highly significant value.
Answer 1: The inability to handle p-values less than about 10^(-308) (.Machine$double.xmin) is not really R's fault, but is rather a generic limitation of any computational system that uses double

Most efficient way to run regression models for multiple independent variables on the same list of 80 dependent outcomes?

爷,独闯天下 submitted on 2021-01-28 01:48:59
Question: What is the most efficient way to run regression models for a list of 20 independent variables (e.g. genetic variants, each of which will be tested alone) and 40 dependent variables? I am a beginner in R! I found a solution, but it would work only if I had 1 independent variable. I'm not sure how I would go about it if I had many (http://techxhum.dk/loop-multiple-variables/). Thanks for your time.
Answer 1: Here's a somewhat dense solution that uses the mfastLmCpp() function from the MESS
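One way to read the question is as a double loop: fit a separate single-predictor model for every (variant, outcome) pair. The Python sketch below illustrates only that looping pattern with a hand-written least-squares fit; it is not the MESS-based R solution the answer refers to, and the names simple_ols and fit_all_pairs, as well as the dictionary layout of the inputs, are hypothetical.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept). Hypothetical helper for illustration.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_all_pairs(predictors, outcomes):
    """Fit one single-predictor model per (predictor, outcome) pair.

    Both arguments are assumed to be dicts mapping a variable name to a
    list of values measured on the same samples, in the same order.
    """
    results = {}
    for p_name, p_values in predictors.items():
        for o_name, o_values in outcomes.items():
            results[(p_name, o_name)] = simple_ols(p_values, o_values)
    return results
```

With 20 predictors and 40 outcomes this produces 800 fits, each keyed by its (predictor, outcome) name pair, so results can be filtered or tabulated afterwards.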

How to create a Manhattan plot with matplotlib in python?

可紊 submitted on 2020-08-02 06:25:49
Question: Unfortunately, I have not found a solution myself. How do I create a Manhattan plot in Python using, e.g., matplotlib / pandas? The problem is that in these plots the x-axis is discrete.
    from pandas import DataFrame
    from scipy.stats import uniform
    from scipy.stats import randint
    import numpy as np
    # some sample data
    df = DataFrame({'gene' : ['gene-%i' % i for i in np.arange(1000)],
                    'pvalue' : uniform.rvs(size=1000),
                    'chromosome' : ['ch-%i' % i for i in randint.rvs(0,12,size=1000)]})
    #
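One possible way around the discrete x-axis is to give every point a running integer position, chromosome by chromosome, and to label each block at its centre. The sketch below is an illustration under stated assumptions, not the answer from the original thread: manhattan_coordinates and plot_manhattan are hypothetical helper names, the 'ch-N' label format is taken from the sample data in the question, p-values are assumed to be strictly positive, and plotting requires matplotlib to be installed.

```python
import math
from collections import defaultdict


def manhattan_coordinates(records):
    """Turn (chromosome, pvalue) pairs into plot coordinates.

    Points are grouped by chromosome and laid out one block after another,
    so the discrete label becomes a contiguous stretch of a numeric x-axis.
    Returns (xs, ys, chroms, ticks); ticks maps chromosome -> block centre.
    """
    groups = defaultdict(list)
    for chrom, pvalue in records:
        groups[chrom].append(pvalue)

    # Sort 'ch-2' before 'ch-10' by comparing the numeric suffix.
    ordered = sorted(groups, key=lambda c: int(c.split("-")[1]))

    xs, ys, chroms, ticks = [], [], [], {}
    offset = 0
    for chrom in ordered:
        pvals = groups[chrom]
        xs.extend(range(offset, offset + len(pvals)))
        ys.extend(-math.log10(p) for p in pvals)  # assumes p > 0
        chroms.extend([chrom] * len(pvals))
        ticks[chrom] = offset + (len(pvals) - 1) / 2
        offset += len(pvals)
    return xs, ys, chroms, ticks


def plot_manhattan(records, path="manhattan.png"):
    """Draw the scatter plot and save it to `path` (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily on purpose

    xs, ys, chroms, ticks = manhattan_coordinates(records)
    block = {chrom: i for i, chrom in enumerate(ticks)}
    palette = ["steelblue", "darkorange"]
    # Alternate two colours so neighbouring chromosomes are distinguishable.
    colors = [palette[block[c] % 2] for c in chroms]

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.scatter(xs, ys, c=colors, s=8)
    ax.set_xticks(list(ticks.values()))
    ax.set_xticklabels(list(ticks.keys()))
    ax.set_xlabel("chromosome")
    ax.set_ylabel("-log10(p-value)")
    fig.savefig(path)
```

Keeping the coordinate computation separate from the drawing makes the layout easy to check without a display; the rows of the question's data frame could be passed in as zip(df['chromosome'], df['pvalue']).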

How to create a Manhattan plot with matplotlib in python?

☆樱花仙子☆ submitted on 2020-08-02 06:24:34
Question: Unfortunately, I have not found a solution myself. How do I create a Manhattan plot in Python using, e.g., matplotlib / pandas? The problem is that in these plots the x-axis is discrete.
    from pandas import DataFrame
    from scipy.stats import uniform
    from scipy.stats import randint
    import numpy as np
    # some sample data
    df = DataFrame({'gene' : ['gene-%i' % i for i in np.arange(1000)],
                    'pvalue' : uniform.rvs(size=1000),
                    'chromosome' : ['ch-%i' % i for i in randint.rvs(0,12,size=1000)]})
    #

Complement a DNA sequence

萝らか妹 submitted on 2020-01-10 19:33:27
Question: Suppose I have a DNA sequence and want to get its complement. I used the following code, but it does not give me the complement. What am I doing wrong?
    s=readline()
    ATCTCGGCGCGCATCGCGTACGCTACTAGC
    p=unlist(strsplit(s,""))
    h=rep("N",nchar(s))
    unlist(lapply(p,function(d){
    for b in (1:nchar(s)) {
    if (p[b]=="A") h[b]="T"
    if (p[b]=="T") h[b]="A"
    if (p[b]=="G") h[b]="C"
    if (p[b]=="C") h[b]="G"
    }
Answer 1: Use chartr, which is built for this purpose:
    > s
    [1] "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
    > chartr("ATGC","TACG",s)
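For comparison, the same character-for-character substitution that chartr performs can be sketched in Python with str.maketrans and str.translate. The function name complement is an illustrative choice, and the table covers only the four uppercase bases; any other character is passed through unchanged.

```python
# Translation table mapping each base to its complement: A<->T, G<->C.
_COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")


def complement(sequence: str) -> str:
    """Return the complement of an uppercase DNA sequence."""
    return sequence.translate(_COMPLEMENT_TABLE)


if __name__ == "__main__":
    s = "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
    print(complement(s))
```

Because the mapping is its own inverse, applying complement twice returns the original sequence, which is a cheap sanity check.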

How to compare 2 lists of ranges in bash?

为君一笑 submitted on 2019-12-18 06:59:13
Question: Using a bash script (Ubuntu 16.04), I'm trying to compare 2 lists of ranges: does any number in any of the ranges in file1 coincide with any number in any of the ranges in file2? If so, print the row in the second file. Here I have each range as 2 tab-delimited columns (in file1, row 1 represents the range 1-4, i.e. 1, 2, 3, 4). The real files are quite big.
file1:
    1   4
    5   7
    8   11
    12  15
file2:
    3   4
    8   13
    20  24
Desired output:
    3   4
    8   13
My best attempt has been:
    awk 'NR=FNR { x[$1] = $1+0; y[$2] = $2
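The comparison described in the question can be sketched as follows: merge the file1 ranges into a sorted list of disjoint intervals, then binary-search that list once per file2 row. This is a Python illustration rather than a bash or awk solution, and it is not taken from the original thread; the names parse_ranges, merge_ranges, and overlapping_rows are hypothetical, and ranges are assumed to be closed integer intervals with start <= end.

```python
import bisect


def parse_ranges(text):
    """Parse whitespace-delimited 'start end' rows into (start, end) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            rows.append((int(fields[0]), int(fields[1])))
    return rows


def merge_ranges(ranges):
    """Sort ranges and merge the ones that overlap into disjoint intervals."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlapping_rows(ranges1, ranges2):
    """Return the rows of ranges2 that share at least one number with ranges1."""
    merged = merge_ranges(ranges1)
    starts = [start for start, _ in merged]
    hits = []
    for start, end in ranges2:
        # Rightmost merged interval that begins at or before the query's end.
        # Because merged intervals are disjoint and sorted, it is the only
        # candidate that needs checking.
        i = bisect.bisect_right(starts, end) - 1
        if i >= 0 and merged[i][1] >= start:
            hits.append((start, end))
    return hits


if __name__ == "__main__":
    file1 = parse_ranges("1\t4\n5\t7\n8\t11\n12\t15\n")
    file2 = parse_ranges("3\t4\n8\t13\n20\t24\n")
    for start, end in overlapping_rows(file1, file2):
        print(f"{start}\t{end}")
```

Sorting costs O(n log n) once, and each file2 row is then answered in O(log n), which matters when the real files are large; a naive comparison of every pair of rows would be quadratic.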

Package ‘GeneR’ is not available [duplicate]

久未见 submitted on 2019-12-13 04:39:46
Question: This question already has answers here: How should I deal with “package 'xxx' is not available (for R version x.y.z)” warning? (15 answers). Closed 4 years ago.
I'm trying to install the GeneR library (http://www.bioconductor.org/packages/release/bioc/html/GeneR.html). I'm using Windows 7 and the newest R, 2.14.2. Error during installation:
    > source("http://bioconductor.org/biocLite.R")
    trying URL 'http://www.bioconductor.org/packages/2.9/bioc/bin/windows/contrib/2.14/BiocInstaller_1.2.1.zip'
    Content