genetics | 易学教程

What is an idiomatic way of representing enums in Go?

Question: I'm trying to represent a simplified chromosome, which consists of N bases, each of which can only be one of {A, C, T, G}. I'd like to formalize the constraints with an enum, but I'm wondering what the most idiomatic way of emulating an enum is in Go.

Answer 1: Quoting from the language spec (section "Iota"): Within a constant declaration, the predeclared identifier iota represents successive untyped integer constants. It is reset to 0 whenever the reserved word const appears in the source and increments

Transposing a data frame

Question: I am trying to transpose a data frame in R, but I am having very little luck. The data frame contains an epigenetic data set, with 300,000+ CpG sites in the first column. The 74 additional columns are split between the control and experimental groups (cancer = 69, normal = 5). I have converted the data frame into a matrix so I can transpose the data and convert it back to numerical values. However, every time I try to convert the data back to a data frame, it ends up as a list. I'm attempting to

How to compute p-values from z-scores in R when the Z score is large (pvalue much below zero)?

Question: In genetics very small p-values are common (for example 10^-400), and I am looking for a way to get very small two-tailed p-values in R when the z-score is large, for example:

    z = 40
    pvalue = 2 * pnorm(abs(z), lower.tail = F)

This gives me zero instead of the very small (and highly significant) value I expect.

Answer 1: The inability to handle p-values less than about 10^(-308) (.Machine$double.xmin) is not really R's fault, but is rather a generic limitation of any computational system that uses double
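
The underflow described above can be sidestepped by working in log space and reporting log10(p) instead of p. The following is a minimal sketch in Python rather than R, using only the standard library; the function name log10_two_tailed_p and the cutoff of 30 are illustrative assumptions, not part of the original answer. For large |z| it relies on the standard asymptotic expansion of the normal tail, Q(z) ≈ φ(z)/z · (1 − 1/z² + 3/z⁴), so the result is an approximation.

```python
import math

def log10_two_tailed_p(z, cutoff=30.0):
    """Return log10 of the two-tailed normal p-value for a z-score.

    Hypothetical helper: below `cutoff` it uses math.erfc directly; at or
    above it, it switches to an asymptotic tail expansion to avoid underflow.
    """
    z = abs(z)
    if z < cutoff:
        # Two-tailed p = 2 * Q(z) = erfc(z / sqrt(2)), still representable here.
        return math.log10(math.erfc(z / math.sqrt(2.0)))
    # ln Q(z) ~ ln(phi(z)) - ln(z) + ln(1 - 1/z^2 + 3/z^4)
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    series = 1.0 - 1.0 / z**2 + 3.0 / z**4
    ln_p = math.log(2.0) + log_phi - math.log(z) + math.log(series)
    return ln_p / math.log(10.0)

print(log10_two_tailed_p(40))  # roughly -349.1, i.e. p is about 10^-349
```

In R itself, pnorm accepts log.p = TRUE, which serves the same purpose of keeping the computation in log space.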

Most efficient way to run regression models for multiple independent variables on the same list of 80 dependent outcomes?

Question: What is the most efficient way to run regression models for a list of 20 independent variables (e.g. genetic variants, each tested on its own) and 40 dependent variables? I am a beginner to R! I found a solution, but it would work only if I had 1 independent variable, and I am not sure how to extend it to many (http://techxhum.dk/loop-multiple-variables/). Thanks for your time.

Answer 1: Here's a somewhat dense solution that uses the mfastLmCpp() function from the MESS

How to create a Manhattan plot with matplotlib in python?

Question: Unfortunately, I have not found a solution myself. How do I create a Manhattan plot in Python using, e.g., matplotlib / pandas? The problem is that in these plots the x-axis is discrete.

    from pandas import DataFrame
    from scipy.stats import uniform
    from scipy.stats import randint
    import numpy as np

    # some sample data
    df = DataFrame({'gene' : ['gene-%i' % i for i in np.arange(1000)],
                    'pvalue' : uniform.rvs(size=1000),
                    'chromosome' : ['ch-%i' % i for i in randint.rvs(0,12,size=1000)]})
    #

Complement a DNA sequence

Question: Suppose I have a DNA sequence and I want to get its complement. I used the following code, but it does not give me the result. What am I doing wrong?

    s=readline()
    ATCTCGGCGCGCATCGCGTACGCTACTAGC
    p=unlist(strsplit(s,""))
    h=rep("N",nchar(s))
    unlist(lapply(p,function(d){
    for b in (1:nchar(s)) {
    if (p[b]=="A") h[b]="T"
    if (p[b]=="T") h[b]="A"
    if (p[b]=="G") h[b]="C"
    if (p[b]=="C") h[b]="G"
    }

Answer 1: Use chartr, which is built for this purpose:

    > s
    [1] "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
    > chartr("ATGC","TACG",s)
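
The same character-for-character translation idea behind chartr can be sketched in Python with str.maketrans and str.translate. This is only an illustration in a different language, not part of the original answer; the names COMPLEMENT and complement are hypothetical.

```python
# Translation table mapping each base to its complement (A<->T, G<->C).
COMPLEMENT = str.maketrans("ATGC", "TACG")

def complement(seq):
    """Return the complement of a DNA sequence; other characters pass through unchanged."""
    return seq.translate(COMPLEMENT)

s = "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
print(complement(s))
```

Because the mapping is its own inverse, applying complement twice returns the original sequence, which makes a convenient sanity check.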

How to compare 2 lists of ranges in bash?

Question: Using a bash script (Ubuntu 16.04), I'm trying to compare 2 lists of ranges: does any number in any of the ranges in file1 coincide with any number in any of the ranges in file2? If so, print the row in the second file. Each range is given as 2 tab-delimited columns (in file1, row 1 represents the range 1-4, i.e. 1, 2, 3, 4). The real files are quite big.

file1:

    1	4
    5	7
    8	11
    12	15

file2:

    3	4
    8	13
    20	24

Desired output:

    3	4
    8	13

My best attempt has been:

    awk 'NR=FNR { x[$1] = $1+0; y[$2] = $2
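
For comparison, the overlap test the question describes can be sketched in Python instead of bash. This is a minimal sketch assuming every range is an inclusive integer pair with start <= end; the function names merge_ranges and overlapping_rows are hypothetical. It merges the file1 ranges once and then binary-searches them for each file2 row, which should scale better on large inputs than comparing every pair.

```python
from bisect import bisect_left

def merge_ranges(ranges):
    """Merge overlapping inclusive ranges into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous merged range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlapping_rows(file1_ranges, file2_ranges):
    """Return the file2 ranges that share at least one number with any file1 range."""
    merged = merge_ranges(file1_ranges)
    ends = [end for _, end in merged]
    hits = []
    for start, end in file2_ranges:
        # Find the first merged range that ends at or after `start`;
        # if that one does not overlap, no later one can.
        i = bisect_left(ends, start)
        if i < len(merged) and merged[i][0] <= end:
            hits.append((start, end))
    return hits

file1 = [(1, 4), (5, 7), (8, 11), (12, 15)]
file2 = [(3, 4), (8, 13), (20, 24)]
print(overlapping_rows(file1, file2))  # [(3, 4), (8, 13)]
```

With the sample data from the question, the result matches the desired output: the rows 3-4 and 8-13 are kept and 20-24 is dropped.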

Package ‘GeneR’ is not available [duplicate]

Question: (Closed as a duplicate of: How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning?) I'm trying to install the GeneR library (http://www.bioconductor.org/packages/release/bioc/html/GeneR.html). I'm using Windows 7 and the newest R, 2.14.2. Error during installation:

    > source("http://bioconductor.org/biocLite.R")
    trying URL 'http://www.bioconductor.org/packages/2.9/bioc/bin/windows/contrib/2.14/BiocInstaller_1.2.1.zip'
    Content