bioinformatics

Odds ratio for ordinal variables from PROC GENMOD

给你一囗甜甜゛ posted on 2019-12-24 06:03:32
Question: I have a data set for which I am building a logistic regression model of the odds of a binary outcome variable (Therapy), with Stage as an ordinal explanatory variable (0, 1, 2, 3, 4). A1c is a continuous variable. Because each patient has two eyes, I must use the repeated subject = patientID(EyeID) statement. The following is my code:

    PROC GENMOD data=new descend;
    class patientID EyeID Stage (param = ordinal) Therapy (ref ="0") Gender(ref="M") Ethnic agegroup/ PARAM=ref;
    model Therapy =

Getting p-values from leave-one-out in R

本秂侑毒 posted on 2019-12-24 04:49:28
Question: I have a data frame of 96 observations (patients) and 1098 variables (genes). The response is binary (Y and N) and the predictors are numeric. I am trying to perform leave-one-out cross-validation, but I am interested not in the standard error but in the p-values for each variable from each of the 95 logistic regression models created from LOOCV. These are my attempts thus far:

    #Data frame 96 observations 1098 variables DF2
    fit <- list()
    for (i in 1:96){
      df <- DF2[-i,]
      fit[[i]] <- glm (response ~.,

Makefile - samtools installation failed

假如想象 posted on 2019-12-24 00:42:42
Question: I'm trying to install samtools on openSUSE. I did this:

    cd htslib-1.2.1
    ./configure
    make install

Worked fine.

    bcftools-1.2
    ./configure
    make install

Worked fine. And for samtools:

    cd samtools-1.2
    make install

This produces the following output:

    /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: skipping incompatible /usr/lib/libcurses.so when searching for -lcurses
    /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: cannot find -lcurses
    collect2: error: ld

Exchanging elements (crossover) between two vectors

蹲街弑〆低调 posted on 2019-12-24 00:25:57
Question: Assume I have:

    chromosome_1 <- c('0010000001010000')
    chromosome_2 <- c('0100000001001010')

How can I implement steps 3-5?

    Evaluate NC1 = no. of 1's in chromosome_1, NC2 = no. of 1's in chromosome_2, M = min(NC1, NC2)
    Generate a random integer NC from range(1, M)
    Randomly select NC gene positions among the genes with allele "1" from chromosome_1 and form a set s1 of indices of such selected positions.
    Randomly select NC gene positions among the genes with allele "1" from chromosome_2 and form a
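A minimal Python sketch of the selection steps stated in the excerpt above (count the 1 alleles, draw NC, then sample NC positions from each chromosome). The function name `select_crossover_sets` and the `rng` parameter are hypothetical, indices are 0-based (R would use 1-based), and the exchange step itself is not shown because the excerpt is cut off before describing it.

```python
import random

def select_crossover_sets(chromosome_1, chromosome_2, rng=None):
    """Return (NC, s1, s2) for two bit-string chromosomes."""
    rng = rng or random.Random()
    # Positions (0-based) that carry allele "1" in each chromosome.
    ones_1 = [i for i, allele in enumerate(chromosome_1) if allele == "1"]
    ones_2 = [i for i, allele in enumerate(chromosome_2) if allele == "1"]
    m = min(len(ones_1), len(ones_2))      # M = min(NC1, NC2)
    if m == 0:
        raise ValueError("each chromosome needs at least one '1' allele")
    nc = rng.randint(1, m)                 # random integer in [1, M], inclusive
    s1 = sorted(rng.sample(ones_1, nc))    # NC positions sampled without replacement
    s2 = sorted(rng.sample(ones_2, nc))
    return nc, s1, s2

chromosome_1 = "0010000001010000"
chromosome_2 = "0100000001001010"
nc, s1, s2 = select_crossover_sets(chromosome_1, chromosome_2, random.Random(0))
print(nc, s1, s2)
```

Passing a seeded `random.Random` instance keeps a run reproducible without touching the global random state.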

How to find specific frequency of a codon?

痞子三分冷 posted on 2019-12-23 21:16:49
Question: I am trying to write a function in R that calculates the frequency of each codon. We know that methionine is an amino acid encoded by only one codon, ATG, so its percentage in every set of sequences is 1, whereas glycine can be encoded by GGT, GGC, GGA, or GGG, hence the percentage of occurrence of each codon will be 0.25. The input would be a DNA sequence such as ATGGGTGGCGGAGGG, and with the help of a codon table it could calculate the percentage of each occurrence in an
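One possible reading of the excerpt above, sketched in Python: for each codon found in the sequence, divide its count by the total count of codons that encode the same amino acid. This interpretation is an assumption, since the excerpt is cut off. The names `CODON_TABLE` and `relative_codon_usage` are hypothetical; the table is the standard genetic code, and the input is assumed to be uppercase A/C/G/T read in frame from the first base.

```python
from collections import Counter
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def relative_codon_usage(sequence):
    """Map each observed codon to its share among synonymous codons."""
    usable = len(sequence) - len(sequence) % 3   # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    codon_counts = Counter(codons)
    amino_acid_counts = Counter(CODON_TABLE[codon] for codon in codons)
    return {
        codon: count / amino_acid_counts[CODON_TABLE[codon]]
        for codon, count in codon_counts.items()
    }

print(relative_codon_usage("ATGGGTGGCGGAGGG"))
```

For the example sequence this gives 1.0 for ATG and 0.25 for each of the four glycine codons, matching the figures in the question.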

sort fasta by sequence size

不羁的心 posted on 2019-12-23 20:47:19
Question: I want to sort a huge FASTA file (+10**8 lines and sequences) by sequence size. FASTA is a well-defined format used in biology to store sequences (nucleotide or protein):

    >id1
    sequence 1 # could be on several lines
    >id2
    sequence 2
    ...

I have run a tool that gives me, in TSV format, the identifier, the length, and the position in bytes of the identifier. For now, I sort this file by the length column, then parse it and use seek to retrieve the corresponding
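The index-then-seek approach described in the excerpt above could look roughly like this in Python. It is a sketch under two assumptions: the byte position in the index points at the `>` that starts each record, and the index rows have already been parsed into `(identifier, length, byte_offset)` tuples. The function name `write_sorted_by_length` is hypothetical.

```python
import io

def write_sorted_by_length(fasta, index_rows, out):
    """Copy FASTA records to `out`, shortest sequence first.

    fasta      -- seekable file opened in binary mode
    index_rows -- iterable of (identifier, length, byte_offset) tuples
    out        -- writable binary file
    """
    for _identifier, _length, offset in sorted(index_rows, key=lambda row: row[1]):
        fasta.seek(offset)
        out.write(fasta.readline())            # header line starting with ">"
        for line in iter(fasta.readline, b""): # stops cleanly at end of file
            if line.startswith(b">"):          # next record begins here
                break
            out.write(line)

# Tiny in-memory example with made-up records.
data = b">id1\nACGTAC\nGT\n>id2\nAC\n>id3\nACGT\n"
index_rows = [
    ("id1", 8, data.index(b">id1")),
    ("id2", 2, data.index(b">id2")),
    ("id3", 4, data.index(b">id3")),
]
out = io.BytesIO()
write_sorted_by_length(io.BytesIO(data), index_rows, out)
print(out.getvalue().decode())
```

Only the index is sorted in memory, so the sequences themselves never need to fit in RAM; the cost is one random seek per record.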

extract overlapping regions

∥☆過路亽.° posted on 2019-12-23 05:06:16
Question: I have a file characterizing genomic regions that looks like this:

    chrom  chromStart  chromEnd  PGB
    chr1   12874       28371     2
    chr1   15765       21765     1
    chr1   15795       28371     2
    chr1   18759       24759     1
    chr1   28370       34961     1
    chr3   233278      240325    1
    chr3   239279      440831    2
    chr3   356365      362365    1

Basically, PGB describes the category of the genomic region, which is characterised by its chromosome (chrom), start (chromStart) and end (chromEnd) coordinates. I wish to collapse the overlapping regions such that overlapping regions of PGB = 1

compare multiple hashes for common keys merge values

≯℡__Kan透↙ posted on 2019-12-23 04:04:45
Question: I have a working bit of code in which I compare the keys of six hashes to find the ones that are common to all of them. I then combine the values from each hash into one value in a new hash. What I would like to do is make this scalable: I would like to be able to go easily from comparing 3 hashes to 100 without having to go back into my code and alter it. Any thoughts on how I would achieve this? The rest of the code already works well for different input amounts,
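The general idea of scaling this to any number of hashes is to keep them in a list and intersect their key sets, rather than naming each hash in the code. A minimal Python sketch follows (the question itself concerns hashes, for which dicts stand in here). The function name `merge_common_keys` is hypothetical, and collecting the values into a list is an assumption, since the excerpt does not say how the values are combined.

```python
def merge_common_keys(dicts):
    """Return {key: [value from each dict]} for keys present in every dict."""
    if not dicts:
        return {}
    # Iterating a dict yields its keys, so this intersects all key sets.
    common = set(dicts[0]).intersection(*dicts[1:])
    return {key: [d[key] for d in dicts] for key in common}

samples = [
    {"geneA": 1, "geneB": 2},
    {"geneA": 3, "geneC": 4},
    {"geneA": 5, "geneB": 6},
]
print(merge_common_keys(samples))
```

Because the input is a list, going from 3 to 100 inputs only changes how the list is built, not the comparison code.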

Genome coverage as sliding window

别等时光非礼了梦想. posted on 2019-12-23 01:00:08
Question: I mapped my reads to my assembly using the bwa mem algorithm and extracted the number of reads per base (= coverage) using samtools depth. The resulting file is the following:

    1091900001  1  236
    1091900001  2  245
    1091900001  3  265
    1091900001  4  283
    1091900001  5  288
    1091900002  1  297
    1091900002  2  312
    1091900002  3  327
    1091900002  4  338
    1091900002  5  348

It has three columns: the name of the contig (since it is a multi-contig file, this ID changes), the position (base), and the number of reads that mapped (coverage).
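As a rough illustration of sliding-window coverage over a file like the one above, the Python sketch below computes the mean depth per window within each contig. The excerpt is cut off before stating a window size or step, so `window` and `step` are hypothetical parameters, as is the function name `windowed_mean_coverage`. It also assumes that every position of each contig is present and in order; by default `samtools depth` skips zero-coverage positions, so the `-a` option would be needed for that to hold.

```python
from collections import defaultdict

def windowed_mean_coverage(lines, window, step):
    """Return (contig, first_pos, last_pos, mean_depth) for each full window."""
    depths = defaultdict(list)
    for line in lines:
        contig, _position, depth = line.split()
        depths[contig].append(int(depth))

    result = []
    for contig, values in depths.items():
        # Windows never cross contig boundaries; partial windows are dropped.
        for start in range(0, len(values) - window + 1, step):
            chunk = values[start:start + window]
            result.append((contig, start + 1, start + window, sum(chunk) / window))
    return result

depth_lines = [
    "1091900001 1 236", "1091900001 2 245", "1091900001 3 265",
    "1091900001 4 283", "1091900001 5 288",
    "1091900002 1 297", "1091900002 2 312", "1091900002 3 327",
    "1091900002 4 338", "1091900002 5 348",
]
for row in windowed_mean_coverage(depth_lines, window=5, step=5):
    print(*row, sep="\t")
```

Setting `step` equal to `window` gives non-overlapping bins, while a smaller `step` gives overlapping sliding windows.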

Modify r object with rpy2

风格不统一 posted on 2019-12-22 18:23:27
Question: I'm trying to use rpy2 to use the DESeq2 R/Bioconductor package in Python. I actually solved my problem while writing my question (using do_slots allows access to the R object's attributes), but I think the example might be useful for others, so here is how I do it in R and how this translates to Python. In R, I can create a "DESeqDataSet" from two data frames as follows:

    counts_data <- read.table("long/path/to/file", header=TRUE, row.names="gene")
    head(counts_data)
    ## WT_RT_1 WT_RT_2 prg1_RT_1