bioinformatics

Filtering a CSV file in Python

亡梦爱人 submitted on 2019-12-01 13:31:41
I have downloaded this CSV file, which creates a spreadsheet of gene information. What is important is that the HLA-* columns contain gene information. If the gene is at too low a resolution, e.g. DQB1*03, then the row should be deleted. If the data is at too high a resolution, e.g. DQB1*03:02:01, then the :01 tag at the end needs to be removed. So, ideally I want the proteins to be in the format DQB1*03:02, so that there are two levels of resolution after DQB1*. How can I tell Python to look for these formats, regardless of the data stored in them? e.g. if (csvCell is of format DQB1*03:02:01):
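A minimal sketch of one way to do this with the standard csv and re modules. The input file name, the "HLA-" column prefix, and the exact allele pattern are assumptions for illustration, not taken from the question:

import csv
import re

# An allele with at least two colon-separated fields after the gene name,
# e.g. DQB1*03:02 or DQB1*03:02:01; group 1 keeps only the first two fields.
ALLELE_RE = re.compile(r"^([A-Z0-9]+\*\d+:\d+)(?::\d+)*$")

def normalise_allele(cell):
    """Return the allele trimmed to two-field resolution, or None if the
    cell is lower resolution (e.g. DQB1*03) and the row should be dropped."""
    match = ALLELE_RE.match(cell.strip())
    return match.group(1) if match else None

with open("genes.csv", newline="") as src, open("filtered.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    hla_cols = [c for c in reader.fieldnames if c.startswith("HLA-")]
    for row in reader:
        trimmed = {c: normalise_allele(row[c]) for c in hla_cols}
        if all(trimmed.values()):        # every HLA column reached 2-field resolution
            row.update(trimmed)          # e.g. DQB1*03:02:01 becomes DQB1*03:02
            writer.writerow(row)

The capture group keeps the first two colon-separated fields, so a trailing field such as :01 is dropped, while anything with fewer than two fields fails the match and the whole row is skipped.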

R Bioconductor installation error - Line starting '<!DOCTYPE html PUBLI …' is malformed

本秂侑毒 submitted on 2019-12-01 13:30:22
I'm having trouble installing Bioconductor packages in R. This is on Mac OS X, a fresh install of R 2.15, using BiocInstaller 1.4.4. Transcript follows:
> source("http://bioconductor.org/biocLite.R")
BiocInstaller version 1.4.4, ?biocLite for help
> biocLite("Biobase")
BioC_mirror: http://bioconductor.org
Using R version 2.15, BiocInstaller version 1.4.4.
Warning: unable to access index for repository http://brainarray.mbni.med.umich.edu/bioc/bin/macosx/leopard/contrib/2.15
Installing package(s) 'Biobase'
Error: Line starting '<!DOCTYPE html PUBLI ...' is malformed!
> traceback()
6: read.dcf

all possible wordform completions of a (biomedical) word's stem

自作多情 submitted on 2019-12-01 09:12:55
I'm familiar with word stemming and completion from the tm package in R. I'm trying to come up with a quick and dirty method for finding all variants of a given word (within some corpus). For example, I'd like to get "leukocytes" and "leukocytic" if my input is "leukocyte". If I had to do it right now, I would probably just go with something like:
library(tm)
library(RWeka)
dictionary <- unique(unlist(lapply(crude, words)))
grep(pattern = LovinsStemmer("company"), ignore.case = T, x = dictionary, value = T)
I used Lovins because Snowball's Porter doesn't seem to be aggressive enough. I'm open
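The question uses R's tm and RWeka; purely to illustrate the underlying idea (stem the query, then keep vocabulary entries that share the stem), here is a Python sketch. NLTK's LancasterStemmer stands in for an aggressive stemmer like Lovins, and the toy vocabulary and function name are made up:

from nltk.stem import LancasterStemmer

def word_variants(query, vocabulary, stemmer=None):
    # Keep any word whose stem equals the query's stem, or that simply
    # starts with the stemmed query (a rough equivalent of the grep call).
    stemmer = stemmer or LancasterStemmer()
    target = stemmer.stem(query.lower())
    return sorted(
        w for w in vocabulary
        if stemmer.stem(w.lower()) == target or w.lower().startswith(target)
    )

vocab = {"leukocyte", "leukocytes", "leukocytic", "lymphocyte", "company", "companies"}
print(word_variants("leukocyte", vocab))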

Reading .fasta sequences to extract nucleotide data, and then writing to a TabDelimited file

好久不见. submitted on 2019-12-01 09:06:09
Before I continue, I thought I'd refer readers to my previous problems with Perl, as I'm a beginner to all of this. These were my posts over the past few days, in chronological order:
How do I average column values from a tab-separated data... (Solved)
Why do I see no computed results in my output file? (Solved)
Using a .fasta file to compute relative content of sequences
Now, as I've stated above, thanks to help from a few of you I've managed to figure out the first two queries and I've

Automatic multi-file download in R-Shiny

时光总嘲笑我的痴心妄想 submitted on 2019-12-01 08:12:15
I'm trying to figure out how to get a data.frame to subset itself and then write a .csv file for each subset. I'm writing a Shiny app which will generate template files for different instruments, and I need to be able to get a file for each batch/plate/whatever. Obviously, we could do a manual sort, but that kind of defeats the purpose. For example, say that I have a data.frame with 4 columns named 1) PlateID, 2) SampleName, 3) Well and 4) Comments, and I want to subset by PlateID such that each individual plate will have its own file.
output$multiDownload <- renderText({ #templateData()
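This doesn't touch the Shiny download mechanics, but the subset-and-write step the question describes can be sketched in Python with pandas (the column names come from the question; the data and output file names are made up):

import pandas as pd

# A toy template table with one row per sample.
template = pd.DataFrame({
    "PlateID":    ["P1", "P1", "P2"],
    "SampleName": ["s1", "s2", "s3"],
    "Well":       ["A1", "A2", "A1"],
    "Comments":   ["", "", ""],
})

# One CSV per plate, e.g. plate_P1.csv and plate_P2.csv.
for plate_id, subset in template.groupby("PlateID"):
    subset.to_csv(f"plate_{plate_id}.csv", index=False)

In Shiny itself, downloadHandler serves a single file per click, so the per-plate files are usually bundled into a zip archive inside the handler before being sent to the browser.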

Installing Bio::DB::Sam perl module

╄→尐↘猪︶ㄣ submitted on 2019-12-01 05:15:12
I am trying to install the Perl module Bio::DB::Sam in my home directory on a remote server. I downloaded the module, extracted the files, and ran:
perl Build.PL prefix=~/local
This is what happens next:
This module requires samtools 0.1.10 or higher (samtools.sourceforge.net). Please enter the location of the bam.h and compiled libbam.a files: /some_places/samtools-0.1.19
Found /some_places/samtools-0.1.19/bam.h and /some_places/samtools-0.1.19/libbam.a.
Created MYMETA.yml and MYMETA.json
Creating new 'Build' script for 'Bio-SamTools' version '1.39'
Next, when I try to run:
./Build
this is

Error with a function to retrieve data from a database

我只是一个虾纸丫 submitted on 2019-12-01 03:00:04
I am trying to get a FASTA file from the NCBI website. I use the following function:
getncbiseq <- function(accession){
  dbs <- c()
  for (i in 1:numdbs){
    db <- dbs[i]
    choosebank(db)
    resquery <- try(query(".tmpquery", paste("AC=", accession)), silent = TRUE)
    if (!(inherits(resquery, "try-error"))){
      queryname <- "query2"
      thequery <- paste("AC=", accession, sep="")
      query(`queryname`, `thequery`)
      # see if a sequence was retrieved:
      seq <- getSequence(query2$req[[1]])
      closebank()
      return(seq)
    }
    closebank()
  }
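The function above appears to use seqinr (choosebank/query/getSequence). As a point of comparison only, fetching a FASTA record by accession can also be done in Python with Biopython's Entrez utilities; the e-mail address and the example accession below are placeholders:

from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"  # NCBI asks for a contact address

def get_ncbi_seq(accession, db="nucleotide"):
    # Fetch the record as FASTA text and parse it into a SeqRecord.
    handle = Entrez.efetch(db=db, id=accession, rettype="fasta", retmode="text")
    record = SeqIO.read(handle, "fasta")
    handle.close()
    return record

record = get_ncbi_seq("NM_000518")  # example RefSeq accession, used only for illustration
print(record.id, len(record.seq))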
