biomart

Query genes within regions

删除回忆录丶 submitted on 2021-02-10 16:15:23
Question: I want to retrieve the genes that are present within a series of regions. Say I have a BED file with query positions such as:

    1  2665697   4665777   MIR201
    1  10391435  12391516  MIR500
    1  15106831  17106911  MIR122
    1  23436535  25436616  MIR234
    1  23436575  25436656  MIR488

I would like to get the genes that fall within those regions. I have tried using biomaRt and bedtools intersect, but the output I get is a list of genes corresponding to all the regions, not one by one, as the desired output I would
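One way to keep the results separate is to query each region on its own and store the answers in a named list. The sketch below is only an illustration under stated assumptions: the question does not name the species, so the human dataset `hsapiens_gene_ensembl` is a guess, and `genes_in_region` is a hypothetical helper, not part of biomaRt. The biomaRt calls need a network connection, so they are left commented out.

```r
# The five query regions from the question, read into a data frame.
regions <- read.table(header = FALSE,
                      col.names = c("chr", "start", "end", "name"),
                      text = "
1 2665697 4665777 MIR201
1 10391435 12391516 MIR500
1 15106831 17106911 MIR122
1 23436535 25436616 MIR234
1 23436575 25436656 MIR488")

# Hypothetical helper: return the genes overlapping ONE region.
genes_in_region <- function(chr, start, end, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name",
                   "start_position", "end_position"),
    filters    = c("chromosome_name", "start", "end"),
    values     = list(chr, start, end),
    mart       = mart
  )
}

# Not run here (requires network access; the dataset is an assumption):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# per_region <- Map(genes_in_region, regions$chr, regions$start, regions$end,
#                   MoreArgs = list(mart = mart))
# names(per_region) <- regions$name   # one data frame of genes per region
```

Each element of `per_region` would then hold the genes for a single query region, keyed by the name in the fourth BED column.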

Unable to use biomaRt package to get Gene Symbols from Entrez IDs

喜欢而已 submitted on 2021-01-27 19:06:28
Question: I am using the following code to retrieve gene symbols from Entrez IDs:

    library("biomaRt")
    ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "www.ensembl.org")
    g <- getBM(c("hgnc_symbol"), filters = "entrezgene", c(entrez), ensembl)

but I get the following error:

    Error in value[[3L]](cond): Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.

Traceback: 1. getBM(c

Converting from Ensembl gene IDs to a different identifier

房东的猫 submitted on 2020-01-30 09:17:08
Question: I've inherited a dataset of RNA-seq output data from Canis lupus (dog). The gene identifiers are in the Ensembl format; specifically, they look like this: ENSCAFT00000001452.3. I am trying to use biomaRt to convert them to a more common ID and need help. I am very new to R and consider myself rather ignorant, so any help getting started is welcome. Can these Ensembl IDs be converted to any other Ensembl ID (e.g. for a different species)? Can these Ensembl IDs be converted to RefSeq or GI accession numbers? How

Remove part of string after “.”

…衆ロ難τιáo~ submitted on 2019-12-16 21:30:10
Question: I am working with NCBI Reference Sequence accession numbers like variable a:

    a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

To get information from the biomart package I need to remove the .1, .2 etc. after the accession numbers. I normally do this with this code:

    b <- sub("..*", "", a)
    # [1] "" "" "" "" "" ""

But as you can see, this isn't the correct way for this variable. Can anyone help me with this?

Answer 1: You just need to escape the
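The empty strings come from the unescaped `.`, which matches any character, so `"..*"` matches the whole accession. A minimal sketch of the escaped pattern follows; the `strsplit` variant with `fixed = TRUE` is an alternative added here for comparison and is not taken from the original answer.

```r
a <- c("NM_020506.1", "NM_020519.1", "NM_001030297.2",
       "NM_010281.2", "NM_011419.3", "NM_053155.2")

# "\\." matches a literal period; ".*" then matches the version that follows.
b <- sub("\\..*", "", a)

# Alternative without regular expressions: split on a literal "." and
# keep the first piece of each accession.
b2 <- vapply(strsplit(a, ".", fixed = TRUE), `[`, character(1), 1)

print(b)
```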

Issue with lapply using biomart

丶灬走出姿态 submitted on 2019-12-11 09:19:14
Question: I am trying to use lapply to change the species name when extracting all the human genes. I'm still learning how to use lapply, and I can't work out what I'm doing wrong. So far I have:

    library(biomaRt)

I create the marts:

    ensembl_hsapiens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    ensembl_mmusculus <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
    ensembl_ggallus <- useMart("ensembl", dataset = "ggallus_gene_ensembl")

Set the species:

    species <- c("hsapiens", "mmusculus",

Using spread with duplicate identifiers for rows giving error

别等时光非礼了梦想. submitted on 2019-12-04 05:40:17
Question: My data looks like this:

    df <- read.table(header = T, text = "
    GeneID Gene_Name Species  Paralogues Domains Functional_Diversity
    1234   DDR1      hsapiens 14         2       8.597482
    5678   CSNK1E    celegans 70         4       8.154788
    9104   FGF1      Chicken  3          0       5.455874
    4575   FGF1      hsapiens 4          6       6.745845")

I need it to look like:

    Gene_Name hsapiens celegans ggalus
    DDR1      8.597482 NA       NA
    CSNK1E    NA       8.154788 NA
    FGF1      6.745845 NA       5.455874

I've tried using:

    library(tidyverse)
    df %>% select(Gene_Name, Species, Functional_Diversity) %>% spread
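The question is cut off before the error message, so the following is only a sketch under an assumption: that the error arises when more than one row shares the same Gene_Name/Species pair. It swaps `spread` for base R's `tapply`, which reshapes to a gene-by-species matrix and collapses any duplicate pairs with an aggregate (here `mean`, an arbitrary choice).

```r
df <- read.table(header = TRUE, text = "
GeneID Gene_Name Species Paralogues Domains Functional_Diversity
1234 DDR1 hsapiens 14 2 8.597482
5678 CSNK1E celegans 70 4 8.154788
9104 FGF1 Chicken 3 0 5.455874
4575 FGF1 hsapiens 4 6 6.745845")

# Rows are genes, columns are species; combinations that never occur are NA.
# If a gene/species pair appeared more than once, mean() would combine them.
wide <- with(df, tapply(Functional_Diversity,
                        list(Gene_Name, Species),
                        mean))

print(wide)
```

Note that the column for chicken is labelled `Chicken` because that is the value in the sample data; it would need recoding to match the species label in the desired output.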

BioMart: Is there a way to easily change the species for all of my code?

落爺英雄遲暮 submitted on 2019-11-29 16:56:56
Below is a small fraction of my code:

    library(biomaRt)
    ensembl_hsapiens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    hsapien_PC_genes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), filters = "biotype", values = "protein_coding", mart = ensembl_hsapiens)
    paralogues[["hsapiens"]] <- getBM(attributes = c("external_gene_name", "hsapiens_paralog_associated_gene_name"), filters = "ensembl_gene_id", values = c(ensembl_gene_ID), mart = ensembl_hsapiens)

This bit of code will only allow me to extract the paralogues for hsapiens. Is there a way for me to easily get the
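Since the species prefix appears in both the dataset name and the paralogue attribute name, one option is to build those strings from a single species variable. The sketch below assumes other species follow the same naming pattern as the human example; `dataset_name`, `paralog_attr`, and `get_paralogues` are hypothetical helpers, and the biomaRt calls are not run here because they need network access.

```r
species <- c("hsapiens", "mmusculus", "ggallus")

# Build the species-specific names used in the question's code.
dataset_name <- function(sp) paste0(sp, "_gene_ensembl")
paralog_attr <- function(sp) paste0(sp, "_paralog_associated_gene_name")

# Hypothetical wrapper around the two getBM() calls from the question.
get_paralogues <- function(sp) {
  mart <- biomaRt::useMart("ensembl", dataset = dataset_name(sp))
  pc_genes <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name"),
    filters = "biotype", values = "protein_coding", mart = mart)
  biomaRt::getBM(
    attributes = c("external_gene_name", paralog_attr(sp)),
    filters = "ensembl_gene_id", values = pc_genes$ensembl_gene_id,
    mart = mart)
}

# Not run here (requires network access):
# paralogues <- lapply(setNames(species, species), get_paralogues)
```

Naming the input with `setNames` means the resulting list can be indexed as `paralogues[["hsapiens"]]`, matching the original code.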

BioMart: Is there a way to easily change the species for all of my code?

时间秒杀一切 submitted on 2019-11-28 11:18:48
Question: Below is a small fraction of my code:

    library(biomaRt)
    ensembl_hsapiens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    hsapien_PC_genes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), filters = "biotype", values = "protein_coding", mart = ensembl_hsapiens)
    paralogues[["hsapiens"]] <- getBM(attributes = c("external_gene_name", "hsapiens_paralog_associated_gene_name"), filters = "ensembl_gene_id", values = c(ensembl_gene_ID), mart = ensembl_hsapiens)

This bit of

Remove part of string after “.”

谁说我不能喝 submitted on 2019-11-25 19:42:41
I am working with NCBI Reference Sequence accession numbers like variable a:

    a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

To get information from the biomart package I need to remove the .1, .2 etc. after the accession numbers. I normally do this with this code:

    b <- sub("..*", "", a)
    # [1] "" "" "" "" "" ""

But as you can see, this isn't the correct way for this variable. Can anyone help me with this?

You just need to escape the period:

    a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
    gsub("\\.