Question
I have a GRanges object called "P.obj" from which I want to extract/subset specific gene IDs contained in the column "name". The gene IDs that I want to extract are in the R object "plus", whose column is also called "name". I understand how to subset by overlaps and find overlaps, but I cannot work out how to subset by gene name.
> P.obj
GRangesList of length 4:
$exons
GRanges with 604591 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr1 [66999066, 66999090] + | 1 ENST00000237247
[2] chr1 [66999929, 67000051] + | 2 ENST00000237247
[3] chr1 [67091530, 67091593] + | 3 ENST00000237247
[4] chr1 [67098753, 67098777] + | 4 ENST00000237247
[5] chr1 [67099763, 67099846] + | 5 ENST00000237247
... ... ... ... ... ... ...
[604587] chr22 [51227323, 51227600] + | 4 ENST00000423888
[604588] chr22 [51222290, 51222500] + | 1 ENST00000480246
[604589] chr22 [51223601, 51223721] + | 2 ENST00000480246
[604590] chr22 [51237083, 51239737] + | 3 ENST00000480246
[604591] chr22 [51237083, 51237551] + | 1 ENST00000427528
...
<3 more elements>
---
seqlengths:
chr1 chr2 chr3 chr4 chr5 chr6 ... chr17 chr18 chr19 chr20 chr21 chr22
NA NA NA NA NA NA ... NA NA NA NA NA NA
> plus
name
1 ENST00000237247
3 ENST00000480246
5 ENST00000427528
I have tried: P.obj[P.obj$name==plus$name]
But I get a warning: Warning message: In is.na(e1) : is.na() applied to non-(list or vector) of type 'NULL'
Answer 1:
The information you want is in the GRanges 'metadata' columns, accessible with either mcols() or $. Also, you're looking for set membership (%in%) rather than element-wise equality (==). So
P.obj[P.obj$name %in% plus$name]
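To see why membership rather than equality is the right test, here is a minimal base R sketch that uses plain character vectors in place of the metadata column. The names name_col and wanted are made up for illustration; the IDs are taken from the question.

```r
# Stand-in for the "name" metadata column (hypothetical, cut down to five entries)
name_col <- c("ENST00000237247", "ENST00000237247", "ENST00000423888",
              "ENST00000480246", "ENST00000427528")

# Stand-in for plus$name
wanted <- c("ENST00000237247", "ENST00000480246", "ENST00000427528")

# == compares position by position and recycles the shorter vector,
# so it only finds a match where the two vectors happen to line up.
# The lengths (5 and 3) are not multiples, so R would also warn here.
eq_mask <- suppressWarnings(name_col == wanted)

# %in% asks, for each element of name_col, whether it occurs anywhere in wanted.
in_mask <- name_col %in% wanted

print(eq_mask)  # TRUE FALSE FALSE FALSE FALSE
print(in_mask)  # TRUE TRUE FALSE TRUE TRUE
```

The logical vector from %in% has one entry per range, which is what the [ subsetting operator needs.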
Consider asking questions about Bioconductor packages on the Bioconductor support site.
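One caveat, judging from the printout in the question: P.obj is shown as a GRangesList of length 4, not a single GRanges. On a list-like object, $ looks up a list element rather than a metadata column, which would explain why P.obj$name came back as NULL. Below is a minimal sketch, assuming the GenomicRanges package is installed and that the ranges of interest live in the exons element. The toy objects gr, grl and wanted are invented for illustration, with coordinates copied from the printout.

```r
library(GenomicRanges)

# Toy GRanges holding a few of the ranges shown in the question
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr22", "chr22"),
  ranges   = IRanges(start = c(66999066, 66999929, 51227323, 51222290),
                     end   = c(66999090, 67000051, 51227600, 51222500)),
  strand   = "+",
  score    = c(1L, 2L, 4L, 1L),
  name     = c("ENST00000237247", "ENST00000237247",
               "ENST00000423888", "ENST00000480246")
)

# Wrap it in a GRangesList to mimic the structure of P.obj (assumption)
grl <- GRangesList(exons = gr)

# Stand-in for plus$name
wanted <- c("ENST00000237247", "ENST00000480246", "ENST00000427528")

# Pull out the GRanges element first, then subset on its metadata column
exons <- grl$exons
hits  <- exons[exons$name %in% wanted]

print(hits$name)
```

Applied to the object in the question, the same pattern would presumably read P.obj$exons[P.obj$exons$name %in% plus$name].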
Source: https://stackoverflow.com/questions/26193907/subsetting-from-an-r-object-of-gene-ids-from-granges-file