Subsetting a GRanges object by gene IDs held in another R object

Question

I have a GRanges object called "P.obj" from which I want to extract (subset) specific gene IDs stored in the metadata column "name". The IDs I want are held in the R object "plus", whose column is also called "name". I understand how to find overlaps and subset by overlaps, but I cannot work out how to subset by gene name.

> P.obj
GRangesList of length 4:
$exons
GRanges with 604591 ranges and 2 metadata columns:
           seqnames               ranges strand   |     score            name
              <Rle>            <IRanges>  <Rle>   | <integer>     <character>
       [1]     chr1 [66999066, 66999090]      +   |         1 ENST00000237247
       [2]     chr1 [66999929, 67000051]      +   |         2 ENST00000237247
       [3]     chr1 [67091530, 67091593]      +   |         3 ENST00000237247
       [4]     chr1 [67098753, 67098777]      +   |         4 ENST00000237247
       [5]     chr1 [67099763, 67099846]      +   |         5 ENST00000237247
       ...      ...                  ...    ... ...       ...             ...
  [604587]    chr22 [51227323, 51227600]      +   |         4 ENST00000423888
  [604588]    chr22 [51222290, 51222500]      +   |         1 ENST00000480246
  [604589]    chr22 [51223601, 51223721]      +   |         2 ENST00000480246
  [604590]    chr22 [51237083, 51239737]      +   |         3 ENST00000480246
  [604591]    chr22 [51237083, 51237551]      +   |         1 ENST00000427528

...
<3 more elements>
---
seqlengths:
  chr1  chr2  chr3  chr4  chr5  chr6 ... chr17 chr18 chr19 chr20 chr21 chr22
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

> plus
             name
1 ENST00000237247
3 ENST00000480246
5 ENST00000427528

I have tried: P.obj[P.obj$name==plus$name]

But instead of the subset I get a warning: Warning message: In is.na(e1) : is.na() applied to non-(list or vector) of type 'NULL'

Answer 1:

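The information you want is in the GRanges metadata columns, accessible with either mcols() or $. Also, you're looking for set membership (%in%) rather than element-wise equality (==), which compares the two vectors position by position. So, for a GRanges object,

P.obj[P.obj$name %in% plus$name]

The P.obj printed in the question is a GRangesList, however, so P.obj$name looks for a list element called "name" and returns NULL, which is what the warning is reporting. Apply the same filter to the element that carries the column, for example

P.obj$exons[P.obj$exons$name %in% plus$name]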
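
As a minimal sketch of the membership filter, the following builds a small toy object from a few of the ranges printed in the question and subsets it with %in%. The toy `exons` and `P.obj` objects and the variable `hits` are assumptions for illustration, not the asker's real data, and the code assumes the Bioconductor package GenomicRanges is installed.

```r
library(GenomicRanges)

# Toy stand-in for the exons element (coordinates copied from the printout)
exons <- GRanges(
  seqnames = c("chr1", "chr1", "chr22", "chr22", "chr22"),
  ranges   = IRanges(
    start = c(66999066, 66999929, 51227323, 51222290, 51237083),
    end   = c(66999090, 67000051, 51227600, 51222500, 51237551)
  ),
  strand = rep("+", 5),
  score  = c(1L, 2L, 4L, 1L, 1L),
  name   = c("ENST00000237247", "ENST00000237247", "ENST00000423888",
             "ENST00000480246", "ENST00000427528")
)

# Wrap it in a GRangesList so the toy object mirrors the printed structure
P.obj <- GRangesList(exons = exons)

# The IDs to keep, mirroring the "plus" object from the question
plus <- data.frame(
  name = c("ENST00000237247", "ENST00000480246", "ENST00000427528"),
  stringsAsFactors = FALSE
)

# Keep every range whose name appears anywhere in plus$name
keep <- P.obj$exons$name %in% plus$name
hits <- P.obj$exons[keep]

hits$name
```

Here `hits` keeps the four ranges whose names occur in `plus$name` and drops the ENST00000423888 range. Using == instead would compare the five names against the three IDs position by position (with recycling), so a match would depend on row order rather than on membership.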

Consider asking questions about Bioconductor packages on the Bioconductor support site.

Source: https://stackoverflow.com/questions/26193907/subsetting-from-an-r-object-of-gene-ids-from-granges-file

Tags

subset