Count every possible pair of values in a column grouped by multiple columns

不知归路 2020-12-03 15:59

I have a data frame that looks like this (this is just a subset; the actual dataset has 2724098 rows):

> head(dat)

chr   start  end    enhancer motif 
chr10          


        
7 answers
  •  星月不相逢
    2020-12-03 16:38

    You might benefit from formally modelling the semantics of your data. If you have ranges on the genome, use the GenomicRanges package from Bioconductor.

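    library(GenomicRanges)
    # Build a GRanges object from the question's data frame (dat); enhancer and motif are kept as metadata columns
    gr <- makeGRangesFromDataFrame(dat, keep.extra.columns=TRUE)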
    

    The result is a GRanges object, which formally understands the notion of genomic location, so these operations just work:

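    # Pair every range with all ranges that have identical coordinates (itself included)
    hits <- findMatches(gr, gr)
    # Cross-tabulate the motifs found on the two sides of each matched pair
    tab <- table(motif1=gr$motif[queryHits(hits)],
                 motif2=gr$motif[subjectHits(hits)])
    # Keep only pairs of two different motifs
    subset(as.data.frame(tab, responseName="count"), motif1 != motif2)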
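
For anyone who wants to check the logic without installing Bioconductor, here is a minimal base R sketch of the same idea. It is not the GenomicRanges approach above: it swaps in a plain self-join with `merge()` on the grouping columns. The toy values in `dat` are made up for illustration, and I am assuming the grouping columns are `chr`, `start`, `end` and `enhancer` as in the question's header.

```r
# Toy data: every value below is hypothetical, only the column names
# follow the header shown in the question.
dat <- data.frame(
  chr      = c("chr10", "chr10", "chr10", "chr10", "chr10"),
  start    = c(100L, 100L, 100L, 500L, 500L),
  end      = c(200L, 200L, 200L, 600L, 600L),
  enhancer = c("e1", "e1", "e1", "e2", "e2"),
  motif    = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Self-join on the grouping columns: each row is paired with every row
# (itself included) that shares the same chr/start/end/enhancer.
# The suffixes turn the two motif columns into motif1 and motif2.
pairs <- merge(dat, dat,
               by = c("chr", "start", "end", "enhancer"),
               suffixes = c("1", "2"))

# Drop pairs of a motif with itself.
pairs <- pairs[pairs$motif1 != pairs$motif2, ]

# Count each ordered (motif1, motif2) combination across all groups
# and keep only the combinations that actually occur.
counts <- as.data.frame(table(motif1 = pairs$motif1, motif2 = pairs$motif2),
                        responseName = "count",
                        stringsAsFactors = FALSE)
counts <- counts[counts$count > 0, ]
print(counts)
```

Pairs are counted in both orders here, so (A, B) and (B, A) each get their own row, just as in the `table()` call above. On a dataset with millions of rows the self-join may need a lot of memory, which is one reason the GRanges route might scale better.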
    
