sequence by reducing data.table

离开以前 2021-01-13 14:08
require(data.table)    
set.seed(333)
t <- data.table(old=1002:2001, dif=sample(1:10,1000, replace=TRUE))
t$new <- t$old + t$dif; t$foo <- rnorm(1000); t$di         


        
1 answer
  • 2021-01-13 14:46

    This looks like a graph problem: treat each row as a directed edge from old to new, and the sequence you want is the set of values reachable from your start value. To start from 1002, you can try:

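    library(igraph)
    # Each row is a directed edge old -> new; the integer values double as vertex ids
    g <- graph_from_edgelist(as.matrix(t[, .(old, new)]))
    # Keep the rows whose old value is reachable from 1002 along outgoing edges
    t[old %in% subcomponent(g, 1002, "out")]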
    #  1: 1002 1007 -0.78895338
    #  2: 1007 1015  1.13979100
    #  3: 1015 1022 -1.21936662
    #  4: 1022 1024  1.20390482
    #  5: 1024 1026  0.43885860
    # ---                      
    #191: 1981 1988 -0.22054875
    #192: 1988 1989 -0.22812175
    #193: 1989 1995 -0.04687776
    #194: 1995 2000  2.41349730
    #195: 2000 2002 -1.23425666
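
    To make the reachability idea concrete, here is a minimal sketch on a toy table. The data and the names toy, g_toy and reach are hypothetical and not part of the question. It uses character vertex names, which is an assumption of this sketch rather than what the code above does, so that the lookup does not depend on igraph's internal vertex ids.

    ```r
    library(data.table)
    library(igraph)

    # Hypothetical toy data: each row is an edge old -> new
    toy <- data.table(old = c(1, 3, 4, 6, 2), new = c(3, 4, 6, 9, 5))

    # A character matrix makes graph_from_edgelist() create named vertices
    edges <- cbind(as.character(toy$old), as.character(toy$new))
    g_toy <- graph_from_edgelist(edges, directed = TRUE)

    # Vertices reachable from "1" by following edges forward, back as integers
    reach <- as.integer(as_ids(subcomponent(g_toy, "1", mode = "out")))

    # Rows 1->3, 3->4, 4->6 and 6->9 are kept; the unrelated edge 2->5 is dropped
    toy[old %in% reach]
    ```

    With mode = "out" only forward edges are followed, so values that merely lead into the start value are not included.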
    

    You can repeat this for every start value and keep only the first n rows of each sequence. For instance, lapply over the i$start positions, then rbindlist the results together, using the i$id values as an id column. Something like:

    n <- 5
    # One sequence per start value, cut to its first n vertices; list names feed the id column
    rbindlist(
        setNames(lapply(i$start, function(x) t[old %in% subcomponent(g,x,"out")[1:n]]), i$id),
        idcol="id")
    #    id  old  new        foo
    # 1:  1 1002 1007 -0.7889534
    # 2:  1 1007 1015  1.1397910
    # 3:  1 1015 1022 -1.2193666
    # 4:  1 1022 1024  1.2039048
    # 5:  1 1024 1026  0.4388586
    # 6:  2 1744 1750 -0.1368320
    # 7:  2 1750 1758  0.3331686
    # 8:  2 1758 1763  1.3040357
    # 9:  2 1763 1767 -1.1715528
    #10:  2 1767 1775  0.2841251
    #11:  3 1656 1659 -0.1556208
    #12:  3 1659 1663  0.1663042
    #13:  3 1663 1669  0.3781835
    #14:  3 1669 1670  0.2760948
    #15:  3 1670 1675  0.3745026
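
    Since every old value occurs only once in the question's data, each value has at most one successor, so the same per-start walk can also be sketched without igraph. This is only an illustration under that assumption: steps and starts are hypothetical stand-ins for t and i, and follow_chain is a made-up helper, not a data.table or igraph function.

    ```r
    library(data.table)

    # Hypothetical toy data: 'steps' plays the role of t, 'starts' the role of i
    steps  <- data.table(old = 1:10, new = c(3, 4, 4, 6, 7, 9, 8, 10, 11, 12))
    starts <- data.table(id = c("a", "b"), start = c(1, 2))

    # Follow old -> new from 'start', collecting at most n rows
    follow_chain <- function(dt, start, n) {
      rows <- integer(0)
      current <- start
      while (length(rows) < n) {
        idx <- match(current, dt$old)
        if (is.na(idx)) break        # the chain has left the table
        rows <- c(rows, idx)
        current <- dt$new[idx]
      }
      dt[rows]
    }

    n <- 3
    # One chain per start value; the list names become the id column
    res <- rbindlist(
      setNames(lapply(starts$start, follow_chain, dt = steps, n = n), starts$id),
      idcol = "id"
    )
    res
    ```

    The loop stops on its own when a chain runs out of rows, so a start value near the end of the table simply yields fewer than n rows.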
    