Join R data.tables where key values are not exactly equal--combine rows with closest times

谎友^ 2020-11-30 06:57

Is there a slick way to join data tables in R where key values of time are close, but not exactly the same? For example, suppose I have a data table of results that are give

2 answers
  •  轻奢々 (original poster)
    2020-11-30 07:14

    You can use findInterval to accomplish this:

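    library(data.table)

    # findInterval() needs its second argument sorted in increasing order
    setkey(DT2, time)
    # for each DT1 time, the index of the DT2 time interval it falls in
    DT1[, id := findInterval(time, DT2$time)]
    # number the DT2 rows in the same (sorted) order
    DT2[, id := seq_len(.N)]

    # join on x and id instead of x and time
    setkey(DT1, x, id)
    setkey(DT2, x, id)
    print(DT1[DT2][, id := NULL])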
    #    x time v time.1
    # 1: a   30 2     17
    # 2: b   60 6     54
    # 3: c   10 7      3
    

    The idea: first sort DT2 by time, because the second argument of findInterval must be in increasing order. Then use findInterval to find which interval of 3, 17, 54 each value of DT1$time falls into, and store the result in id. In this particular case the ids happen to range from 1 to 3, so the same values can be set as the id column of DT2. Once both tables have an id, the rest is straightforward: instead of keying on x and time, key on x and id and do the join.
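
The steps above can be tried end to end with the sketch below. The question's original tables are cut off here, so DT1 and DT2 are assumed data, chosen only to be consistent with the printed result; the rest follows the answer.

```r
library(data.table)

# Assumed example data (hypothetical; the question's tables are truncated).
DT1 <- data.table(x = rep(c("a", "b", "c"), each = 3),
                  time = rep(c(10, 30, 60), times = 3),
                  v = 1:9)
DT2 <- data.table(x = c("a", "b", "c"), time = c(17, 54, 3))

# Sort DT2 by time: findInterval() requires increasing break points.
setkey(DT2, time)

# Interval index of every DT1 time relative to the sorted DT2 times (3, 17, 54).
DT1[, id := findInterval(time, DT2$time)]
# Number the DT2 rows in the same sorted order.
DT2[, id := seq_len(.N)]

# Join on x and id rather than on x and time.
setkey(DT1, x, id)
setkey(DT2, x, id)
res <- DT1[DT2][, id := NULL]
print(res)
```

With this data, each DT2 row is paired with the DT1 row whose time falls in the interval that starts at that DT2 time. That is not always the numerically closest time: 17 is paired with 30, not 10.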

    Note: if DT1$time contained a value of 0 (or any value below the smallest DT2 time), findInterval would return 0 for it, so id would take 4 unique values (0:3). In that case it may be better to give DT2 a row with time = 0 as well. I'll leave that to you.
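
A quick base R check of this edge case, using the break points 3, 17, 54 from the answer. The input vector is made up for illustration.

```r
breaks <- c(3, 17, 54)      # sorted DT2 times from the answer
times  <- c(0, 10, 30, 60)  # hypothetical DT1 times, including a 0

# A value below the smallest break point gets interval index 0.
ids <- findInterval(times, breaks)
print(ids)   # [1] 0 1 2 3

# Adding 0 as an extra break point shifts every index up by one,
# so no value is left with id 0.
ids0 <- findInterval(times, c(0, breaks))
print(ids0)  # [1] 1 2 3 4
```

An id of 0 has no counterpart in a DT2 whose ids start at 1, so such rows would find no match in the join.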
