Subset data based on Minimum Value

Question

This might be an easy one. Here's the data:

dat <- read.table(header=TRUE, text="
Seg  ID  Distance
Seg46      V21 160.37672
Seg72      V85 191.24400
Seg373      V85 167.38930
Seg159     V147  14.74852
Seg233     V171 193.01636
Seg234     V171 200.21458

                   ")
dat
Seg  ID  Distance
Seg46      V21 160.37672
Seg72      V85 191.24400
Seg373      V85 167.38930
Seg159     V147  14.74852
Seg233     V171 193.01636
Seg234     V171 200.21458

I want a table like the following, which gives the Seg with the minimum Distance for each ID (since some ID values are duplicated).

Seg  ID  Distance
Seg46      V21 160.37672
Seg373      V85 167.38930
Seg159     V147  14.74852
Seg233     V171 193.01636

I tried to solve this with ddply, but it does not give the result I want:

ddply(dat, "Seg", summarize, min = min(Distance))
Seg       min
Seg159  14.74852
Seg233 193.01636
Seg234 200.21458
Seg373 167.38930
Seg46 160.37672
Seg72 191.24400

Answer 1:

We can subset the rows with which.min. After grouping by 'ID', we slice each group at the position of its minimum 'Distance'. Because which.min returns the index of the first minimum, exactly one row is kept per ID.

library(dplyr)
dat %>% 
   group_by(ID) %>% 
   slice(which.min(Distance))

A similar option using data.table would be:

library(data.table)
setDT(dat)[, .SD[which.min(Distance)], by = ID]
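For comparison, the same per-ID minimum can be sketched in base R without extra packages. This is only an illustration, not part of the original answer: the names `first_min` and `all_min` are made up here. The first variant keeps one row per ID, like which.min; the second keeps every row that ties for the minimum.

```r
# Rebuild the example data from the question.
dat <- read.table(header = TRUE, text = "
Seg  ID  Distance
Seg46      V21 160.37672
Seg72      V85 191.24400
Seg373      V85 167.38930
Seg159     V147  14.74852
Seg233     V171 193.01636
Seg234     V171 200.21458
")

# Variant 1: sort by Distance, then keep the first row seen for each ID.
ord <- dat[order(dat$Distance), ]
first_min <- ord[!duplicated(ord$ID), ]

# Variant 2: compare each Distance with its group's minimum.
# Keeps all tied rows and preserves the original row order.
all_min <- dat[dat$Distance == ave(dat$Distance, dat$ID, FUN = min), ]

first_min
all_min
```

With this data there are no ties, so both variants select the same four rows and differ only in row order.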

Answer 2:

If you prefer ddply, you could group by ID instead of Seg and pick the Seg at the minimum Distance:

library(plyr)
ddply(dat, .(ID), summarize, 
      Seg = Seg[which.min(Distance)], 
      Distance = min(Distance))

#    ID    Seg  Distance
#1 V147 Seg159  14.74852
#2 V171 Seg233 193.01636
#3  V21  Seg46 160.37672
#4  V85 Seg373 167.38930

Source: https://stackoverflow.com/questions/32377541/subset-data-based-on-minimum-value

Tags

subset

dplyr

plyr