Question
For subsequent discussion, I will refer to the example data frame below:
Now, what I want is to group all the packet times that are similar, i.e. all the 7s, all the 12s, etc. Furthermore, the PacketTime field should contain the difference between the max and the min (max(PacketTime) - min(PacketTime)), and the FrameLen, IPLen and TCPLen fields should be lists of all the values that belong to the grouped time. For example, for the 7s group, FrameLen would contain c(304, 276, 276).
My solution for the above is as follows:
df <- packets %>%
  group_by(round(PacketTime)) %>%
  summarise(
    PTime = max(PacketTime) - min(PacketTime),
    FLen = list(FrameLen),
    ILen = list(IPLen),
    Movement = 0
  ) %>%
  rename(PacketTime = PTime) %>%
  rename(FrameLen = FLen) %>%
  rename(IPLen = ILen)
df$"round(PacketTime)" <- NULL # Remove the group_by column
However, some of these groups cross over (i.e. the 1480s group also includes part of the 1481s). What makes this a little easier, in some regard, is that the groups are separated by a 5 s timing window (via Python's time.sleep(5)).
How can I achieve the previous result while relying only on the 5 s gap between the groups, so that the crossover is also handled?
EDIT: As suggested by Ben, here is the dput() of my dataframe df[1:20,]:
structure(list(PacketTime = c(7.083779, 7.147268, 7.147462, 12.084768,
12.153246, 12.153951, 17.095972, 17.159268, 17.159876, 22.11384,
22.176926, 22.177467, 27.134427, 27.199108, 27.200064, 32.144442,
32.208648, 32.20922, 37.144255, 37.205622), FrameLen = c(304L,
276L, 276L, 304L, 276L, 276L, 304L, 276L, 276L, 304L, 276L, 276L,
304L, 276L, 276L, 304L, 276L, 276L, 304L, 276L), IPLen = c(300L,
272L, 272L, 300L, 272L, 272L, 300L, 272L, 272L, 300L, 272L, 272L,
300L, 272L, 272L, 300L, 272L, 272L, 300L, 272L), TCPLen = c(260L,
232L, 232L, 260L, 232L, 232L, 260L, 232L, 232L, 260L, 232L, 232L,
260L, 232L, 232L, 260L, 232L, 232L, 260L, 232L), Movement = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA,
20L), class = "data.frame")
Answer 1:
One approach is to use seq and cut. Create a sequence from your minimum to your maximum time, in steps of 5 seconds. Then use cut to put your times into those intervals. You can use the interval itself as the label (for example, [7,12) when right = FALSE) by omitting the labels argument, or just use the lower bound of the interval (7 sec), as done below.
library(tidyverse)

# Breaks every 5 seconds, starting at the truncated minimum time
my_breaks <- seq(trunc(min(packets$PacketTime)), max(packets$PacketTime) + 5, 5)
# Label each interval by its lower bound; right = FALSE makes intervals [a, b)
packets$Interval <- cut(packets$PacketTime, breaks = my_breaks,
                        labels = my_breaks[-length(my_breaks)], right = FALSE)

packets %>%
  group_by(Interval) %>%
  summarise(
    PTime = max(PacketTime) - min(PacketTime),
    FLen = list(FrameLen),
    ILen = list(IPLen),
    Movement = 0
  ) %>%
  rename(PacketTime = PTime) %>%
  rename(FrameLen = FLen) %>%
  rename(IPLen = ILen)
Output
# A tibble: 7 x 5
  Interval PacketTime FrameLen  IPLen     Movement
  <fct>         <dbl> <list>    <list>       <dbl>
1 7            0.0637 <int [3]> <int [3]>        0
2 12           0.0692 <int [3]> <int [3]>        0
3 17           0.0639 <int [3]> <int [3]>        0
4 22           0.0636 <int [3]> <int [3]>        0
5 27           0.0656 <int [3]> <int [3]>        0
6 32           0.0648 <int [3]> <int [3]>        0
7 37           0.0614 <int [2]> <int [2]>        0
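To make the two labelling options concrete, here is a small base R sketch. The vector `times` holds a few made-up values in the same range as the question's data (it is an illustration, not the asker's data frame); the breaks are built the same way as `my_breaks` above.

```r
# Illustrative, made-up packet times (assumption: not the asker's data)
times <- c(7.08, 7.15, 12.08, 12.15, 17.10)

# Breaks every 5 seconds from the truncated minimum: 7, 12, 17, 22
my_breaks <- seq(trunc(min(times)), max(times) + 5, 5)

# Option 1: omit `labels`, so each level is named after its interval.
# right = FALSE makes the intervals closed on the left: [7,12), [12,17), ...
default_labels <- as.character(cut(times, breaks = my_breaks, right = FALSE))

# Option 2: label each interval by its lower bound only
lower_labels <- as.character(
  cut(times, breaks = my_breaks,
      labels = my_breaks[-length(my_breaks)], right = FALSE)
)

print(default_labels)
print(lower_labels)
```

Either labelling gives the same grouping; only the names of the factor levels differ.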
Answer 2:
Here is a base R solution using aggregate + transform:
u <- aggregate(
  . ~ PacketTime,
  transform(df,
    # Per-group time span, computed before PacketTime is truncated
    PTime = ave(PacketTime, trunc(PacketTime),
      FUN = function(x) diff(range(x))),
    PacketTime = trunc(PacketTime)
  ),
  c
)
dfout <- transform(u, PTime = sapply(PTime, unique))
which gives
> dfout
  PacketTime      FrameLen         IPLen        TCPLen Movement    PTime
1          7 304, 276, 276 300, 272, 272 260, 232, 232  0, 0, 0 0.063683
2         12 304, 276, 276 300, 272, 272 260, 232, 232  0, 0, 0 0.069183
3         17 304, 276, 276 300, 272, 272 260, 232, 232  0, 0, 0 0.063904
4         22 304, 276, 276 300, 272, 272 260, 232, 232  0, 0, 0 0.063627
5         27 304, 276, 276 300, 272, 272 260, 232, 232  0, 0, 0 0.065637
6         32 304, 276, 276 300, 272, 272 260, 232, 232  0, 0, 0 0.064778
7         37      304, 276      300, 272      260, 232     0, 0 0.061367
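Both answers assign groups from fixed boundaries (a 5-second grid in the first, trunc in the second). The question also asks about relying only on the pause between bursts, so the following is a minimal sketch of a gap-based grouping that comes from neither answer. It assumes the times can be sorted, that packets within a burst are well under a second apart, and that a threshold of 2.5 s (the hypothetical `gap` variable) safely separates bursts produced by time.sleep(5). The sample times are made up so that round() would split each burst.

```r
# Made-up packet times whose bursts straddle a rounding boundary
# (assumption for illustration; not taken from the asker's data)
packets <- data.frame(
  PacketTime = c(1480.48, 1480.53, 1480.54, 1485.49, 1485.55, 1485.56),
  FrameLen   = c(304L, 276L, 276L, 304L, 276L, 276L)
)

# Assumed threshold: larger than any within-burst spacing, smaller than 5 s
gap <- 2.5

# Sort by time, then start a new group whenever the gap to the
# previous packet exceeds the threshold
packets <- packets[order(packets$PacketTime), ]
packets$Group <- cumsum(c(TRUE, diff(packets$PacketTime) > gap))

# Per-group time span and list of frame lengths
span      <- tapply(packets$PacketTime, packets$Group, function(x) diff(range(x)))
frame_len <- split(packets$FrameLen, packets$Group)

result <- data.frame(
  Group      = as.integer(names(span)),
  PacketTime = as.numeric(span),
  FrameLen   = I(unname(frame_len))  # I() keeps the list as a list column
)
print(result)
```

Here round(PacketTime) would place 1480.48 in a different group from 1480.53 and 1480.54, whereas the gap-based grouping keeps each burst together. IPLen and TCPLen could be collected the same way with split.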
Source: https://stackoverflow.com/questions/61715025/concatenating-data-frame-rows-based-on-column-condition