Question
I want to get a list of ages for a list of ids with the function getAges shown below.
It fails in the complete code example further down: the ages come back in the wrong order relative to the given id list.
The code DF[DF$ID %in% ids,] takes the whole data (DF), looks at its ids (DF$ID), keeps the rows whose id is in the list of ids (%in% ids), and returns the age of those ids ([wantedIds]$Age).
I am unsure about the %in% ids part, because as I understand it, R's %in% compares and returns the id if there is a match.
getAges <- function(...)
{
DF[DF$ID %in% ids,]$Age
}
The function getIDs returns the correct result.
The whole code example
library('dplyr')
getIDs <- function(..., by = NULL){
DF %>% filter_(...) %>% { if (!is.null(by)) arrange_(., by) else . } %>% .$ID
}
getAges <- function(...)
{
DF[DF$ID %in% ids,]$Age
}
DF <- structure(list(ID = c(16265L, 16272L, 16273L, 16420L, 16483L,
16539L, 16773L, 16786L, 16795L, 17052L, 17453L, 18177L, 18184L,
19088L, 19090L, 19093L, 19140L, 19830L), Age = c(32L, 20L, 28L,
38L, 42L, 35L, 26L, 32L, 20L, 45L, 32L, 26L, 34L, 41L, 45L, 34L,
38L, 50L), Gender = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("female",
"male"), class = "factor")), .Names = c("ID", "Age", "Gender"
), class = "data.frame", row.names = c(NA, -18L))
ids <- getIDs(by = "desc(Age)")
ages <- getAges(ids) # TODO this fails
str(ids)
str(ages)
# int [1:18] 19830 17052 19090 16483 19088 16420 19140 16539 18184 19093 ...
# int [1:18] 32 20 28 38 42 35 26 32 20 45 ... # TODO why here this order?
Original data as a list
#Original
#ID Age Gender
#16265 32 male
#16272 20 female
#16273 28 female
#16420 38 female
#16483 42 male
#16539 35 female
#16773 26 male
#16786 32 female
#16795 20 female
#17052 45 female
#17453 32 female
#18177 26 female
#18184 34 female
#19088 41 female
#19090 45 male
#19093 34 male
#19140 38 female
#19830 50 female
Expected output of getAges: a list of ages in the same order as the list ids
R: 3.3.2
OS: Debian 8.5
Answer 1:
If the only purpose of getAges is to look up the ages of ids, then try
getAges <- function(...)
{
DF[match(ids,DF$ID),"Age"]
}
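The difference between the two lookups can be seen on a small example. The following is a minimal sketch using a made-up data frame (toy, wanted and getAgesByIds are hypothetical names, not part of the question's code): %in% yields one TRUE/FALSE per row of the data, so the selected rows keep the data's own order, whereas match() yields one row position per requested id, so the result follows the order of the ids.

```r
# Toy data (hypothetical, not the question's DF)
toy <- data.frame(ID = c(1L, 2L, 3L), Age = c(30L, 20L, 40L))
wanted <- c(3L, 1L)

# %in% gives one TRUE/FALSE per row of toy, so rows keep toy's order
in_mask <- toy$ID %in% wanted
ages_in <- toy$Age[in_mask]

# match() gives, for each wanted id, its row position in toy
pos <- match(wanted, toy$ID)
ages_match <- toy$Age[pos]

# Passing the data and ids as arguments avoids relying on global variables
getAgesByIds <- function(df, ids) df$Age[match(ids, df$ID)]
```

Here ages_in follows the row order of toy, while ages_match and getAgesByIds(toy, wanted) follow the order of wanted. Note that match() returns NA for an id that is not present, so such ids would produce NA ages rather than being dropped.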
Answer 2:
In dplyr
getAges <-
DF %>%
na.omit %>%
arrange(desc(Age),ID) %>%
select(Age)
getAges
Age
1 50
2 45
3 45
4 42
5 41
6 38
7 38
8 35
9 34
10 34
11 32
12 32
13 32
14 28
15 26
16 26
17 20
18 20
> as.list(getAges)
$Age
[1] 50 45 45 42 41 38 38 35 34 34 32 32 32 28 26 26 20 20
However, (though here I can only surmise) if you leave your data in a dataframe you will have a much easier time of it in your next step, too.
See here for a great introduction to that subject or, if video is your thing, an excellent classic video from an R meetup is here. In viewing that, it may be helpful to note that we now use his tidyr functions, which make the melting and recasting in reshape even easier, and of course dplyr has completely altered the way we manipulate dataframes, avoiding the base R $col and [] based referencing.
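As a rough illustration of keeping the data in a dataframe, the sketch below sorts a small made-up data frame once and keeps ID and Age side by side, so the two columns cannot drift out of order. The names toy and sorted are hypothetical, and the example assumes dplyr is installed.

```r
library(dplyr)

# Toy data (hypothetical, not the question's DF)
toy <- data.frame(ID = c(1L, 2L, 3L), Age = c(30L, 20L, 40L))

# Sort by descending age (ties broken by ID) and keep both columns together
sorted <- toy %>%
  arrange(desc(Age), ID) %>%
  select(ID, Age)
```

Both sorted$ID and sorted$Age then come from the same sorted rows, so no separate lookup by id is needed.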
Answer 3:
alexis_laz's proposal, from the comments, for improving the use of ... in the function:
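library(R6)
library(dplyr)  # provides %>%, filter_ and arrange_ used below
mydataframe = R6Class("mydataframe",
  public = list(
    # empty template with the expected columns
    data = data.frame(ID = integer(),
                      Age = integer(),
                      Gender = character()
    ),
    initialize = function(x) {
      # require the three expected columns before storing the data
      stopifnot(c("ID", "Age", "Gender") %in% names(x)); self$data = x
    },
    getIDs = function(..., by = NULL) self$data %>% filter_(...) %>% {
      if (!is.null(by)) arrange_(., by) else .
    } %>% .$ID,
    # match() returns the ages in the order of ids
    getAges = function(ids = self$data$ID) self$data$Age[match(ids, self$data$ID)]
  )
)
# create the object only after the class has been defined
DF2 = mydataframe$new(DF)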
# Usage
DF2$getIDs(by = "desc(Age)")
DF2$getAges()
DF2$getAges(DF2$getIDs(by = "desc(Age)"))
Source: https://stackoverflow.com/questions/41206377/why-this-r-dplyr-getages-fails-on-ordered-list