Why does this R dplyr getAges fail on an ordered list?

 ̄綄美尐妖づ submitted on 2019-12-23 04:57:16

Question


I want to get a list of ages for a list of ids with the function getAges below. In the complete code example further down, it fails by returning the ages in the wrong order for the given id list. The expression DF[DF$ID %in% ids,] takes the whole data (DF), looks at its ids (DF$ID), keeps the rows whose id is in the list of ids (%in% ids), and returns the age of those rows ($Age). I am unsure about the %in% ids part, because I believe R's %in% compares and returns the id if there is a match.

getAges <- function(...)
{
   DF[DF$ID %in% ids,]$Age
}

The function getIDs returns the correct result. The complete code example:

library('dplyr')
getIDs <- function(..., by = NULL){
    DF %>% filter_(...) %>% { if (!is.null(by))  arrange_(., by) else . } %>% .$ID
} 
getAges <- function(...)
{
   DF[DF$ID %in% ids,]$Age
}

DF <- structure(list(ID = c(16265L, 16272L, 16273L, 16420L, 16483L, 
16539L, 16773L, 16786L, 16795L, 17052L, 17453L, 18177L, 18184L, 
19088L, 19090L, 19093L, 19140L, 19830L), Age = c(32L, 20L, 28L, 
38L, 42L, 35L, 26L, 32L, 20L, 45L, 32L, 26L, 34L, 41L, 45L, 34L, 
38L, 50L), Gender = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("female", 
"male"), class = "factor")), .Names = c("ID", "Age", "Gender"
), class = "data.frame", row.names = c(NA, -18L))

ids <- getIDs(by = "desc(Age)")

ages <- getAges(ids) # TODO this fails

str(ids)
str(ages)
#  int [1:18] 19830 17052 19090 16483 19088 16420 19140 16539 18184 19093 ...
# int [1:18] 32 20 28 38 42 35 26 32 20 45 ... # TODO why here this order?

Original data as a list

#Original
#ID Age Gender
#16265  32  male
#16272  20  female
#16273  28  female
#16420  38  female
#16483  42  male
#16539  35  female
#16773  26  male
#16786  32  female
#16795  20  female
#17052  45  female
#17453  32  female
#18177  26  female
#18184  34  female
#19088  41  female
#19090  45  male
#19093  34  male
#19140  38  female
#19830  50  female

Expected output of getAges: list of ages corresponding to the order of the list ids

R: 3.3.2
OS: Debian 8.5


Answer 1:


If the only purpose of getAges is to look up the ages of ids, then try:

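getAges <- function(ids)
{
   # match() gives, for each id, its row position in DF$ID,
   # so the ages come back in the order of ids
   DF[match(ids, DF$ID), "Age"]
}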
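To see why the original version returns the ages in the wrong order, it may help to compare %in% and match() on a tiny example. This is only a sketch: the data frame toy and its values are made up for illustration and are not the asker's DF.

```r
# Hypothetical toy data (not the asker's DF), kept small so the result is easy to check
toy <- data.frame(ID = c(1L, 2L, 3L), Age = c(30L, 20L, 40L))
ids <- c(3L, 1L, 2L)  # ids in a different order than the rows of toy

# %in% yields one TRUE/FALSE per row of toy, so the rows come back in toy's own order
ages_in <- toy[toy$ID %in% ids, "Age"]

# match() yields, for each element of ids, its row position in toy$ID
ages_match <- toy[match(ids, toy$ID), "Age"]

print(ages_in)     # ages in the row order of toy
print(ages_match)  # ages in the order of ids
```

Note that match() returns NA for an id that does not occur in the table, so the corresponding age is NA as well, whereas %in% silently drops it.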



Answer 2:


In dplyr

getAges <- 
    DF %>% 
    na.omit %>% 
    arrange(desc(Age),ID) %>% 
    select(Age)

getAges
   Age
1   50
2   45
3   45
4   42
5   41
6   38
7   38
8   35
9   34
10  34
11  32
12  32
13  32
14  28
15  26
16  26
17  20
18  20
> as.list(getAges)
$Age
 [1] 50 45 45 42 41 38 38 35 34 34 32 32 32 28 26 26 20 20

However (though here I can only surmise), if you leave your data in a data frame you will have a much easier time in your next step, too. See here for a great introduction to that subject or, if video is your thing, an excellent classic video from an R meetup is here. When viewing it, it may help to note that we now use his tidyr functions, which make the melting and recasting done with reshape even easier, and that dplyr has completely changed the way we manipulate data frames, avoiding base R's $col and [] based referencing.
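As a rough sketch of what keeping the data in a data frame could look like with dplyr: sort once so that ids and ages stay aligned, or look up ages for an arbitrary id vector with left_join(), which keeps the row order of its first argument. The names toy and wanted and their values are assumptions made up for this illustration.

```r
library(dplyr)

# Hypothetical toy data (not the asker's DF)
toy <- data.frame(ID = c(1L, 2L, 3L), Age = c(30L, 20L, 40L))

# Sort once; ID and Age stay side by side, so their order always agrees
sorted <- toy %>% arrange(desc(Age), ID)
ids_sorted  <- sorted$ID
ages_sorted <- sorted$Age

# Look up ages for an arbitrary id vector; the result follows the order of wanted
wanted <- data.frame(ID = c(2L, 3L))
ages_wanted <- wanted %>% left_join(toy, by = "ID") %>% pull(Age)
```

With this approach the ordering lives in one place (the sorted data frame), so there is no separate id vector that can drift out of step with the ages.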




Answer 3:


alexis_laz proposed the following improvement in the comments, regarding the use of ... in the function:

library(R6)
library(dplyr)  # provides %>%, filter_() and arrange_()

mydataframe = R6Class("mydataframe",
  public = list(
    # empty template; replaced by the data passed to initialize()
    data = data.frame(ID = integer(), Age = integer(), Gender = character()),
    initialize = function(x) {
      stopifnot(c("ID", "Age", "Gender") %in% names(x))
      self$data = x
    },
    getIDs = function(..., by = NULL) {
      self$data %>%
        filter_(...) %>%
        { if (!is.null(by)) arrange_(., by) else . } %>%
        .$ID
    },
    # match() returns the ages in the order of ids
    getAges = function(ids = self$data$ID) self$data$Age[match(ids, self$data$ID)]
  )
)

# The class has to be defined before an object is created from it
DF2 = mydataframe$new(DF)

# Usage
DF2$getIDs(by = "desc(Age)")
DF2$getAges()
DF2$getAges(DF2$getIDs(by = "desc(Age)"))


Source: https://stackoverflow.com/questions/41206377/why-this-r-dplyr-getages-fails-on-ordered-list
