Add missing values based on the value of a column in R


This is my sample dataset:

   vector1 <-
      data.frame(
        "name" = "a",
        "age" = 10,
        "fruit" = c("orange", "cherry", "app


                      
3 answers

无人及你, 2021-01-23 01:37
              
            
            
                                                                       
Consider base R methods (lapply, expand.grid, transform, rbind, aggregate) that append all possible fruit and tag combinations to each data frame and keep the maximum counts.

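# 'list' is the OP's list of data frames (vector1, vector2, vector3)
new_list <- lapply(list, function(df) {
  # every fruit x tag combination for this person, with count 0
  fruit_tag_df <- transform(expand.grid(fruit=c("apple", "cherry", "mango", "orange"),
                                        tag=c(1,2)),
                            name = df$name[1],
                            age = df$age[1],
                            count = 0)

  # stack real and zero rows, then keep the max count per name/age/fruit/tag
  aggregate(.~name + age + fruit + tag, rbind(df, fruit_tag_df), FUN=max)
})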


Output 

new_list

# [[1]]
#   name age  fruit tag count
# 1    a  10  apple   1     0
# 2    a  10 cherry   1     1
# 3    a  10 orange   1     1
# 4    a  10  mango   1     0
# 5    a  10  apple   2     1
# 6    a  10 cherry   2     0
# 7    a  10 orange   2     0
# 8    a  10  mango   2     0

# [[2]]
#   name age  fruit tag count
# 1    b  33  apple   1     0
# 2    b  33  mango   1     0
# 3    b  33 cherry   1     0
# 4    b  33 orange   1     0
# 5    b  33  apple   2     1
# 6    b  33  mango   2     1
# 7    b  33 cherry   2     0
# 8    b  33 orange   2     0

# [[3]]
#   name age  fruit tag count
# 1    c  58  apple   1     1
# 2    c  58 cherry   1     1
# 3    c  58  mango   1     0
# 4    c  58 orange   1     0
# 5    c  58  apple   2     0
# 6    c  58 cherry   2     0
# 7    c  58  mango   2     0
# 8    c  58 orange   2     0

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
青春惊慌失措, 2021-01-23 01:39
              
            
            
                                                                       
The OP wants to complete each data.frame in list so that every combination of the default fruits and tags 1:2 appears in the result, with count set to 0 for the added rows. Each data.frame should therefore have at least 4 x 2 = 8 rows.
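To make the 4 x 2 = 8 count concrete, the sketch below (base R only, using the first sample data frame from the DATA section and the hypothetical names grid, key_grid and to_add) builds all fruit/tag combinations and lists the ones missing from vector1; those are exactly the rows that should receive count = 0.

```r
default <- c("cherry", "orange", "apple", "mango")
vector1 <- data.frame(
  name = "a", age = 10,
  fruit = c("orange", "cherry", "apple"),
  count = c(1, 1, 1), tag = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# All 4 fruits x 2 tags = 8 combinations
grid <- expand.grid(fruit = default, tag = 1:2, stringsAsFactors = FALSE)
n_grid <- nrow(grid)

# Rows present in the grid but absent from vector1 are the ones to add with count 0
key_grid <- paste(grid$fruit, grid$tag)
key_obs  <- paste(vector1$fruit, vector1$tag)
to_add   <- grid[!key_grid %in% key_obs, ]
to_add$count <- 0

print(n_grid)        # [1] 8
print(nrow(to_add))  # [1] 5
```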
I want to propose two different approaches:

1. Use lapply() and the CJ() (cross join) function from data.table to return a list.
2. Combine the separate data.frames in list into one large data.table using rbindlist() and apply the required transformations to the whole data.table.

Using lapply() and CJ()
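library(data.table)
# 'list' is the OP's list of data frames; 'default' holds the four fruits
lapply(list, function(x) setDT(x)[
  # join onto every name/age/fruit/tag combination
  CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE), 
  on = .(name, age, fruit, tag)][
    # rows added by the join have count NA: set them to 0
    is.na(count), count := 0][order(-count, tag)]
)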


[[1]]
   name age  fruit count tag
1:    a  10 cherry     1   1
2:    a  10 orange     1   1
3:    a  10  apple     1   2
4:    a  10  apple     0   1
5:    a  10  mango     0   1
6:    a  10 cherry     0   2
7:    a  10  mango     0   2
8:    a  10 orange     0   2

[[2]]
   name age  fruit count tag
1:    b  33  apple     1   2
2:    b  33  mango     1   2
3:    b  33  apple     0   1
4:    b  33 cherry     0   1
5:    b  33  mango     0   1
6:    b  33 orange     0   1
7:    b  33 cherry     0   2
8:    b  33 orange     0   2

[[3]]
   name age  fruit count tag
1:    c  58  apple     1   1
2:    c  58 cherry     1   1
3:    c  58  mango     0   1
4:    c  58 orange     0   1
5:    c  58  apple     0   2
6:    c  58 cherry     0   2
7:    c  58  mango     0   2
8:    c  58 orange     0   2


Ordering by count and tag is not required, but it makes the result easier to compare with the OP's expected output.
Creating one large data.table
Instead of a list of data.frames with identical structure, we can use one large data.table in which the origin of each row is identified by an id column.
Indeed, the OP has asked other questions ("using lapply function and list in r" and "how to loop the dataframe using sqldf?") in which he asked for help handling a list of data.frames. G. Grothendieck had already suggested combining the rows with rbind.
The rbindlist() function has an idcol parameter that identifies the origin of each row:
library(data.table)
rbindlist(list, idcol = "df")


   df name age  fruit count tag
1:  1    a  10 orange     1   1
2:  1    a  10 cherry     1   1
3:  1    a  10  apple     1   2
4:  2    b  33  apple     1   2
5:  2    b  33  mango     1   2
6:  3    c  58 cherry     1   1
7:  3    c  58  apple     1   1


Note that df contains the number of the source data.frame in list (or the names of the list elements if list is named).
Now we can apply the solution above by grouping over df:
rbindlist(list, idcol = "df")[, .SD[
  CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE), 
  on = .(name, age, fruit, tag)], by = df][
    is.na(count), count := 0][order(df, -count, tag)]


    df name age  fruit count tag
 1:  1    a  10 cherry     1   1
 2:  1    a  10 orange     1   1
 3:  1    a  10  apple     1   2
 4:  1    a  10  apple     0   1
 5:  1    a  10  mango     0   1
 6:  1    a  10 cherry     0   2
 7:  1    a  10  mango     0   2
 8:  1    a  10 orange     0   2
 9:  2    b  33  apple     1   2
10:  2    b  33  mango     1   2
11:  2    b  33  apple     0   1
12:  2    b  33 cherry     0   1
13:  2    b  33  mango     0   1
14:  2    b  33 orange     0   1
15:  2    b  33 cherry     0   2
16:  2    b  33 orange     0   2
17:  3    c  58  apple     1   1
18:  3    c  58 cherry     1   1
19:  3    c  58  mango     0   1
20:  3    c  58 orange     0   1
21:  3    c  58  apple     0   2
22:  3    c  58 cherry     0   2
23:  3    c  58  mango     0   2
24:  3    c  58 orange     0   2
    df name age  fruit count tag
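If data.table is not available, the one-large-table idea can be approximated in base R. The sketch below is an analogue, not data.table itself: a Map() over a named list stands in for rbindlist(idcol = "df") (so df holds the list names, as noted above), and merge(..., all.x = TRUE) onto an expand.grid() result stands in for the X[CJ(...), on = ...] join, which likewise returns one row per grid combination. The sample data come from the DATA section below; the list names a, b, c and the objects stacked and completed are assumptions for illustration.

```r
# Sample data from the DATA section, as a named list (names are hypothetical)
fruit_list <- list(
  a = data.frame(name = "a", age = 10, fruit = c("orange", "cherry", "apple"),
                 count = c(1, 1, 1), tag = c(1, 1, 2), stringsAsFactors = FALSE),
  b = data.frame(name = "b", age = 33, fruit = c("apple", "mango"),
                 count = c(1, 1), tag = c(2, 2), stringsAsFactors = FALSE),
  c = data.frame(name = "c", age = 58, fruit = c("cherry", "apple"),
                 count = c(1, 1), tag = c(1, 1), stringsAsFactors = FALSE)
)
default <- c("cherry", "orange", "apple", "mango")

# Base R analogue of rbindlist(fruit_list, idcol = "df"): prepend each list name
stacked <- do.call(rbind, unname(Map(
  function(d, id) data.frame(df = id, d, stringsAsFactors = FALSE),
  fruit_list, names(fruit_list)
)))
rownames(stacked) <- NULL

# Per df group: join the rows onto the full fruit x tag grid, then fill count
completed <- do.call(rbind, unname(lapply(split(stacked, stacked$df), function(g) {
  grid <- expand.grid(df = g$df[1], name = g$name[1], age = g$age[1],
                      fruit = default, tag = c(1, 2), stringsAsFactors = FALSE)
  out <- merge(grid, g, by = c("df", "name", "age", "fruit", "tag"), all.x = TRUE)
  out$count[is.na(out$count)] <- 0
  out
})))
rownames(completed) <- NULL
print(completed)
```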


                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
清歌不尽, 2021-01-23 01:44
              
            
            
                                                                       
A solution using dplyr and tidyr: complete() expands each data frame to all combinations of the given values, and fill = list(count = 0) sets count to 0 in the added rows.

Notice that I renamed your list from list to fruit_list: list is not a reserved word in R, but it is bad practice to give an object the same name as a base function such as list(). Also notice that when I created the example data frames I set stringsAsFactors = FALSE because I don't want factor columns. Finally, I used lapply instead of a for loop to iterate over the list elements.
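A quick base R illustration of the naming point: list does not appear in R's reserved words (see ?Reserved), so assigning to it is legal and a call such as list(...) still reaches the base function, while a genuine reserved word such as if cannot be assigned at all. Renaming is therefore about readability rather than correctness. The names x, masked_call_ok and reserved_result are only for illustration.

```r
# 'list' is an ordinary base function name, so this assignment is allowed
list <- base::list(1, 2)

# A call still finds the function: when a name is used in call position,
# R skips objects that are not functions during lookup
x <- list("still works")
masked_call_ok <- is.list(x) && length(x) == 1

# Remove the masking object again
rm("list")

# 'if' is a reserved word, so the parser rejects assigning to it
reserved_result <- tryCatch(parse(text = "if <- 1"),
                            error = function(e) "parse error")
print(reserved_result)
```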

library(dplyr)
library(tidyr)

fruit_list2 <- lapply(fruit_list, function(x){
  x2 <- x %>%
    # add every default fruit x tag combination; new rows get count = 0
    complete(name, age, fruit = default, tag = c(1, 2), fill = list(count = 0)) %>%
    select(name, age, fruit, count, tag) %>%   # restore the original column order
    arrange(tag, fruit) %>%
    as.data.frame()
  return(x2)
})

fruit_list2
# [[1]]
#   name age  fruit count tag
# 1    a  10  apple     0   1
# 2    a  10 cherry     1   1
# 3    a  10  mango     0   1
# 4    a  10 orange     1   1
# 5    a  10  apple     1   2
# 6    a  10 cherry     0   2
# 7    a  10  mango     0   2
# 8    a  10 orange     0   2
# 
# [[2]]
#   name age  fruit count tag
# 1    b  33  apple     0   1
# 2    b  33 cherry     0   1
# 3    b  33  mango     0   1
# 4    b  33 orange     0   1
# 5    b  33  apple     1   2
# 6    b  33 cherry     0   2
# 7    b  33  mango     1   2
# 8    b  33 orange     0   2
# 
# [[3]]
#   name age  fruit count tag
# 1    c  58  apple     1   1
# 2    c  58 cherry     1   1
# 3    c  58  mango     0   1
# 4    c  58 orange     0   1
# 5    c  58  apple     0   2
# 6    c  58 cherry     0   2
# 7    c  58  mango     0   2
# 8    c  58 orange     0   2
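To show what complete() does, here is a rough base R analogue for the second sample data frame. It mirrors the documented behaviour of complete() (expand to all combinations, full join with the data, then fill), but it is a sketch, not tidyr's implementation. The hypothetical extra kiwi row shows one consequence: a full join keeps data rows that fall outside the grid, whereas joining onto the grid alone (merge with all.x = TRUE, similar to the data.table join in the previous answer) drops them. With the OP's data every fruit is in default, so both give the same 8 rows.

```r
default <- c("cherry", "orange", "apple", "mango")
vector2 <- data.frame(name = "b", age = 33, fruit = c("apple", "mango"),
                      count = c(1, 1), tag = c(2, 2), stringsAsFactors = FALSE)

# The grid complete() would build here: 1 name x 1 age x 4 fruits x 2 tags
grid <- expand.grid(name = unique(vector2$name), age = unique(vector2$age),
                    fruit = default, tag = c(1, 2), stringsAsFactors = FALSE)

# Full outer join on the shared columns (name, age, fruit, tag), then fill count
completed <- merge(grid, vector2, all = TRUE)
completed$count[is.na(completed$count)] <- 0
completed <- completed[order(completed$tag, completed$fruit),
                       c("name", "age", "fruit", "count", "tag")]
print(completed)

# Hypothetical extra row whose fruit is not in 'default'
extra <- rbind(vector2, data.frame(name = "b", age = 33, fruit = "kiwi",
                                   count = 1, tag = 1, stringsAsFactors = FALSE))
full_rows  <- nrow(merge(grid, extra, all = TRUE))    # kiwi row kept
right_rows <- nrow(merge(grid, extra, all.x = TRUE))  # kiwi row dropped
```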


DATA

vector1 <-
  data.frame(
    "name" = "a",
    "age" = 10,
    "fruit" = c("orange", "cherry", "apple"),
    "count" = c(1, 1, 1),
    "tag" = c(1, 1, 2),
    stringsAsFactors = FALSE
  )
vector2 <-
  data.frame(
    "name" = "b",
    "age" = 33,
    "fruit" = c("apple", "mango"),
    "count" = c(1, 1),
    "tag" = c(2, 2),
    stringsAsFactors = FALSE
  )
vector3 <-
  data.frame(
    "name" = "c",
    "age" = 58,
    "fruit" = c("cherry", "apple"),
    "count" = c(1, 1),
    "tag" = c(1, 1),
    stringsAsFactors = FALSE
  )

fruit_list <- list(vector1, vector2, vector3)

default <- c("cherry", "orange", "apple", "mango")

                                                                        
                                                        
            
            
              
                