Referencing a dataframe recursively

Submitted by 旧时模样 on 2019-11-29 12:40:41

Question


Is there a way to have a dataframe refer to itself?

I find myself spending a lot of time writing things like y$Category1[is.na(y$Category1)] <- NULL, which are hard to read and feel like a lot of slow, repetitive typing. I wondered if there was something along the lines of

y$Category1[is.na(self)] <- NULL

that I could use instead.

Thanks


Answer 1:


What a great question. Unfortunately, as @user295691 pointed out in the comments, the issue is that the vector has to be referenced twice: once as the object being indexed and once as the subject of the condition. It does appear impossible to avoid the double reference:

numericVector[cond(numericVector)] <- newVal
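Base R's within() does not remove the double reference either, but it does let you drop the repeated df$ prefix, because it evaluates the assignment with the data frame's columns in scope. This is a different technique from the helper functions in this answer; a minimal sketch, assuming a small hypothetical data frame with an integer temp column:

```r
# Small example data frame (hypothetical, for illustration only)
df <- data.frame(temp = c(111L, NA, 114L, NA))

# within() evaluates the expression with the columns visible as variables
# and returns a modified copy of the data frame, which is assigned back.
# The column name still appears twice, but the df$ prefix does not.
df <- within(df, temp[is.na(temp)] <- 0L)

print(df$temp)
```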

What I think we can do is write a neat little function so that instead of

 # this  
 y$Category1[is.na(y$Category1)] <- list(NULL)

 # we can have this: 
 NAtoNULL(y$Category1)
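The list(NULL) in the snippet above matters: assigning a bare NULL into a subset of an atomic vector is an error in R, whereas assigning list(NULL) works by first coercing the vector to a list and then storing NULL elements in it. A small check of both cases, on a hypothetical toy vector:

```r
x <- c(1, NA, 3)

# Assigning NULL into a subset of an atomic vector fails
# ("replacement has length zero"), so capture the outcome instead.
outcome <- tryCatch({
  x[is.na(x)] <- NULL
  "no error"
}, error = function(e) "error")
print(outcome)

# Assigning list(NULL) coerces the vector to a list and keeps a NULL element
y <- c(1, NA, 3)
y[is.na(y)] <- list(NULL)
str(y)
```

Note that the column therefore stops being a plain numeric vector and becomes a list column.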

For example, the following functions are wrappers around selfAssign(), whose code is given below:

NAtoNULL(obj)      # Replaces NA values in obj with NULL.
NAtoVal(obj, val)  # Replaces NA values in obj with val.
selfReplace(obj, toReplace, val)  # Replaces toReplace values in obj with val

# and selfAssign can be called directly, but I'm not sure there would be a good reason to
selfAssign(obj, ind, val)  # equivalent to obj[ind] <- val
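For comparison, a shorter way to build this kind of wrapper is to capture the argument's expression with substitute(), splice it into an assignment with bquote(), and evaluate that assignment in the caller's frame with eval.parent(). This is a swapped-in technique rather than the paste()/parse() approach used by selfAssign(), and na_to_val() is a hypothetical name. Like the original functions, it only targets the caller's frame, so it would not work from inside *apply either:

```r
# Hypothetical helper (not from the original answer): replaces NAs in the
# caller's object with val, e.g. na_to_val(df$temp, 0L).
na_to_val <- function(obj, val) {
  target <- substitute(obj)  # unevaluated argument, e.g. the call df$temp
  # Build the call  <target>[is.na(<target>)] <- <val>
  assignment <- bquote(.(target)[is.na(.(target))] <- .(val))
  # Run it in the caller's frame so the caller's object is modified
  eval.parent(assignment)
  invisible(NULL)
}

df <- data.frame(temp = c(111L, NA, 114L),
                 subj = c("A", NA, "C"), stringsAsFactors = FALSE)
na_to_val(df$temp, 0L)
na_to_val(df$subj, "B")
print(df)
```

Because the value is spliced in as an object rather than pasted in as text, character values need no manual quoting.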

Example:

# sample df
df <- structure(list(subj=c("A",NA,"C","D","E",NA,"G"),temp=c(111L,112L,NA,114L,115L,116L,NA),size=c(0.7133,NA,0.7457,NA,0.0487,NA,0.8481)),.Names=c("subj","temp","size"),row.names=c(NA,-7L),class="data.frame")

df
  subj temp   size
1    A  111 0.7133
2 <NA>  112     NA
3    C   NA 0.7457
4    D  114     NA
5    E  115 0.0487
6 <NA>  116     NA
7    G   NA 0.8481

# Make some replacements
NAtoNULL(df$size)    # Replace all NAs in df$size with NULLs
NAtoVal(df$temp, 0)  # Replace all NAs in df$temp with 0s
NAtoVal(df$subj, c("B", "E"))   # Replace all NAs in df$subj with alternating "B" and "E"

# the modified df is now:  
df

  subj temp   size
1    A  111 0.7133
2    B  112   NULL
3    C    0 0.7457
4    D  114   NULL
5    E  115 0.0487
6    E  116   NULL
7    G    0 0.8481


# replace the 0s in temp with NA
selfReplace(df$temp, 0, NA)

# replace the NULLs in size with 1
selfReplace(df$size, NULL, 1)

# replace all "E"s in subj with alternating "E" and "F"
selfReplace(df$subj, c("E"), c("E", "F"))

df

  subj temp   size
1    A  111 0.7133
2    B  112      1
3    C   NA 0.7457
4    D  114      1
5    E  115 0.0487
6    F  116      1
7    G   NA 0.8481

Right now this works for plain vectors, but it fails when called from *apply. I would love to get it working fully, especially with plyr. The key would be to modify the value of n and the sys.parent(.) call passed to match.call().


FUNCTIONS

The code for the functions is below.

An important point: this does not (yet!) work with *apply / plyr.
I believe it can be made to work by modifying the value of n and adjusting sys.parent(.) in match.call(), but it still needs some fiddling. Any suggestions or modifications would be greatly appreciated.

selfAssign <- function(self, ind, val, n=1, silent=FALSE) {
## assigns val to self[ind] in environment parent.frame(n)
## self should be a vector.  Currently will not work for matrices or data frames

  ## GRAB THE CORRECT MATCH CALL
  #--------------------------------------
      # if nested function, match.call appropriately
      # (match.call() always returns a call, so in practice this first branch is always taken)
      if (is.call(match.call())) {
        mc <- (match.call(call=sys.call(sys.parent(1))))
      } else {
        mc <- match.call()
      }

      # needed in case self is complex (ie df$name)
      mc2 <- paste(as.expression(mc[[2]]))


  ## CLEAN UP ARGUMENT VALUES
  #--------------------------------------
      # replace logical indices with numeric indices
      if (is.logical(ind))
        ind <- which(ind) 

      # if no indices will be selected (integer(0), numeric(0) or NULL), stop here
      if (length(ind) == 0) {
        if(!silent) warning("No indices selected")
        return()
      }

      # if val is a string, we need to wrap it in quotes
      if (is.character(val))
        val <- paste('"', val, '"', sep="")

      # val cannot directly be NULL, must be list(NULL)
      if(is.null(val))
        val <- "list(NULL)"


  ## CREATE EXPRESSIONS AND EVAL THEM
  #--------------------------------------
     # create expressions to evaluate
     ret <- paste0("'[['(", mc2, ", ", ind, ") <- ", val)

     # evaluate in parent.frame(n)
     eval(parse(text=ret), envir=parent.frame(n))
}
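To see what selfAssign() actually evaluates, its expression-building step can be reproduced by hand. paste0() is vectorised over ind and val, so it produces one assignment string per index and recycles val across them, which is how a value such as c("B", "E") ends up alternating. A standalone sketch, using a small hypothetical data frame instead of calling selfAssign() itself:

```r
df <- data.frame(subj = c("A", NA, "C", NA), stringsAsFactors = FALSE)

ind <- which(is.na(df$subj))          # positions to overwrite: 2 and 4
val <- paste0('"', c("B", "E"), '"')  # wrap strings in quotes, as selfAssign() does

# One assignment string per index; val is recycled along ind
ret <- paste0("'[['(df$subj, ", ind, ") <- ", val)
print(ret)

# parse() turns the strings into an expression vector; eval() runs each in turn
eval(parse(text = ret))
print(df$subj)
```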


NAtoNULL <- function(obj, n=1) {
  selfAssign(match.call()[[2]], is.na(obj), NULL, n=n+1)
}

NAtoVal <- function(obj, val, n=1) {
  selfAssign(match.call()[[2]], is.na(obj), val, n=n+1)  
}

selfReplace <- function(obj, toReplace, val, n=1) {
## replaces occurrences of toReplace within obj with val

  # determine ind based on value & length of toReplace
  # TODO:  this will not work properly for data frames, but neither will selfAssign, yet.
  if (is.null(toReplace)) {
    ind <- sapply(obj, function(x) is.null(x[[1]]))
  }  else if (length(toReplace) == 1 && is.na(toReplace)) {   # a length > 1 condition would error
    ind <- is.na(obj)
  } else  {
    if (length(obj) > 1) {    # note, this won't work for data frames
          ind <- obj %in% toReplace
    } else {
      ind <- obj == toReplace
    }
  } 

  selfAssign(match.call()[[2]], ind, val, n=n+1)  
}
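The three branches in selfReplace() differ only in how ind is computed. A standalone sketch of each rule on hypothetical toy vectors (not a call to the function itself) may make the logic easier to follow; note that %in% also treats NA as a matchable value:

```r
# NULL branch: elements of a list column that hold NULL
size <- list(0.7133, NULL, 0.7457)
ind_null <- sapply(size, function(x) is.null(x[[1]]))

# NA branch: plain missing values
temp <- c(111L, NA, 0L)
ind_na <- is.na(temp)

# value branch: any of the values in toReplace (NA included)
subj <- c("A", "E", "E", NA)
ind_val <- subj %in% c("E", NA)

print(ind_null)
print(ind_na)
print(ind_val)
```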



  ## THIS SHOULD GO INSIDE NAtoNULL, NAtoVal etc.

  # TODO: modify for use with *apply
  x1 <- match.call()  # assumed: x1 is the matched call of the enclosing wrapper
  if (startsWith(deparse(x1)[1], "FUN(obj = ")) {
      # PASS. This should identify when the call is coming from *apply.
      # In such a case, n needs to increase by 1 for apply and lapply, and by 2 for sapply.
      # I'm not sure what increase the plyr functions require.
  }
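The reason these helpers break under *apply is that they recover the target from the unevaluated argument, and inside lapply() that argument is always the generic X[[i]] rather than something like df$temp. A small sketch with a hypothetical show_target() helper makes this visible:

```r
# Hypothetical helper: returns the argument's unevaluated expression as text
show_target <- function(obj) deparse(substitute(obj))

df <- data.frame(temp = c(111L, NA), size = c(0.7133, NA))

direct <- show_target(df$temp)                 # the expression the caller typed
via_lapply <- unlist(lapply(df, show_target))  # what lapply() passes to FUN

print(direct)
print(via_lapply)
```

Since sapply() is built on lapply(), it sees the same X[[i]], only one call frame further away, which fits the note above about increasing n.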


Source: https://stackoverflow.com/questions/13615385/referencing-a-dataframe-recursively
