r - hierarchical data frame from child/parent relations

Submitted by 懵懂的女人 on 2019-12-05 04:52:10

Using data.table, you can do the following:

require(data.table)
l <- list() # initialize empty list
setDT(dat) 
setkey(dat, parent) # setting up the data as keyed data.table
current_lvl <- dat[is.na(parent), .(level_number = 1), keyby=.(level1 = name)]

By now, current_lvl looks as follows (keyed by level1):

   level1 level_number
1:    air            1
2:   land            1
3:  water            1

The trick is now to join dat and current_lvl and modify the result appropriately:

ind <- 1 # index of the level that current_lvl holds right now
current_lvl <- current_lvl[dat][ # join the data.tables (level1 matches parent)
  !is.na(level_number)][ # exclude non-child rows
  , level_number := level_number + 1] # increment level_number
setnames(current_lvl, "name", paste0("level", ind + 1)) # rename column
setkeyv(current_lvl, paste0("level", ind + 1)) # set key

This gives you (keyed by level2):

   level1 level_number     level2
1:    air            2   airplane
2:    air            2    balloon
3:   land            2    bicycle
4:  water            2       boat
5:   land            2        car
6:    air            2 helicopter
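The keyed join above relies on the keys of both tables to decide which columns are matched. As an alternative sketch (not the original author's code), the same step can be written with an explicit on= clause, which makes the parent-to-level1 match visible. The input values are assumed from the example data in this thread, and nomatch = NULL needs a reasonably recent data.table:

```r
library(data.table)

# Assumed input, same shape as in the answer: name plus parent (NA = top level).
dat <- data.table(
  name   = c("land", "water", "air", "car", "bicycle", "boat",
             "balloon", "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c(NA, NA, NA, "land", "land", "water",
             "air", "air", "air", "car", "car", "airplane")
)
current_lvl <- data.table(level1 = c("air", "land", "water"), level_number = 1)

# Inner join: keep only rows of dat whose parent appears in current_lvl$level1.
next_lvl <- dat[current_lvl, on = .(parent = level1), nomatch = NULL]

# The join column keeps dat's name (parent); rename to the levelN scheme.
setnames(next_lvl, c("parent", "name"), c("level1", "level2"))
next_lvl[, level_number := level_number + 1]
setcolorder(next_lvl, c("level1", "level_number", "level2"))
print(next_lvl)
```

The row order differs from the keyed version because nothing is sorted here; call setkeyv(next_lvl, "level2") if the same ordering is wanted.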

Put this to work in a while-loop as follows (start again from the level-1 current_lvl, since the loop performs the join step itself):

while(nrow(current_lvl) > 0){
  ind <- length(l) + 1
  l[[ind]] <- current_lvl
  current_lvl <- current_lvl[dat][!is.na(level_number)][,level_number := level_number + 1]
  if(nrow(current_lvl) == 0L){
    break
  }
  setnames(current_lvl, "name", paste0("level",ind+1))
  setkeyv(current_lvl, paste0("level",ind+1))
}

Have a look at l to see the outcome. Combining its elements via rbindlist gives you the desired result:

res <- rbindlist(l, fill=TRUE)
setcolorder(res, sort(names(res)))
res

which results in:

> res
    level_number level1     level2 level3
 1:            1    air         NA     NA
 2:            1   land         NA     NA
 3:            1  water         NA     NA
 4:            2    air   airplane     NA
 5:            2    air    balloon     NA
 6:            2   land    bicycle     NA
 7:            2  water       boat     NA
 8:            2   land        car     NA
 9:            2    air helicopter     NA
10:            3    air   airplane Airbus
11:            3   land        car    BMW
12:            3   land        car   Ford

Using the data.tree package, you could do the following:

library(data.tree)
df <- data.frame(name = c("land", "water", "air", "car", "bicycle", "boat", "balloon", "airplane", "helicopter", "Ford", "BMW", "Airbus"), 
                 parent = c("root", "root", "root", "land", "land", "water", "air", "air", "air", "car", "car", "airplane"))

Note that I replaced the NAs with "root", which makes the conversion to a data.tree much easier. Namely:

tree <- FromDataFrameNetwork(df)

Getting the required format then becomes trivial as we can use the hierarchy infrastructure from data.tree:

ToDataFrameTree(tree, 
                level1 = function(x) x$path[2],
                level2 = function(x) x$path[3],
                level3 = function(x) x$path[4],
                level_number = function(x) x$level - 1)[-1,-1]
Sam

Do not use "root" as the parent value for top-level records. The solution using the data.tree package is great; however, in newer versions "root" is a reserved name for nodes. Although it is automatically replaced with "root2", the call to FromDataFrameNetwork(df) does not return the tree as wanted.
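One way to follow this advice might be to use a placeholder that is not a reserved node name. The sketch below uses "vehicles", a hypothetical name picked purely for illustration; the assumption is that any single string that is neither an existing node name nor reserved by data.tree serves the same purpose. It has not been checked against every data.tree version.

```r
library(data.tree)

# "vehicles" is an assumed placeholder for the top-level parent (instead of "root").
df <- data.frame(
  name   = c("land", "water", "air", "car", "bicycle", "boat",
             "balloon", "airplane", "helicopter", "Ford", "BMW", "Airbus"),
  parent = c("vehicles", "vehicles", "vehicles", "land", "land", "water",
             "air", "air", "air", "car", "car", "airplane"),
  stringsAsFactors = FALSE
)

# First column is the child, second the parent; the root is inferred.
tree <- FromDataFrameNetwork(df)
print(tree)

# Same extraction as in the answer above; [-1, -1] drops the root row
# and the printed-tree column.
ToDataFrameTree(tree,
                level1 = function(x) x$path[2],
                level2 = function(x) x$path[3],
                level3 = function(x) x$path[4],
                level_number = function(x) x$level - 1)[-1, -1]
```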
