How to build an alphabetical tree from a list of words in R?

Posted by 别等时光非礼了梦想 on 2021-01-28 10:56:22

Question


My problem is simple. I have a long list of words, e.g. abbey, abbot, abbr, abide.

I would like to build a tree as follows:

Level 0                             A
                                    | 
Level 1                             B
                                  /   \
Level 2                         B       I
                              / | \     |
Level 3                     E   O   R   D   
                            |   |       |
Level 4                     Y   T       E

Is there an easy way to parse the wordlist and create such a structure in R?

Thanks a lot for your help!

Sincerely, Chris


Answer 1:


Here's an igraph-based solution that labels each node of the graph with the partial word, so that terminal nodes are named with full words:

library(igraph)
library(stringr)

initgraph = function(){
    # create a graph with one empty-named node and no edges
    g=graph.empty(n=1)
    V(g)$name=""
    g
}


wordtree <- function(g=initgraph(),wordlist){
    for(word in wordlist){
        # turns "word" into c("w","wo","wor","word")
        subwords = str_sub(word, 1, 1:nchar(word))
        # make a graph long enough to hold all those sub-words plus start node
        subg = graph.lattice(length(subwords)+1,directed=TRUE)
        # set vertex nodes to start node plus sub-words
        V(subg)$name=c("",subwords)
        # merge *by name* into the existing graph
        g = graph.union(g, subg)
    }
    g
}

With that loaded,

g = wordtree(initgraph(), c("abbey","abbot","abbr","abide"))
plot(g)

gets

[Image: word tree]

You can add words to an existing tree by passing it in as first parameter:

g = wordtree(g,c("now","accept","answer","please"))
plot(g)

The tree is always rooted at the node named "" and all terminal nodes (those with no outgoing edges) are named with full words. There are functions in igraph to pull those out if you need them. You haven't actually said what you want to do with this once it's built... or once we've built it for you :)
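
As a rough sketch of pulling out the terminal nodes: a vertex is terminal when its out-degree is zero, so `degree(g, mode = "out")` should be enough. The snippet is meant to be self-contained, so it rebuilds the same kind of prefix tree from an edge list with `graph_from_data_frame` instead of calling `wordtree`; the names `words`, `prefixes`, `edges` and `leaves` are only illustrative.

```r
library(igraph)

words <- c("abbey", "abbot", "abbr", "abide")

# All prefixes of every word, e.g. "abbr" -> "a", "ab", "abb", "abbr"
prefixes <- unique(unlist(lapply(words, function(w) {
    substring(w, 1, seq_len(nchar(w)))
})))

# Each prefix hangs off the prefix one character shorter ("" is the root)
edges <- data.frame(from = substr(prefixes, 1, nchar(prefixes) - 1),
                    to   = prefixes,
                    stringsAsFactors = FALSE)

g <- graph_from_data_frame(edges, directed = TRUE)

# Terminal nodes are the vertices with no outgoing edges
leaves <- V(g)$name[degree(g, mode = "out") == 0]
print(sort(leaves))
```

Note that a word which is itself a prefix of another word (say "abb" next to "abbey") ends up as an inner node, not a leaf, so it would not appear in `leaves`.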

Note there is a nice layout for plotting trees which looks like your ascii example:

plot(g,layout=layout.reingold.tilford)

[Image: tree layout]




Answer 2:


Here is a solution that builds a nested list recursively, with characters as names:

x <- c("abb", "abbey", "abbot", "abbr", "abide")

char.tree <- function(words, end = NULL) {
   first <- substr(words, 1, 1)
   rest  <- substr(words, 2, nchar(words))
   zi    <- nchar(words) == 0L 
   c(list(end)[any(zi)],
     lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

str(char.tree(x))
# List of 1
#  $ a:List of 1
#   ..$ b:List of 2
#   .. ..$ b:List of 4
#   .. .. ..$  : NULL
#   .. .. ..$ e:List of 1
#   .. .. .. ..$ y:List of 1
#   .. .. .. .. ..$ : NULL
#   .. .. ..$ o:List of 1
#   .. .. .. ..$ t:List of 1
#   .. .. .. .. ..$ : NULL
#   .. .. ..$ r:List of 1
#   .. .. .. ..$ : NULL
#   .. ..$ i:List of 1
#   .. .. ..$ d:List of 1
#   .. .. .. ..$ e:List of 1
#   .. .. .. .. ..$ : NULL
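
One possible use of such a nested list is a membership check: walk down one character at a time, then look for the end marker. The sketch below assumes the default `end = NULL`, so the marker shows up as an element with an empty or missing name; `has.word` is a hypothetical helper name, not part of the answer above.

```r
# char.tree copied from the answer so the snippet runs on its own
char.tree <- function(words, end = NULL) {
   first <- substr(words, 1, 1)
   rest  <- substr(words, 2, nchar(words))
   zi    <- nchar(words) == 0L
   c(list(end)[any(zi)],
     lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

# Hypothetical helper: TRUE if `word` was in the list used to build `tree`
has.word <- function(tree, word) {
   for (ch in strsplit(word, "")[[1]]) {
      # no branch for this character: the word cannot be in the tree
      if (!(ch %in% names(tree))) return(FALSE)
      tree <- tree[[ch]]
   }
   # a word ends here only if this node holds an end marker,
   # i.e. an element whose name is empty or missing
   nms <- names(tree)
   length(tree) > 0 && (is.null(nms) || any(nms == ""))
}

tree <- char.tree(c("abb", "abbey", "abbot", "abbr", "abide"))
has.word(tree, "abb")    # a stored word that is also a prefix of others
has.word(tree, "abbe")   # only a prefix, never stored as a word
```

Because the end marker is kept at every node where a word stops, this layout can tell "abb" (a stored word) apart from "abbe" (just a prefix).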


Source: https://stackoverflow.com/questions/27060453/how-to-build-an-alphabetical-tree-from-a-list-of-words-in-r
