Question
My problem is simple. I have a long list of words, e.g. abbey, abbot, abbr, abide.
I would like to build a tree as follows:
Level 0           A
                  |
Level 1           B
                /   \
Level 2        B     I
             / | \   |
Level 3     E  O  R  D
            |  |     |
Level 4     Y  T     E
Is there an easy way to parse the wordlist and create such a structure in R?
Thanks a lot for your help!
Sincerely, Chris
Answer 1:
Here's an igraph-based solution that labels each node of the graph with the partial word, so that terminal nodes are named with full words:
library(igraph)
library(stringr)
# graph.empty(), graph.lattice() and graph.union() are the older igraph names;
# newer releases call them make_empty_graph(), make_lattice() and union()
initgraph = function(){
  # create a graph with one empty-named node and no edges
  g=graph.empty(n=1)
  V(g)$name=""
  g
}
wordtree <- function(g=initgraph(),wordlist){
  for(word in wordlist){
    # turns "word" into c("w","wo","wor","word")
    subwords = str_sub(word, 1, 1:nchar(word))
    # make a path graph long enough to hold all those sub-words plus start node
    subg = graph.lattice(length(subwords)+1,directed=TRUE)
    # set vertex names to start node plus sub-words
    V(subg)$name=c("",subwords)
    # merge *by name* into the existing graph
    g = graph.union(g, subg)
  }
  g
}
With that loaded,
g = wordtree(initgraph(), c("abbey","abbot","abbr","abide"))
plot(g)
gets you a plot of the tree, with each node labelled by its partial word.
You can add words to an existing tree by passing it in as first parameter:
> g = wordtree(g,c("now","accept","answer","please"))
> plot(g)
The tree is always rooted at the node with name "" and all terminal nodes (those with no outgoing edges) have words. igraph has functions to pull those out if you need them. You haven't actually said what you want to do with this when you've done it... Or when we've done it for you :)
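As a rough sketch of pulling out the terminal nodes: the helper below is hypothetical (leaf_words is not part of igraph or of the code above) and assumes a directed graph with named vertices, like the one wordtree returns. It relies on igraph's degree() with mode = "out" to find vertices that have no outgoing edges. The small graph is built by hand just to keep the example self-contained.

```r
library(igraph)

# Hypothetical helper: names of all vertices with no outgoing edges,
# i.e. the terminal nodes of the word tree.
leaf_words <- function(g) {
  V(g)$name[degree(g, mode = "out") == 0]
}

# Hand-built stand-in for a word tree: "" -> "a", "a" -> "ab", "a" -> "ac"
g <- make_graph(c(1, 2, 2, 3, 2, 4), directed = TRUE)
V(g)$name <- c("", "a", "ab", "ac")

leaf_words(g)
```

One thing to keep in mind: a word that is a prefix of a longer word in the list (say "abb" next to "abbey") ends up on an interior node, so it will not show up among the terminal nodes.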
Note there is a nice layout for plotting trees which looks like your ASCII example:
plot(g,layout=layout.reingold.tilford)
Answer 2:
Here is a solution that builds a nested list recursively, with characters as names:
x <- c("abb", "abbey", "abbot", "abbr", "abide")
char.tree <- function(words, end = NULL) {
  first <- substr(words, 1, 1)
  rest  <- substr(words, 2, nchar(words))
  zi    <- nchar(words) == 0L
  c(list(end)[any(zi)],
    lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}
str(char.tree(x))
# List of 1
# $ a:List of 1
# ..$ b:List of 2
# .. ..$ b:List of 4
# .. .. ..$ : NULL
# .. .. ..$ e:List of 1
# .. .. .. ..$ y:List of 1
# .. .. .. .. ..$ : NULL
# .. .. ..$ o:List of 1
# .. .. .. ..$ t:List of 1
# .. .. .. .. ..$ : NULL
# .. .. ..$ r:List of 1
# .. .. .. ..$ : NULL
# .. ..$ i:List of 1
# .. .. ..$ d:List of 1
# .. .. .. ..$ e:List of 1
# .. .. .. .. ..$ : NULL
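
To show how such a nested list might be used, here is a small sketch of a membership test. has_word is a hypothetical helper, not part of the answer; it assumes the tree was built by char.tree with the default end = NULL, so that the end of a word is marked by an unnamed element. char.tree is repeated so the snippet runs on its own.

```r
# char.tree as defined above, repeated to keep the example self-contained
char.tree <- function(words, end = NULL) {
  first <- substr(words, 1, 1)
  rest  <- substr(words, 2, nchar(words))
  zi    <- nchar(words) == 0L
  c(list(end)[any(zi)],
    lapply(split(rest[!zi], first[!zi]), char.tree, end = end))
}

# Hypothetical helper: walk the tree one character at a time and
# report whether `word` was one of the words used to build it.
has_word <- function(tree, word) {
  node <- tree
  for (ch in strsplit(word, "")[[1]]) {
    if (!ch %in% names(node)) return(FALSE)  # no branch for this letter
    node <- node[[ch]]
  }
  # The word ends here only if this node holds the unnamed end marker
  nms <- names(node)
  if (is.null(nms)) length(node) > 0 else any(nms == "")
}

tree <- char.tree(c("abb", "abbey", "abbot", "abbr", "abide"))
has_word(tree, "abbey")
has_word(tree, "ab")
```

With this word list, "abbey" and "abb" are found, while "ab" is not, because it is only a prefix and its node carries no end marker.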
Source: https://stackoverflow.com/questions/27060453/how-to-build-an-alphabetical-tree-from-a-list-of-words-in-r