What would be a sensible way to implement a Trie in .NET?

Submitted by 北城余情 on 2019-12-04 04:01:11

Well, you need each node to have something which effectively implements IDictionary<char, Trie>. You could write your own custom implementation which varies its internal structure based on how many subnodes it has:

  • For a single subnode, use just a char and a Trie
  • For a small number, use a List<Tuple<char, Trie>> or a LinkedList<Tuple<char,Trie>>
  • For a large number, use a Dictionary<char, Trie>

(Having just seen leppie's answer, this is the kind of hybrid approach he talks about, I believe.)
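To make the hybrid idea a bit more concrete, here is a minimal sketch, not a tuned implementation. It assumes a hypothetical `PromoteThreshold` of 8 (an arbitrary number you would want to measure), uses a `List<KeyValuePair<char, Trie>>` for the small case and skips the separate single-subnode case for brevity. The member names (`GetChild`, `GetOrAddChild`, `IsWord`) are mine, not from any library.

```csharp
using System.Collections.Generic;

// Hypothetical hybrid trie node: children live in a small list first and are
// promoted to a Dictionary once the list reaches an assumed threshold.
public class Trie
{
    private const int PromoteThreshold = 8; // assumed cut-off; tune by measuring

    private List<KeyValuePair<char, Trie>> small = new List<KeyValuePair<char, Trie>>();
    private Dictionary<char, Trie> large; // stays null until promotion

    public bool IsWord { get; private set; }

    public Trie GetChild(char c)
    {
        if (large != null)
        {
            Trie found;
            return large.TryGetValue(c, out found) ? found : null;
        }
        // Linear scan is cheap while the list is short.
        foreach (var pair in small)
        {
            if (pair.Key == c) return pair.Value;
        }
        return null;
    }

    public Trie GetOrAddChild(char c)
    {
        Trie child = GetChild(c);
        if (child != null) return child;

        child = new Trie();
        if (large != null)
        {
            large[c] = child;
        }
        else if (small.Count < PromoteThreshold)
        {
            small.Add(new KeyValuePair<char, Trie>(c, child));
        }
        else
        {
            // Promote: copy the existing entries into a Dictionary.
            large = new Dictionary<char, Trie>();
            foreach (var pair in small) large[pair.Key] = pair.Value;
            large[c] = child;
            small = null;
        }
        return child;
    }

    public void Add(string word)
    {
        Trie node = this;
        foreach (char c in word) node = node.GetOrAddChild(c);
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        Trie node = this;
        foreach (char c in word)
        {
            node = node.GetChild(c);
            if (node == null) return false;
        }
        return node.IsWord;
    }
}
```

Callers only ever see `Add` and `Contains`, so the internal representation can change per node without affecting them.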

If your characters are from a limited set (e.g. only the uppercase Latin alphabet), then you can store a 26-element array, and each lookup is just

Trie next = store[c - 'A'];

where c is the current lookup character.

Implementing it as a Dictionary, in my mind, isn't implementing a Trie - that's implementing a Dictionary of Dictionaries.

When I've implemented a trie I've done it the same way as suggested by Damien_The_Unbeliever (+1 there):

public class TrieNode
{
  // Size of the supported alphabet, e.g. 26 for A-Z
  const int no_of_chars = 26;
  TrieNode[] Children = new TrieNode[no_of_chars];
}

Ideally, this means your trie supports only a limited set of characters, whose size is given by no_of_chars, and that you can map each input character to an array index. For example, if you support A-Z, you would naturally map A to 0 and Z to 25.

When you then need to add/remove/check existence of a node you then do something like this:

public TrieNode GetNode(char c)
{
  //mapping function - could be a lookup table, or simple arithmetic
  int index = GetIndex(c);
  //TODO: deal with the situation where 'c' is not supported by the map
  return Children[index];
} 
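Putting those fragments together, a complete node might look roughly like the sketch below. It assumes the A-Z example (so 26 slots), resolves the TODO by returning null for unsupported characters on lookup and throwing on insert, and uses a hypothetical class name `ArrayTrieNode` plus an `IsWord` flag that the fragments above don't show.

```csharp
using System;

// Sketch of an array-backed trie node, assuming only 'A'..'Z' is supported.
public class ArrayTrieNode
{
    private const int AlphabetSize = 26; // assumption: uppercase A-Z only

    private readonly ArrayTrieNode[] children = new ArrayTrieNode[AlphabetSize];
    public bool IsWord;

    // Mapping function: 'A'..'Z' -> 0..25, anything else -> -1.
    private static int GetIndex(char c)
    {
        return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
    }

    // Returns the child for 'c', or null if it is missing or unsupported.
    public ArrayTrieNode GetNode(char c)
    {
        int index = GetIndex(c);
        return index < 0 ? null : children[index];
    }

    public void Add(string word)
    {
        ArrayTrieNode node = this;
        foreach (char c in word)
        {
            int index = GetIndex(c);
            if (index < 0) throw new ArgumentException("Unsupported character: " + c);
            if (node.children[index] == null) node.children[index] = new ArrayTrieNode();
            node = node.children[index];
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        ArrayTrieNode node = this;
        foreach (char c in word)
        {
            node = node.GetNode(c);
            if (node == null) return false;
        }
        return node.IsWord;
    }
}
```

Every node pays for all 26 slots whether it uses them or not, which is the memory cost of getting constant-time child lookup.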

In real cases I've seen this optimized so that AddNode, for example, takes a ref TrieNode, so that the node can be created on demand and automatically placed in the correct slot of the parent TrieNode's Children.
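As a rough illustration of that ref trick, the sketch below passes an array element by reference so the helper can fill the parent's slot in place. The class name `LazyNode`, the `IsWord` flag and the A-Z assumption are mine; characters outside A-Z would throw an IndexOutOfRangeException here, which a real implementation would want to guard against.

```csharp
// Sketch: creating child nodes on demand through a ref parameter.
public class LazyNode
{
    private const int AlphabetSize = 26; // assumption: uppercase A-Z only

    public LazyNode[] Children = new LazyNode[AlphabetSize];
    public bool IsWord;

    // Fills the caller's slot if it is empty, then returns the node in it.
    public static LazyNode AddNode(ref LazyNode slot)
    {
        if (slot == null) slot = new LazyNode();
        return slot;
    }

    public void Add(string word)
    {
        LazyNode node = this;
        foreach (char c in word)
        {
            // Array elements can be passed by ref, so the new node lands
            // directly in the parent's Children array.
            node = AddNode(ref node.Children[c - 'A']);
        }
        node.IsWord = true;
    }
}
```

The benefit is that the null check and the assignment live in one place instead of being repeated at every call site.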

You could also use a Ternary Search Tree (TST) instead, as the memory overhead for a trie can be pretty crazy (especially if you intend to support all 65,536 possible values of a .NET char!). TST performance is rather impressive, and a TST also supports prefix and wildcard searching as well as Hamming-distance searches. Equally, TSTs can natively support all Unicode characters without any mapping, since they work on greater-than/less-than/equals comparisons instead of an absolute index value.

I took the code from here and adapted it slightly (it was written before generics).

I think you'll be pleasantly surprised by TSTs; once I had one implemented I steered away from Tries altogether.

The only tricky thing is keeping the TST balanced; an issue you don't have with Tries.
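For anyone who hasn't seen one, a bare-bones TST looks something like the sketch below. This is my own minimal version, not the adapted code mentioned above: the names (`TernarySearchTree`, `Lo`, `Eq`, `Hi`, `IsWord`) are assumptions, it does no balancing at all, and it only covers insert and exact lookup.

```csharp
// Minimal ternary search tree sketch: insert and exact-match lookup only.
// No balancing is attempted, so insertion order affects the tree's shape.
public class TernarySearchTree
{
    private class Node
    {
        public char Ch;
        public bool IsWord;
        public Node Lo, Eq, Hi; // less-than, equal (next character), greater-than
        public Node(char ch) { Ch = ch; }
    }

    private Node root;

    public void Add(string word)
    {
        if (string.IsNullOrEmpty(word)) return;
        root = Add(root, word, 0);
    }

    private static Node Add(Node node, string word, int i)
    {
        char c = word[i];
        if (node == null) node = new Node(c);

        if (c < node.Ch) node.Lo = Add(node.Lo, word, i);
        else if (c > node.Ch) node.Hi = Add(node.Hi, word, i);
        else if (i < word.Length - 1) node.Eq = Add(node.Eq, word, i + 1);
        else node.IsWord = true;

        return node;
    }

    public bool Contains(string word)
    {
        if (string.IsNullOrEmpty(word)) return false;

        Node node = root;
        int i = 0;
        while (node != null)
        {
            char c = word[i];
            if (c < node.Ch) node = node.Lo;
            else if (c > node.Ch) node = node.Hi;
            else if (i < word.Length - 1) { node = node.Eq; i++; }
            else return node.IsWord;
        }
        return false;
    }
}
```

Because each step is just a char comparison, any char value works as a key without an index mapping. Inserting keys in sorted order degrades the Lo/Hi chains into lists, which is where the balancing concern comes from.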

There are a few ways, but using a singly linked list is probably the simplest and most lightweight.

I would do some tests to see how many child nodes each node has. If there aren't many (say 20 or fewer), the linked list approach should be faster than a hashtable. You could also take a hybrid approach depending on the number of child nodes.
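A sketch of the linked list variant, under my own naming (`LinkedTrieNode`, `FirstChild`, `NextSibling`, `IsWord` are all hypothetical): each node stores a pointer to its first child and to its next sibling, so the child list costs two references per node. `ChildCount` is included only as a convenient hook for the kind of measurement suggested above.

```csharp
// Sketch of a trie whose children form a singly linked list of siblings.
public class LinkedTrieNode
{
    public char Key;
    public bool IsWord;
    public LinkedTrieNode FirstChild;  // head of this node's child list
    public LinkedTrieNode NextSibling; // next entry in the parent's child list

    public LinkedTrieNode GetChild(char c)
    {
        for (var n = FirstChild; n != null; n = n.NextSibling)
        {
            if (n.Key == c) return n;
        }
        return null;
    }

    public LinkedTrieNode GetOrAddChild(char c)
    {
        var child = GetChild(c);
        if (child == null)
        {
            // Prepend to the child list; order is not significant here.
            child = new LinkedTrieNode { Key = c, NextSibling = FirstChild };
            FirstChild = child;
        }
        return child;
    }

    // Number of direct children; handy for measuring real fan-out.
    public int ChildCount()
    {
        int count = 0;
        for (var n = FirstChild; n != null; n = n.NextSibling) count++;
        return count;
    }

    public void Add(string word)
    {
        var node = this;
        foreach (char c in word) node = node.GetOrAddChild(c);
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        var node = this;
        foreach (char c in word)
        {
            node = node.GetChild(c);
            if (node == null) return false;
        }
        return node.IsWord;
    }
}
```

Walking a populated trie and recording `ChildCount` per node would tell you whether the fan-out is small enough for the linear scan to be worthwhile.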
