Case Insensitive Ternary Search Tree

淺唱寂寞╮ 提交于 2019-12-10 10:24:30

问题


I had been using Ternary Search Tree for a while, as the data structure to implement a auto complete drop down combo box. Which means, when user type "fo", the drop down combo box will display

foo food football

The problem is, my current used of Ternary Search Tree is case sensitive. My implementation is as follow. It had been used by real world for around 1++ yeas. Hence, I consider it as quite reliable.

My Ternary Search Tree code

However, I am looking for a case insensitive Ternary Search Tree, which means, when I type "fo", the drop down combo box will show me

foO Food fooTBall

Here are some key interface for TST, where I hope the new case insentive TST may have similar interface too.

    /** 
 * Stores value in the TernarySearchTree. The value may be retrieved using key.
 * @param key A string that indexes the object to be stored.
 * @param value The object to be stored in the tree.
 */    
public void put(String key, E value) {
    getOrCreateNode(key).data = value;
}

/**
 * Retrieve the object indexed by key.
 * @param key A String index.
 * @return Object The object retrieved from the TernarySearchTree.
 */
public E get(String key) {
    TSTNode<E> node = getNode(key);
    if(node==null) return null;
    return node.data;
}

An example of usage is as follow. TSTSearchEngine is using TernarySearchTree as the core backbone.

Example usage of Ternary Search Tree

// There is stock named microsoft and MICROChip inside stocks ArrayList.
TSTSearchEngine<Stock> engine = TSTSearchEngine<Stock>(stocks);
// I wish it would return microsoft and MICROCHIP. Currently, it just return microsoft.
List<Stock> results = engine.searchAll("micro"); 

回答1:


One of the key factor which make my current Ternary Search Tree difficult to support case insensitive search is that, my underlying data structure is one-to-one mapping. Please look at the following test code :

public void testPut() {
    System.out.println("put");
    Name name0 = new Name("abc");
    Name name1 = new Name("abc");
    TernarySearchTree<Name> instance = new TernarySearchTree<Name>();
    instance.put(name0.toString(), name0);
    instance.put(name1.toString(), name1);
    assertEquals(2, instance.matchPrefix("a").size()); // fail here. Result is 1
}

What my current short-term solution is that, I am using TSTSearchEngine to wrap up the whole TernarySearchTree. TSTSearchEngine is comprised of

(1) A TernarySearchTree, providing UPPER-CASE key to map.

(2) A String-To-ArrayList map.

Here is what happen when I perform :

TSTSearchEngine<Name> engine = TSTSearchEngine<Name>();
engine.put(name0); // name0 is new Name("Abc");
engine.put(name1); // name0 is new Name("aBc");

(1) name0.toString() will be converted to UPPER-CASE ("ABC"). It will be inserted to TernarySearchTree. "ABC" will be both key and value for TernarySearchTree.

(2) "ABC" will use as the key for map, to insert name0 into an array list.

(3) name1.toString() will be converted to UPPER-CASE ("ABC"). It will be inserted to TernarySearchTree. S1 will be both key and value for TernarySearchTree.

(4) "ABC" will use as the key for map, to insert name1 into an array list.

When I try to

engine.searchAll("a");

(1) TernarySearchTree will return "ABC".

(2) "ABC" will be used as the key to access map. Map will return an array list, which is containing name0 and name1.

This solution works. The sample code can be referred to Sample Code for New TSTSearchEngine

However, this may not be an effective solution, as it requires two pass of search. I find out there is an implementation in C++ C++ Implementation of Case Insensitive Ternary Search Tree. Hence, there is an opportunity that C++ code can be ported over to Java.




回答2:


I haven't used a TST before, but isn't this as simple as lower or uppercasing your keys, both during storage and during lookup? From your code snippet it looks like that should work.




回答3:


I've implemented a Ternary search tree, you can have a look at http://kunalekawde.blogspot.com/2010/05/radix-patricia-and-ternary.html As of case insensative, either you map small and caps to one hex/dec value or while getting prefix, check the value.



来源:https://stackoverflow.com/questions/618086/case-insensitive-ternary-search-tree

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!