How to find all subsets of a multiset that are in a given set?

六月ゝ 毕业季﹏ submitted on 2019-12-04 19:32:41

Here is a quick implementation of a ternary search tree (TST) that may help with your problem. Some years ago I was inspired by an article in Dr. Dobb's, which you can read at http://www.drdobbs.com/database/ternary-search-trees/184410528. It provides some background on TSTs and a performance analysis.

In the example from your problem description, D is the dictionary containing the keys "dgo", "aett" and "amt". The values are identical to the keys.

M is your search string, which in effect says: "Give me all the values in the dictionary whose keys can be formed from some or all of these characters". The order of the characters is not important. The character '.' is used as a wildcard in the search.
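To make the matching rule concrete, here is a minimal brute-force sketch of the test that the tree search applies to each key: a key matches when its characters can all be drawn from M, with '.' in M standing in for any one missing character. The class and method names (MultisetCheck, isSubMultiset) are made up for illustration and are not part of the code below. Like the tree code, it assumes every character is below 256.

```java
class MultisetCheck {
    // Returns true if key can be formed from the characters of m,
    // using each '.' in m as a wildcard for one otherwise missing character.
    static boolean isSubMultiset(String key, String m) {
        int[] freq = new int[256];            // assumes all chars are < 256
        for (char c : m.toCharArray()) freq[c]++;
        for (char c : key.toCharArray()) {
            if (freq[c] > 0) freq[c]--;           // consume a real character
            else if (freq['.'] > 0) freq['.']--;  // otherwise spend a wildcard
            else return false;                    // nothing left to cover c
        }
        return true;
    }

    public static void main(String[] args) {
        String[] keys = {"dgo", "aett", "amt"};
        for (String m : new String[] {"aamt", "aamtdo."}) {
            for (String key : keys) {
                System.out.println(m + " / " + key + " -> " + isSubMultiset(key, m));
            }
        }
    }
}
```

Running this check against every key in D is the linear scan that the TST is meant to avoid; it is only here to pin down which keys count as matches.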

For any given M, this algorithm and data structure do not require you to look at every element of D, so in that respect it will be fast. I have also run some tests on the number of nodes visited: most of the time it is only a small fraction of the total nodes in the dictionary, even for keys that are not found.

The algorithm also optionally lets you specify a minimum and maximum key length, which limits the keys returned by the dictionary.

Sorry for the lengthy code, but it is meant to be complete enough for you to test.

import java.util.ArrayList;

public class TSTTree<T>
{
    private TSTNode<T> root;
    private int size = 0;
    private int totalNodes = 0;

    public int getSize() { return size; }

    public int getTotalNodes() { return totalNodes; }

    public TSTTree()
    {
    }

    public TSTNode<T> getRoot() { return root; }

    public void insert(String key, T value)
    {
        if(key==null || key.length()==0) return;

        char[] keyArray = key.toCharArray();

        if(root==null)
        {
            root = new TSTNode<T>(keyArray[0]);
            totalNodes++;        // Count the root node as well
        }
        TSTNode<T> currentNode = root;
        TSTNode<T> parentNode = null;

        int d = 0;
        int i = 0;

        while(currentNode!=null)
        {
            parentNode = currentNode;
            d = keyArray[i] - currentNode.getSplitChar();
            if(d==0)
            {
                if((++i) == keyArray.length) // Found the key
                {
                    if(currentNode.getValue()!=null)
                        System.out.println(currentNode.getValue() + " replaced with " + value);
                    else
                        size++;
                    currentNode.setValue(value);        // Replace value at Node
                    return;
                }
                else
                    currentNode = currentNode.getEqChild();
            }
            else if(d<0)
                currentNode = currentNode.getLoChild();
            else
                currentNode = currentNode.getHiChild();
        }

        currentNode = new TSTNode<T>(keyArray[i++]);
        totalNodes++;
        if(d==0)
            parentNode.setEqChild(currentNode);
        else if(d<0)
            parentNode.setLoChild(currentNode);
        else
            parentNode.setHiChild(currentNode);

        for(;i<keyArray.length;i++)
        {
            TSTNode<T> tempNode = new TSTNode<T>(keyArray[i]);
            totalNodes++;
            currentNode.setEqChild(tempNode);
            currentNode = tempNode;
        }

        currentNode.setValue(value);        // Insert value at Node
        size++;
    }

    public ArrayList<T> find(String charsToSearch) {
        return find(charsToSearch,1,charsToSearch.length());
    }

    // Return all values whose keys have a length between minLen and maxLen
    // and can be formed from the characters in "charsToSearch".
    public ArrayList<T> find(String charsToSearch, int minLen, int maxLen) {
        ArrayList<T> list = new ArrayList<T>();
        char[] charArray = charsToSearch.toCharArray();
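        // Frequency table indexed by character code; assumes every character is < 256
        int[] charFreq = new int[256];
        for(int i=0;i<charArray.length;i++) charFreq[charArray[i]]++;
        // A matching key can never be longer than the search string
        maxLen = Math.min(maxLen, charArray.length);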
        pmsearch(root,charFreq,minLen,maxLen,1, list);
        return list;
    }

    public void pmsearch(TSTNode<T> node, int[] charFreq, int minLen, int maxLen, int depth, ArrayList<T> list) {
        if(node==null) return;

        char c = node.getSplitChar();
        if(isSmaller(charFreq,c))
            pmsearch(node.getLoChild(),charFreq,minLen,maxLen,depth,list);
        if(charFreq[c]>0) {
            if(depth>=minLen && node.getValue()!=null) list.add(node.getValue());
            if(depth<maxLen) {
                charFreq[c]--;
                pmsearch(node.getEqChild(),charFreq,minLen,maxLen,depth+1,list);
                charFreq[c]++;
            }
        }
        else if(charFreq['.']>0) { // Wildcard
            if(depth>=minLen && node.getValue()!=null) list.add(node.getValue());
            if(depth<maxLen) {
                charFreq['.']--;
                pmsearch(node.getEqChild(),charFreq,minLen,maxLen,depth+1,list);
                charFreq['.']++;
            }
        }            
        if(isGreater(charFreq,c))
            pmsearch(node.getHiChild(),charFreq,minLen,maxLen,depth,list);
    }

    private boolean isGreater(int[] charFreq, char c) {
        if(charFreq['.']>0) return true;

        boolean retval = false;
        for(int i=c+1;i<charFreq.length;i++) {
            if(charFreq[i]>0) {
                retval = true;
                break;
            }
        }
        return retval;
    }

    private boolean isSmaller(int[] charFreq, char c) {
        if(charFreq['.']>0) return true;

        boolean retval = false;
        for(int i=c-1;i>-1;i--) {
            if(charFreq[i]>0) {
                retval = true;
                break;
            }
        }
        return retval;
    }
}
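
The TSTNode class used above is not shown in this answer. The following is a hypothetical reconstruction, inferred only from the calls TSTTree makes (a constructor taking the split character, plus getters and setters for the value and the three children). The original class may well have differed. It is declared without the public modifier so that it compiles in the same file or package as TSTTree.

```java
// Hypothetical reconstruction of the node type that TSTTree relies on.
class TSTNode<T> {
    private final char splitChar;   // character stored at this node
    private T value;                // non-null only where a key ends
    private TSTNode<T> loChild;     // subtree for characters smaller than splitChar
    private TSTNode<T> eqChild;     // subtree for the next character of the key
    private TSTNode<T> hiChild;     // subtree for characters greater than splitChar

    public TSTNode(char splitChar) { this.splitChar = splitChar; }

    public char getSplitChar() { return splitChar; }

    public T getValue() { return value; }
    public void setValue(T value) { this.value = value; }

    public TSTNode<T> getLoChild() { return loChild; }
    public void setLoChild(TSTNode<T> node) { this.loChild = node; }

    public TSTNode<T> getEqChild() { return eqChild; }
    public void setEqChild(TSTNode<T> node) { this.eqChild = node; }

    public TSTNode<T> getHiChild() { return hiChild; }
    public void setHiChild(TSTNode<T> node) { this.hiChild = node; }
}
```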

Below is a small test program. It just inserts the key/value pairs from your example, in the same order. If you have a D with a lot of elements, it would be best to sort it first and build the dictionary in a tournament fashion (i.e. insert the middle element, then recursively populate the left half and the right half). This will ensure the tree is balanced.
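
The tournament-style ordering described above could be sketched roughly as follows. BalancedOrder and its method names are made up for illustration. The sketch only computes the order in which keys should be inserted, so it does not depend on the tree class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BalancedOrder {
    // Appends sorted[lo..hi] to out: middle element first,
    // then the left half, then the right half, recursively.
    static void order(String[] sorted, int lo, int hi, List<String> out) {
        if (lo > hi) return;
        int mid = lo + (hi - lo) / 2;
        out.add(sorted[mid]);
        order(sorted, lo, mid - 1, out);
        order(sorted, mid + 1, hi, out);
    }

    // Returns the keys in an order suitable for building a balanced tree.
    static List<String> insertionOrder(String[] keys) {
        String[] sorted = keys.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        List<String> out = new ArrayList<>();
        order(sorted, 0, sorted.length - 1, out);
        return out;
    }

    public static void main(String[] args) {
        // prints [amt, aett, dgo]
        System.out.println(insertionOrder(new String[] {"dgo", "aett", "amt"}));
    }
}
```

Each key in the returned list would then be passed to dictionary.insert(key, key) in that order.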

import org.junit.*;
import org.junit.runner.*;
import java.io.*;
import java.util.*;
import org.junit.runner.notification.Failure;

public class MyTest
{
    static TSTTree<String> dictionary = new TSTTree<String>();

    @BeforeClass
    public static void initialize() {
        dictionary.insert("dgo","dgo");
        dictionary.insert("aett","aett");
        dictionary.insert("amt","amt");
    }

    @Test
    public void testMethod() {
        System.out.println("testMethod");
        ArrayList<String> result = dictionary.find("aamt");
        System.out.println("Results: ");
        for(String value : result) {
            System.out.println(value);
        }
    }

    @Test
    // Test with a wildcard which finds "dgo" value
    public void testMethod1() {
        System.out.println("testMethod1");
        ArrayList<String> result = dictionary.find("aamtdo.");
        System.out.println("Results: ");
        for(String value : result) {
            System.out.println(value);
        }
    }

    public static void main(String[] args) {
        Result result = JUnitCore.runClasses(MyTest.class);
        for (Failure failure : result.getFailures()) {
            System.out.println(failure.toString());
        }
    }
}