Determining a string has all unique characters without using additional data structures and without the lowercase characters assumption

你离开我真会死。 提交于 2019-11-30 14:06:01

for the asccii character set you can represent the 256bits in 4 longs: you basically hand code an array.

public static boolean isUniqueChars(String str) {
    long checker1 = 0;
    long checker2 = 0;
    long checker3 = 0;
    long checker4 = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i);
        int toCheck = val / 64;
        val %= 64;
        switch (toCheck) {
            case 0:
                if ((checker1 & (1L << val)) > 0) {
                    return false;
                }
                checker1 |= (1L << val);
                break;
            case 1:
                if ((checker2 & (1L << val)) > 0) {
                    return false;
                }
                checker2 |= (1L << val);
                break;
            case 2:
                if ((checker3 & (1L << val)) > 0) {
                    return false;
                }
                checker3 |= (1L << val);
                break;
            case 3:
                if ((checker4 & (1L << val)) > 0) {
                    return false;
                }
                checker4 |= (1L << val);
                break;
        }            
    }
    return true;
}

You can use the following code to generate the body of a similar method for unicode characters:

static void generate() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("long checker%d = 0;%n", i));
    }
    sb.append("for (int i = 0; i < str.length(); ++i) {\n"
            + "int val = str.charAt(i);\n"
            + "int toCheck = val / 64;\n"
            + "val %= 64;\n"
            + "switch (toCheck) {\n");
    for (int i = 0; i < 1024; i++) {
        sb.append(String.format("case %d:\n"
                + "if ((checker%d & (1L << val)) > 0) {\n"
                + "return false;\n"
                + "}\n"
                + "checker%d |= (1L << val);\n"
                + "break;\n", i, i, i));
    }
    sb.append("}\n"
            + "}\n"
            + "return true;");
    System.out.println(sb);
}

You only need one line... well less than one line actually:

if (str.matches("((.)(?!.*\\1))*"))

this uses a negative look ahead to assert that each character is not repeated later in the string.

This approach a time complexity of O(n^2), because for all n characters in the input, all characters that follow (there are n of those) are compared for equality.

I think we need a general and practical definition of "additional data structures". Intuitively, we don't want to call every scalar integer or pointer a "data structure", because that makes nonsense of any prohibition of "additional data structures".

I propose we borrow a concept from big-O notation: an "additional data structure" is one that grows with the size of the data set.

In the present case, the code quoted by the OP appears to have a space requirement of O(1) because the bit vector happens to fit into an integer type. But as the OP implies, the general form of the problem is really O(N).

An example of a solution to the general case is to use two pointers and a nested loop to simply compare every character to every other. The space requirement is O(1) but the time requirement is O(N^2).

How about the following algorithm?

Steps:

Convert string to lowercase.

Loop through each character in the string

Set variable data = 0

Calculate offset = ascii value of first character in string - 97

Set the flag for that position with mask = 1 << offset

If bitwise AND returns true, then a character is repeating (mask & data), so break here.

else if we have not seen the character repeat yet, set the bit for that character by doing a bitwise OR by doing data = data | mask

Continue till end of characters.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!