Split string into key-value pairs

青春惊慌失措 2020-12-31 03:40

I have a string like this:

pet:cat::car:honda::location:Japan::food:sushi

Now : separates a key from its value, while :: separates the pairs.

7 answers
  •  夕颜 (original poster)
    2020-12-31 04:32

    Your solution is indeed somewhat inefficient.

    The person who gave you the string to parse is also somewhat of a clown. There are industry-standard serialization formats, like JSON or XML, for which fast, efficient parsers exist. Inventing the square wheel is never a good idea.

    First question: Do you care? Is it slow enough that it hinders the performance of your application? It probably isn't, but there is only one way to find out: benchmark your code.
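A minimal sketch of such a measurement follows. The asker's original code is not shown in this thread, so `parseWithSplit` is only an assumed, straightforward baseline built on `String.split`; the class and method names are hypothetical. It uses a plain `System.nanoTime` loop with a warm-up pass, which gives a rough figure rather than a rigorous result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical baseline: an assumption about what a simple version looks like,
// not the asker's actual code.
class SplitBenchmark {

    static Map<String, String> parseWithSplit(String input) {
        Map<String, String> map = new HashMap<>();
        for (String pair : input.split("::")) {
            // Limit 2 keeps any further ':' inside the value.
            String[] kv = pair.split(":", 2);
            if (kv.length == 2) {
                map.put(kv[0], kv[1]);
            }
        }
        return map;
    }

    // Returns elapsed wall-clock milliseconds for `repetitions` parses.
    static long timeMillis(String input, int repetitions) {
        long sink = 0; // consume the results so the JIT cannot drop the loop
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            sink += parseWithSplit(input).size();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) System.out.println(sink); // never true; keeps sink live
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        String test = "pet:cat::car:honda::location:Japan::food:sushi";
        timeMillis(test, 100_000); // warm-up pass so the hot path gets compiled
        System.out.println(timeMillis(test, 100_000) + " ms for 100000 repetitions");
    }
}
```

If the number printed here is small compared with the rest of your request handling, the parsing is not your bottleneck.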

    That said, more efficient solutions exist. Below is an example:

    import java.util.HashMap;
    import java.util.Map;

    public class Main
    {
        public static void main (String[] args)
        {
            String test = "pet:cat::car:honda::location:Japan::food:sushi";
            boolean stateiskey = true; // true while scanning a key, false while scanning a value

            Map<String, String> map = new HashMap<>();
            int keystart = 0;
            int keyend = 0;
            int valuestart = 0;
            int valueend = 0;

            for(int i = 0; i < test.length(); i++){
                char nextchar = test.charAt(i);
                if (stateiskey) {
                    // a single ':' ends the key and starts the value
                    if (nextchar == ':') {
                        keyend = i;
                        stateiskey = false;
                        valuestart = i + 1;
                    }
                } else {
                    // the value ends at "::" or at the last character of the string
                    if (i == test.length() - 1 || (nextchar == ':' && test.charAt(i + 1) == ':')) {
                        valueend = i;
                        if (i + 1 == test.length()) valueend += 1; //compensate one for the end of the string
                        String key = test.substring(keystart, keyend);
                        String value = test.substring(valuestart, valueend);
                        keystart = i + 2; // the next key starts after the "::"
                        map.put(key, value);
                        i++; // skip the second ':' of the separator
                        stateiskey = true;
                    }
                }
            }

            System.out.println(map);
        }
    }
    

    This solution is a finite state machine with only two states. It looks at each character about twice: once when it tests it for a boundary, and once when it copies it to the new string in your map. That is close to the minimum possible.

    It doesn't create objects that are not needed, such as StringBuilders, intermediate strings or arrays, which keeps garbage-collection pressure low.

    It maintains good locality. The next character is almost always already in cache, so the lookup is cheap.

    It comes at a grave cost that is probably not worth it though:

    • It's far more complicated and less obvious
    • There are all sorts of moving parts
    • It's harder to debug when your string is in an unexpected format
    • Your coworkers will hate you
    • You will hate you when you have to debug something
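To illustrate the debugging point, here is a rough sketch of a deliberately simple parser that fails loudly on malformed input. It is not the state machine above but a split-based variant, and `StrictPairParser` is a hypothetical name. The assumption is that an empty key or a pair without a `:` should be treated as an error.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StrictPairParser {

    // Parses "k:v::k:v" and throws on pairs that lack a key or a ':'.
    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps input order
        int offset = 0; // index of the current pair in the input, for error messages
        // Limit -1 keeps trailing empty pairs, so a dangling "::" is reported too.
        for (String pair : input.split("::", -1)) {
            int colon = pair.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException(
                    "Malformed pair \"" + pair + "\" at index " + offset);
            }
            map.put(pair.substring(0, colon), pair.substring(colon + 1));
            offset += pair.length() + 2; // skip the pair and its "::" separator
        }
        return map;
    }
}
```

With this version, an input such as `pet:cat::car` produces an exception naming the bad pair and its position. The hand-rolled loop would silently produce a wrong or incomplete map instead.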

    Worth it? Maybe. How fast do you need that string parsed exactly?

    A quick and dirty benchmark at https://ideone.com/8T7twy tells me that for this string, this method is approximately 4 times faster. For longer strings the difference is likely somewhat greater.

    But your version still takes only 415 milliseconds for 100,000 repetitions, whereas this one takes 99 milliseconds.
