How to split a string into words. Ex: “stringintowords” -> “String Into Words”?

粉色の甜心 2020-11-29 20:18

What is the right way to split a string into words? (The string doesn't contain any spaces or punctuation marks.)

For example: "stringintowords" -> "String Into Words"

13 answers
  • 2020-11-29 20:23

    This is basically a variation of the knapsack problem, so what you need is a comprehensive list of words and any of the solutions covered on Wikipedia.

    With a fairly sized dictionary, this is going to be an insanely resource-intensive and lengthy operation, and you cannot even be sure that the problem will be solved.

  • 2020-11-29 20:23

    The only way you could split that string into words is to use a dictionary, although this would probably be quite resource-intensive.

  • 2020-11-29 20:26

    Create a list of possible words and sort it from the longest words to the shortest.

    Check each entry in the list against the beginning of the string. If it matches, remove it from the string and append it to your sentence, followed by a space. Repeat until the string is empty.
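    The longest-first matching described above could be sketched in Java roughly like this. `GreedySplitter` and `split` are hypothetical names, and this is a minimal sketch rather than a complete solution: because it always takes the longest matching word and never backtracks, it can fail on inputs that do have a valid split.

```java
import java.util.ArrayList;
import java.util.List;

// Greedy longest-match segmentation (hypothetical helper, illustration only).
// It never backtracks, so it can miss splits that a full search would find.
class GreedySplitter {
    private final List<String> words;

    GreedySplitter(List<String> dictionary) {
        // Sort from longest to shortest so longer matches are tried first
        words = new ArrayList<>(dictionary);
        words.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    // Returns the matched words joined by spaces, or null if the greedy scan gets stuck
    String split(String s) {
        StringBuilder sentence = new StringBuilder();
        String rest = s;
        while (!rest.isEmpty()) {
            String match = null;
            for (String w : words) {
                // Skip empty entries, which would otherwise loop forever
                if (!w.isEmpty() && rest.startsWith(w)) {
                    match = w;
                    break;
                }
            }
            if (match == null) {
                return null; // no dictionary word starts the remaining text
            }
            if (sentence.length() > 0) {
                sentence.append(' ');
            }
            sentence.append(match);
            rest = rest.substring(match.length());
        }
        return sentence.toString();
    }
}
```

    With the words `window`, `windows`, `steam` and `blog`, this returns null for "windowsteamblog": it takes "windows" first and then cannot match "teamblog", even though "window steam blog" is valid. The dynamic-programming answers below do not have this problem.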

  • 2020-11-29 20:28

    As many people here have mentioned, this is a standard, easy dynamic programming problem: the best solution is given by Falk Hüffner. Some additional info, though:

    (a) You should consider implementing isWord with a trie, which will save you a lot of time if you use it properly (that is, by testing for words incrementally).

    (b) Searching for "segmentation dynamic programming" yields a score of more detailed answers, from university-level lectures with pseudocode, such as this lecture at Duke (which even goes so far as to provide a simple probabilistic approach for dealing with words that won't be contained in any dictionary).
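    Tip (a) can be combined with the usual dynamic-programming formulation. The sketch below is only an illustration under assumptions, not Falk Hüffner's answer (which is not reproduced here): it stores the dictionary in a trie and, from every position that a valid split can reach, walks the trie one character at a time, stopping as soon as no word can continue. `TrieSegmenter` and `segment` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// DP word segmentation with a trie for incremental word tests (hypothetical helper).
class TrieSegmenter {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    TrieSegmenter(List<String> dictionary) {
        for (String word : dictionary) {
            Node node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.isWord = true;
        }
    }

    // Returns one segmentation of s as a list of words, or null if none exists
    List<String> segment(String s) {
        int n = s.length();
        // prev[i] = start of the last word in some valid split of s[0, i); -1 = unreachable
        int[] prev = new int[n + 1];
        Arrays.fill(prev, -1);
        prev[0] = 0; // the empty prefix is always reachable
        for (int start = 0; start < n; start++) {
            if (prev[start] < 0) {
                continue; // no valid split ends here
            }
            Node node = root;
            for (int end = start; end < n; end++) {
                node = node.children.get(s.charAt(end));
                if (node == null) {
                    break; // no dictionary word has this prefix: stop early
                }
                if (node.isWord && prev[end + 1] < 0) {
                    prev[end + 1] = start;
                }
            }
        }
        if (prev[n] < 0) {
            return null;
        }
        // Follow the back-pointers from the end to recover the words
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(s.substring(prev[end], end));
        }
        Collections.reverse(words);
        return words;
    }
}
```

    Because the inner loop stops once the trie has no matching child, each start position costs at most as many steps as the longest dictionary word, instead of one hash lookup per possible substring.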

  • 2020-11-29 20:28

    If you want to ensure that you get this right, you'll have to use a dictionary-based approach, and it will be horrendously inefficient. You should also expect your algorithm to return multiple results.

    For example: windowsteamblog (of http://windowsteamblog.com/ fame)

    • windows team blog
    • window steam blog
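    To get every reading rather than just one, the search can collect all segmentations of each suffix and memoize them by start index. This is a minimal sketch with hypothetical names (`AllSegmentations`, `split`); for highly ambiguous input the number of results can grow exponentially with the string length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Enumerates all ways to split a string into dictionary words (hypothetical helper).
class AllSegmentations {
    private final Set<String> dictionary;
    // Memo: suffix start index -> every segmentation of that suffix
    private final Map<Integer, List<String>> memo = new HashMap<>();

    AllSegmentations(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    List<String> split(String s) {
        memo.clear(); // memo entries are only valid for one input string
        return splitFrom(s, 0);
    }

    private List<String> splitFrom(String s, int start) {
        if (start == s.length()) {
            List<String> base = new ArrayList<>();
            base.add(""); // exactly one way to split the empty suffix
            return base;
        }
        List<String> cached = memo.get(start);
        if (cached != null) {
            return cached;
        }
        List<String> results = new ArrayList<>();
        for (int end = start + 1; end <= s.length(); end++) {
            String word = s.substring(start, end);
            if (!dictionary.contains(word)) {
                continue;
            }
            // Prefix the word to every segmentation of the remaining suffix
            for (String rest : splitFrom(s, end)) {
                results.add(rest.isEmpty() ? word : word + " " + rest);
            }
        }
        memo.put(start, results);
        return results;
    }
}
```

    With the words `windows`, `window`, `steam`, `team` and `blog`, `split("windowsteamblog")` yields both readings listed above.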
  • 2020-11-29 20:29

    A simple memoized Java solution with O(n^2) running time (counting each substring and dictionary lookup as one step).

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    public class Solution {
        // Should contain the list of all words; you can use any other data structure (e.g. a Trie)
        private final Set<String> dictionary;

        public Solution(Set<String> dictionary) {
            this.dictionary = dictionary;
        }

        // Returns s split into dictionary words separated by spaces, or null if no split exists
        public String parse(String s) {
            return parse(s, new HashMap<String, String>());
        }

        // map memoizes results per suffix; a stored null means "this suffix cannot be split"
        public String parse(String s, Map<String, String> map) {
            if (map.containsKey(s)) {
                return map.get(s);
            }
            if (dictionary.contains(s)) {
                return s;
            }
            for (int left = 1; left < s.length(); left++) {
                String leftSub = s.substring(0, left);
                if (!dictionary.contains(leftSub)) {
                    continue; // prefix is not a word; try a longer one
                }
                String rightSub = s.substring(left);
                String rightParsed = parse(rightSub, map);
                if (rightParsed != null) {
                    String parsed = leftSub + " " + rightParsed;
                    map.put(s, parsed);
                    return parsed;
                }
            }
            map.put(s, null);
            return null;
        }
    }
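    The question asks for "String Into Words" with capitals, while the parser above returns lowercase words joined by spaces. A small helper could capitalize the first letter of each word afterwards, e.g. by passing the string returned by `parse` to it. `TitleCase.of` is a hypothetical name used only for illustration.

```java
// Capitalizes the first character after each space (hypothetical helper).
final class TitleCase {
    private TitleCase() {}

    static String of(String sentence) {
        if (sentence == null) {
            return null; // preserve "no segmentation found"
        }
        StringBuilder out = new StringBuilder(sentence.length());
        boolean startOfWord = true;
        for (char c : sentence.toCharArray()) {
            out.append(startOfWord ? Character.toUpperCase(c) : c);
            startOfWord = (c == ' ');
        }
        return out.toString();
    }
}
```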
    