Split a string to a string of valid words using Dynamic Programming

不知归路 2020-12-23 15:02

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:

You are given a string of n character

6 answers
  •  被撕碎了的回忆
    2020-12-23 15:44

    Let the length of your compacted document be N.

    Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.

    b(N) is true (since the empty string can be split into 0 words). Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.

    Eventually, you compute b(0) which tells you if the entire document can be split into words.

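    In Python:

    def try_to_split(doc, dictionary):
      # b[i] is True if doc[i:] can be split into dictionary words.
      N = len(doc)
      b = [False] * (N + 1)
      b[N] = True  # the empty suffix splits into zero words
      for i in range(N - 1, -1, -1):
        # Try every candidate word doc[i:j] starting at position i.
        for j in range(i + 1, N + 1):
          if doc[i:j] in dictionary and b[j]:
            b[i] = True
            break
      return b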
    

    There are some tricks for making 'word starting at position i' efficient, but since you're asked for an O(N^2) algorithm, you can simply look up every substring starting at i in the dictionary.
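
    One such trick, sketched here as an assumption rather than anything the question requires, is to cap the candidate length at the longest dictionary word, so the inner loop never tests substrings that are too long to be words. The function name `try_to_split_bounded` and the sample dictionary are made up for illustration.

```python
def try_to_split_bounded(doc, dictionary):
    """Return b, where b[i] is True if doc[i:] splits into dictionary words."""
    n = len(doc)
    # Longest dictionary word; no longer candidate can possibly match.
    max_len = max((len(w) for w in dictionary), default=0)
    b = [False] * (n + 1)
    b[n] = True  # the empty suffix splits into zero words
    for i in range(n - 1, -1, -1):
        # Only consider end positions within max_len characters of i.
        for j in range(i + 1, min(n, i + max_len) + 1):
            if b[j] and doc[i:j] in dictionary:
                b[i] = True
                break
    return b


# Hypothetical sample data, not taken from the question.
words = {"it", "was", "the", "best", "of", "times"}
b = try_to_split_bounded("itwasthebestoftimes", words)
print(b[0])  # True
```

    With a set as the dictionary, this does at most N * max_len lookups instead of roughly N^2 / 2, which matters only when the words are short relative to the document.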

    To generate the words, you can either modify the above algorithm to store the good words, or just generate them like this:

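    def generate_words(doc, b, dictionary, idx=0):
      # Walk forward from idx, printing the shortest word that leaves a splittable rest.
      length = 1
      while True:
        assert b[idx]
        if idx == len(doc): return
        word = doc[idx: idx + length]
        if word in dictionary and b[idx + length]:
          print(word)
          idx += length
          length = 1
        else:
          # No match at this length yet; b[idx] guarantees a longer one exists.
          length += 1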
    

    Here b is the boolean array generated from the first part of the algorithm.
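
    The other option mentioned above, modifying the first pass to store the good words, could look roughly like the sketch below. The names `split_into_words` and `word_len` and the sample dictionary are hypothetical, and the dictionary is assumed to be a Python set so that lookups are cheap.

```python
def split_into_words(doc, dictionary):
    """Return one list of dictionary words that concatenate to doc, or None."""
    n = len(doc)
    # word_len[i] is the length of a word starting at i whose remainder is
    # also splittable, or 0 if no such word was found.
    word_len = [0] * (n + 1)
    splittable = [False] * (n + 1)
    splittable[n] = True  # the empty suffix splits into zero words
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if splittable[j] and doc[i:j] in dictionary:
                splittable[i] = True
                word_len[i] = j - i
                break
    if not splittable[0]:
        return None
    # Follow the stored lengths from the start to read off the words.
    result, i = [], 0
    while i < n:
        result.append(doc[i:i + word_len[i]])
        i += word_len[i]
    return result


# Hypothetical sample data, not taken from the question.
sample = {"it", "was", "the", "best", "of", "times"}
print(split_into_words("itwasthebestoftimes", sample))
# ['it', 'was', 'the', 'best', 'of', 'times']
```

    Storing the word length at each position avoids the second scan over candidate lengths, at the cost of one extra integer array.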
