I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:
You are given a string of n character
Let the length of your compacted document be N.
Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.
b(N) is true (since the empty string can be split into 0 words). Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.
Eventually, you compute b(0) which tells you if the entire document can be split into words.
In pseudo-code:
    def try_to_split(doc):
        N = len(doc)
        b = [False] * (N + 1)
        b[N] = True
        for i in range(N - 1, -1, -1):
            for word starting at position i:
                if b[i + len(word)]:
                    b[i] = True
                    break
        return b
There are some tricks you can use to make 'word starting at position i' efficient, but you're only asked for an O(N^2) algorithm, so you can simply look up every substring starting at i in the dictionary. That is at most N lookups for each of the N positions.
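A minimal runnable sketch of the pseudo-code above, assuming the dictionary is passed in as a Python set of strings (the `dictionary` parameter and the sample words are my own, not part of the problem statement). The inner loop is the "look up every substring starting at i" step:

```python
def try_to_split(doc, dictionary):
    """Return b, where b[i] is True if doc[i:] can be split into dictionary words."""
    n = len(doc)
    b = [False] * (n + 1)
    b[n] = True  # the empty suffix splits into zero words
    for i in range(n - 1, -1, -1):
        # Try every substring doc[i:j] that starts at position i.
        for j in range(i + 1, n + 1):
            if b[j] and doc[i:j] in dictionary:
                b[i] = True
                break
    return b


words = {"it", "was", "the", "best", "of", "times"}  # hypothetical dictionary
print(try_to_split("itwasthebest", words)[0])  # True
print(try_to_split("itwasxbest", words)[0])    # False
```

Checking `b[j]` before the set lookup skips building and hashing the substring whenever the rest of the document cannot be split anyway.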
To generate the words, you can either modify the above algorithm to store the good words, or just generate them like this:
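    def generate_words(doc, b, idx=0):
        # dictionary and output are assumed to be defined elsewhere.
        length = 1
        while True:
            assert b[idx]
            if idx == len(doc):
                return
            word = doc[idx:idx + length]
            if word in dictionary and b[idx + length]:
                output(word)
                idx += length
                length = 1
            else:
                # No usable word of this length; try a longer one.
                length += 1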
Here b is the boolean array generated from the first part of the algorithm.
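Putting both parts together, here is a self-contained sketch that returns the words as a list instead of calling an output function. The name `split_into_words`, the `dictionary` parameter, and the `None` return for an unsplittable document are my own choices for illustration:

```python
def split_into_words(doc, dictionary):
    """Return one valid split of doc into dictionary words, or None if there is none."""
    n = len(doc)
    # First part: b[i] is True if doc[i:] can be split into dictionary words.
    b = [False] * (n + 1)
    b[n] = True
    for i in range(n - 1, -1, -1):
        b[i] = any(b[j] and doc[i:j] in dictionary for j in range(i + 1, n + 1))
    if not b[0]:
        return None  # the whole document cannot be split

    # Second part: walk forward, always taking a word that leaves a splittable suffix.
    words = []
    idx = 0
    while idx < n:
        # b[idx] is True here, so this loop always finds a word and breaks.
        for end in range(idx + 1, n + 1):
            if b[end] and doc[idx:end] in dictionary:
                words.append(doc[idx:end])
                idx = end
                break
    return words


print(split_into_words("itwasthebest", {"it", "was", "the", "best"}))
# ['it', 'was', 'the', 'best']
```

Because each step only accepts a word whose remaining suffix is known to be splittable, the forward walk never has to backtrack.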