问题
How do I merge the bigrams below to a single string?
_bigrams=['the school', 'school boy', 'boy is', 'is reading']
_split=(' '.join(_bigrams)).split()
_newstr=[]
_filter=[_newstr.append(x) for x in _split if x not in _newstr]
_newstr=' '.join(_newstr)
print _newstr
Output:'the school boy is reading'
....its the desired output but the approach is too long and not quite efficient given the large size of my data. Secondly, the approach would not support duplicate words in the final string ie 'the school boy is reading, is he?'
. Only one of the 'is'
will be permitted in the final string in this case.
Any suggestions on how to make this work better? Thanks.
回答1:
# Multi-for generator expression allows us to create a flat iterable of words
all_words = (word for bigram in _bigrams for word in bigram.split())
def no_runs_of_words(words):
"""Takes an iterable of words and returns one with any runs condensed."""
prev_word = None
for word in words:
if word != prev_word:
yield word
prev_word = word
final_string = ' '.join(no_runs_of_words(all_words))
This takes advantage of generators to lazily evaluate and not keep the entire set of words in memory at the same time until generating the one final string.
回答2:
If you really wanted a oneliner, something like this could work:
' '.join(val.split()[0] for val in (_bigrams)) + ' ' + _bigrams[-1].split()[-1]
回答3:
Would this do it? It does simply take the first word up to the last entry
_bigrams=['the school', 'school boy', 'boy is', 'is reading']
clause = [a.split()[0] if a != _bigrams[-1] else a for a in _bigrams]
print ' '.join(clause)
Output
the school boy is reading
However, concerning performance probably Amber's solution is a good option
来源:https://stackoverflow.com/questions/22426159/merging-or-reversing-n-grams-to-a-single-string