Merging or reversing n-grams to a single string

眉间皱痕 submitted on 2021-01-27 23:44:54

Question


How do I merge the bigrams below into a single string?

_bigrams = ['the school', 'school boy', 'boy is', 'is reading']
_split = ' '.join(_bigrams).split()
_newstr = []
# Keep only the first occurrence of each word, preserving order.
for x in _split:
    if x not in _newstr:
        _newstr.append(x)
_newstr = ' '.join(_newstr)
print(_newstr)

Output: 'the school boy is reading'. This is the desired output, but the approach is long and inefficient given the large size of my data. It also does not support duplicate words in the final string: for 'the school boy is reading, is he?', only one 'is' would survive.

Any suggestions on how to make this work better? Thanks.


Answer 1:


# Multi-for generator expression allows us to create a flat iterable of words
all_words = (word for bigram in _bigrams for word in bigram.split())

def no_runs_of_words(words):
    """Takes an iterable of words and returns one with any runs condensed."""
    prev_word = None
    for word in words:
        if word != prev_word:
            yield word
        prev_word = word

final_string = ' '.join(no_runs_of_words(all_words))

This uses generators to evaluate lazily: the flattened list of duplicated words is never built in memory. Only the condensed words are collected when join builds the one final string.
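To see how the run-condensing idea copes with the repeated-word case from the question, here is a minimal sketch. The wrapper `merge_by_condensing_runs` and the `sample` list are assumptions made for illustration; they are not part of the original answer.

```python
def no_runs_of_words(words):
    """Yield each word unless it equals the word immediately before it."""
    prev_word = None
    for word in words:
        if word != prev_word:
            yield word
        prev_word = word


def merge_by_condensing_runs(bigrams):
    """Hypothetical wrapper: flatten the bigrams lazily, then drop immediate repeats."""
    all_words = (word for bigram in bigrams for word in bigram.split())
    return ' '.join(no_runs_of_words(all_words))


# Assumed input: the bigrams of "the school boy is reading, is he?"
sample = ['the school', 'school boy', 'boy is',
          'is reading,', 'reading, is', 'is he?']
merged = merge_by_condensing_runs(sample)
print(merged)
```

Both occurrences of 'is' survive because they are not adjacent in the flattened stream. One caveat: a word that is genuinely repeated back to back in the original text (the bigrams of 'I know that that is', for example) is collapsed to a single copy, since the function cannot tell overlap from real repetition.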




Answer 2:


If you really want a one-liner, something like this could work:

' '.join(val.split()[0] for val in _bigrams) + ' ' + _bigrams[-1].split()[-1]
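The same positional idea can be stretched to n-grams of any size, as the title asks. The sketch below assumes the n-grams are consecutive, overlap by all but one word, and are non-empty strings; the name `merge_ngrams` is hypothetical.

```python
def merge_ngrams(ngrams):
    """Rebuild text from consecutive, overlapping n-grams (illustrative helper)."""
    if not ngrams:
        return ''
    # Take the first word of every n-gram except the last...
    heads = [ngram.split()[0] for ngram in ngrams[:-1]]
    # ...then append the last n-gram in full to supply the tail of the text.
    return ' '.join(heads + ngrams[-1].split())


bigram_result = merge_ngrams(['the school', 'school boy', 'boy is', 'is reading'])
trigram_result = merge_ngrams(['the school boy', 'school boy is', 'boy is reading'])
print(bigram_result)
print(trigram_result)
```

Because it relies on position rather than on comparing words, this version keeps genuine repeats such as 'that that', and it returns an empty string for an empty list instead of raising an IndexError.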



Answer 3:


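Would this do it? It simply takes the first word of every entry except the last, which is kept whole.

_bigrams = ['the school', 'school boy', 'boy is', 'is reading']

# Select by position, not value, so an earlier bigram equal to the last one is still reduced to its first word.
clause = [a.split()[0] for a in _bigrams[:-1]] + _bigrams[-1:]

print(' '.join(clause))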

Output

the school boy is reading

However, if performance is a concern, Amber's solution is probably a good option.



Source: https://stackoverflow.com/questions/22426159/merging-or-reversing-n-grams-to-a-single-string
