Python: Split unicode string on word boundaries

清酒与你 2020-12-31 13:16

I need to take a string, and shorten it to 140 characters.

Currently I am doing:

if len(tweet) > 140:
    tweet = re.sub(r"\s+", " ", tweet)


        
9 Answers
  •  爱一瞬间的悲伤
    2020-12-31 13:40

    After speaking with some native Cantonese, Mandarin, and Japanese speakers, it seems that doing this correctly is hard, but my current algorithm still makes sense to them in the context of internet posts.

    That is, they are used to the "split on space and add … at the end" treatment.

    So I'm going to be lazy and stick with it until I get complaints from people who don't understand it.

    The only change to my original implementation would be to not force a space on the last word, since it is unneeded in any language, and to use the Unicode character … (U+2026) instead of three dots (...) to save 2 characters.
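
The approach described above could look roughly like the sketch below. It is not the poster's actual code, which is not shown in full. The function name `shorten`, the `limit` parameter, and the choice to hard-cut text that contains no spaces (as is common in Chinese or Japanese) are assumptions made for illustration. Length is measured with `len`, which counts Unicode code points.

```python
import re

ELLIPSIS = "\u2026"  # the single-character ellipsis "…"


def shorten(text: str, limit: int = 140) -> str:
    """Return text cut to at most `limit` code points (assumes limit >= 1)."""
    # Collapse every run of whitespace into one space, as in the question.
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= limit:
        return text

    # Reserve one position for the ellipsis.
    budget = limit - len(ELLIPSIS)
    cut = text[:budget]

    # If the cut falls in the middle of a word, back up to the previous space.
    # Text without any space is simply cut at the budget (a hard cut).
    if text[budget] != " " and " " in cut:
        cut = cut[:cut.rfind(" ")]

    # No extra space is forced between the last word and the ellipsis.
    return cut.rstrip() + ELLIPSIS


if __name__ == "__main__":
    print(shorten("hello world foo", 10))
```

Calling `shorten(tweet)` leaves short strings unchanged apart from whitespace normalization. A more linguistically correct cut for unspaced scripts would need real word segmentation, which this sketch deliberately does not attempt.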
