I need to take a string, and shorten it to 140 characters.
Currently I am doing:
import re
if len(tweet) > 140:
    tweet = re.sub(r"\s+", " ", tweet)  # collapse runs of whitespace into one space
After speaking with some native Cantonese, Mandarin, and Japanese speakers, it seems that doing this correctly is hard, but my current algorithm still makes sense to them in the context of internet posts.
Meaning, they are used to the "split on space and add … at the end" treatment.
So I'm going to be lazy and stick with it until I get complaints from people who don't understand it.
The only change to my original implementation would be to not force a space after the last word, since it is unneeded in any language, and to use the single Unicode ellipsis character … (U+2026) instead of three dots (...) to save 2 characters.
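For illustration, here is a minimal sketch of the "split on space and add … at the end" approach with both changes applied. The function name `shorten`, the `limit` parameter, and the hard-cut fallback for text with no spaces are my own assumptions, not the original implementation. It also measures length with `len()`, which counts Unicode code points and may not match how a given service counts characters.

```python
import re

ELLIPSIS = "\u2026"  # the single-character ellipsis, U+2026


def shorten(text, limit=140):
    """Trim text to at most `limit` characters, cutting on a space.

    Hypothetical helper: name, signature, and fallback are assumptions.
    """
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= limit:
        return text

    avail = limit - len(ELLIPSIS)  # room left once the ellipsis is added
    result = ""
    for word in text.split(" "):
        # Join with a space only between words, never after the last one.
        candidate = word if not result else result + " " + word
        if len(candidate) > avail:
            break
        result = candidate

    if not result:
        # Assumption: if even the first "word" is too long (for example
        # text written without spaces), fall back to a hard cut.
        result = text[:avail]
    return result + ELLIPSIS
```

Calling `shorten(tweet)` leaves short text unchanged apart from whitespace normalization, and otherwise returns whole words followed directly by the ellipsis, with no space in between.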