Efficiently split a string using multiple separators and retaining each separator?

野趣味 2021-02-02 10:44

I need to split strings of data using each character from string.punctuation and string.whitespace as a separator.

Furthermore, I need for the

9 answers
  •  终归单人心
    2021-02-02 11:31

    Solution in linear (O(n)) time:

    Let's say you have a string:

    original = "a, b...c    d"
    

    First convert all separators to space:

    import string
    splitters = string.punctuation + string.whitespace
    # Python 3: use str.maketrans (string.maketrans exists only in Python 2)
    trans = str.maketrans(splitters, ' ' * len(splitters))
    s = original.translate(trans)
    

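    Now s == 'a  b   c    d': every separator has become a single space, so s has the same length as original and positions in s line up with positions in original. Now you can use itertools.groupby to alternate between runs of spaces and runs of non-spaces: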

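    import itertools
    result = []
    position = 0
    # groupby yields consecutive runs of characters that share the same key,
    # so runs of spaces (separators) alternate with runs of other characters
    for _, letters in itertools.groupby(s, lambda c: c == ' '):
        letter_count = len(list(letters))
        # slice the original string so the real separator characters are kept
        result.append(original[position:position + letter_count])
        position += letter_count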
    

    Now result == ['a', ', ', 'b', '...', 'c', '    ', 'd'], which is what you need.
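
    The two steps above can be combined into one reusable function. This is only a minimal sketch for Python 3: the names split_keep_separators, SEPARATORS and TO_SPACE are made up for illustration and are not part of any library. It assumes, as above, that only the ASCII characters in string.punctuation and string.whitespace count as separators.

```python
import itertools
import string

# Hypothetical helper names, chosen for this sketch only.
# Every punctuation or whitespace character counts as a separator.
SEPARATORS = string.punctuation + string.whitespace
# Translation table that maps each separator to a single space.
TO_SPACE = str.maketrans(SEPARATORS, " " * len(SEPARATORS))


def split_keep_separators(text):
    """Split text into alternating runs of non-separators and separators."""
    # The translated copy has the same length as text, so offsets line up.
    flattened = text.translate(TO_SPACE)
    pieces = []
    position = 0
    for _, run in itertools.groupby(flattened, key=lambda c: c == " "):
        run_length = sum(1 for _ in run)
        # Slice the original text so the real separator characters survive.
        pieces.append(text[position:position + run_length])
        position += run_length
    return pieces


print(split_keep_separators("a, b...c    d"))
# ['a', ', ', 'b', '...', 'c', '    ', 'd']
```

    Joining the returned pieces with ''.join(...) gives back the input string unchanged, which is a quick way to check that no separator was lost.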
