Split a string with custom delimiter, respect and preserve quotes (single or double)

予麋鹿 2021-01-29 10:04

I have a string which is like this:

>>> s = '1,",2, ",,4,,,\',7, \',8,,10,'
>>> s
'1,",2, ",,4,,,\',7, \',8,,10,'

2 answers
  •  陌清茗 (original poster)
    2021-01-29 10:35

    A modified version of this (the original handles only whitespace) does the trick; the quotes are stripped:

    >>> import re
    >>> s = '1,",2, ",,4,,,\',7, \',8,,10,'
    
    >>> tokens = [t for t in re.split(r",?\"(.*?)\",?|,?'(.*?)',?|,", s) if t is not None ]
    >>> tokens
    ['1', ',2, ', '', '4', '', '', ',7, ', '8', '', '10', '']
    

    And if you would like to keep the quote characters:

    >>> tokens = [t for t in re.split(r",?(\".*?\"),?|,?('.*?'),?|,", s) if t is not None ]
    >>> tokens
    ['1', '",2, "', '', '4', '', '', "',7, '", '8', '', '10', '']
    

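    If you want to use a custom delimiter, replace every occurrence of , in the regexp with your own delimiter. Escape it (for example with re.escape) if it contains regex metacharacters, and wrap it in a non-capturing group (?:...) if it is longer than one character, so that the trailing ? applies to the whole delimiter.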
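
    A minimal sketch of that substitution, assuming Python 3.6+ for the f-string. The helper name `split_with_delimiter` is hypothetical and not part of the original answer; it escapes the delimiter and groups it so that multi-character delimiters also work:

```python
import re

def split_with_delimiter(s, delim=","):
    """Split s on delim, treating quoted runs as single fields (quotes stripped)."""
    d = re.escape(delim)          # neutralise regex metacharacters such as | or .
    opt = f"(?:{d})?"             # optional delimiter, grouped so ? covers all of it
    pattern = opt + r'"(.*?)"' + opt + "|" + opt + r"'(.*?)'" + opt + "|" + d
    # re.split emits None for capture groups that did not take part in a match
    return [t for t in re.split(pattern, s) if t is not None]

print(split_with_delimiter('a|"b|c"|d', "|"))      # ['a', 'b|c', 'd']
print(split_with_delimiter("a::'b::c'::d", "::"))  # ['a', 'b::c', 'd']
```

    With the default delimiter this should behave like the first regexp in this answer.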

    Explanation:

    |   = match alternatives, e.g. ( |X) = space or X
    .*  = anything (as much as possible)
    .*? = anything (as little as possible, i.e. non-greedy)
    x?  = x or nothing
    ()  = capture the content of a matched pattern
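
    The regexps in this answer use the non-greedy form .*? rather than .* so that a match stops at the nearest closing quote. A small illustration (the sample string is made up for this example):

```python
import re

text = '"a","b"'

# Greedy: .* runs to the LAST double quote, swallowing both fields.
greedy = re.findall(r'"(.*)"', text)

# Non-greedy: .*? stops at the NEXT double quote, one field per match.
lazy = re.findall(r'"(.*?)"', text)

print(greedy)  # ['a","b']
print(lazy)    # ['a', 'b']
```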
    
    We have 3 alternatives:
    
    1 "text"    -> ".*?" -> written as \".*?\" because the pattern sits inside a double-quoted Python string
    2 'text'    -> '.*?'
    3 delimiter -> ,
    
    Since we want to capture the content of the text inside the quotes, we use ():
    
    1 \"(.*?)\"   (to keep the quotes use (\".*?\"))
    2 '(.*?)'     (to keep the quotes use ('.*?'))
    
    Finally, we don't want split to report an empty string when a
    delimiter directly precedes or follows a quoted field, so we let the
    pattern consume that optional delimiter too:
    
    1 ,?\"(.*?)\",?
    2 ,?'(.*?)',?
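
    To see what the optional ,? buys us, here is a small comparison on a made-up input, using only the double-quote alternative to keep it short:

```python
import re

s = 'a,"b,c",d'

# Without the optional delimiters, the commas next to the quotes are matched
# separately and leave empty strings on either side of the quoted field.
without_opt = [t for t in re.split(r'"(.*?)"|,', s) if t is not None]

# With ,? on both sides, the adjacent commas are consumed by the same match.
with_opt = [t for t in re.split(r',?"(.*?)",?|,', s) if t is not None]

print(without_opt)  # ['a', '', 'b,c', '', 'd']
print(with_opt)     # ['a', 'b,c', 'd']
```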
    
    Once we use the | operator to join the 3 possibilities we get this regexp:
    
    r",?\"(.*?)\",?|,?'(.*?)',?|,"
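
    The same regexp can also be laid out with re.VERBOSE, which may be easier to read. This is only a sketch; the name `QUOTED_SPLIT` is hypothetical. It also shows why the list comprehension filters on `is not None`: re.split returns every capture group for each match, and a group that did not take part in the match comes back as None.

```python
import re

# Same three alternatives as the one-line regexp, laid out with re.VERBOSE so
# each one can carry a comment; whitespace inside the pattern is ignored.
QUOTED_SPLIT = re.compile(r"""
      ,? " (.*?) " ,?     # 1: double-quoted text, optional delimiter on each side
    | ,? ' (.*?) ' ,?     # 2: single-quoted text, same idea
    | ,                   # 3: a bare delimiter
""", re.VERBOSE)

raw = QUOTED_SPLIT.split('a,"b",c')
print(raw)     # ['a', 'b', None, 'c']

tokens = [t for t in raw if t is not None]
print(tokens)  # ['a', 'b', 'c']
```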
    
