Python: split string by a multi-character delimiter unless inside quotes

Posted by 二次信任 on 2021-02-10 03:24:51

Question


In my case the delimiter string is '   ' (3 consecutive spaces, but the answer should work for any multi-character delimiter), and an edge-case text to split could be this:

'Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'

The solution should return the following strings:

Coord="GLOB"AL
Axis=X
Type="Y   ZR"
Color="Gray Dark"
Alt="Q   Z"qz
Loc=End

I've looked for regex solutions, and I've also considered the inverse problem (matching a multi-character delimiter unless it is inside quotes), since re.split in Python 3.4.3 makes it easy to split a text by a regex pattern. I'm not sure a regex solution exists, though, so I'm also open to (efficient) non-regex solutions.

I've seen some solutions to the inverse problem that put a regex pattern inside a lookahead/lookbehind, but they did not work because Python's lookbehind (unlike that of some other regex engines) requires a fixed-width pattern.

This question is not a duplicate of Regex matching spaces, but not in "strings" or similar other questions, because:

  1. matching a single space outside quotes is different from matching a multi-character delimiter (in my example the delimiter is 3 spaces, but the question is about any multi-character delimiter);
  2. Python's regex engine differs slightly from the regex engines of C++ and other languages;
  3. matching a delimiter is only the secondary side of my question; the direct question is about splitting a string.

Answer 1:


import re
x='Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'
print(re.split(r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)',x))

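You need to use a lookahead to check that the whitespace is not between double quotes: the lookahead succeeds only when an even number of " characters remains between the match and the end of the string.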
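
The parity idea behind the lookahead can be sketched without any regex. This is only an illustration, and the helper name outside_quotes is made up for this example; it assumes the quotes in the text are balanced and that there are no escaped quotes.

```python
def outside_quotes(text, pos):
    """Return True if index `pos` is not inside a double-quoted span.

    Same test as the lookahead: an even number of '"' characters
    from `pos` to the end of the text means we are outside quotes.
    Assumes balanced, unescaped double quotes.
    """
    return text.count('"', pos) % 2 == 0


x = 'Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'

# Gap before Axis: 6 quotes remain, so it is a valid split point.
print(outside_quotes(x, x.index('   Axis')))  # True
# Gap inside "Y   ZR": 5 quotes remain, so it must not be split.
print(outside_quotes(x, x.index('   ZR')))    # False
```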

Output: ['Coord="GLOB"AL', 'Axis=X', 'Type="Y   ZR"', 'Color="Gray Dark"', 'Alt="Q   Z"qz', 'Loc=End']

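For a generalized version that splits on any delimiter not inside "", use the following, where delimiter stands for the regex of your delimiter (pass a literal string through re.escape first):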

re.split(r'delimiter(?=(?:[^"]*"[^"]*")*[^"]*$)',x)
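
As a rough sketch of how the generalized pattern might be wrapped up, the helper below escapes a literal delimiter before building the pattern. A single-pass scanner is included as a possible non-regex alternative, since the lookahead rescans the rest of the string at every candidate match. Both function names are invented for this example, and both assume balanced, unescaped double quotes and a delimiter that contains no quote character.

```python
import re

# Lookahead from the answer: an even number of quotes must remain.
_OUTSIDE_QUOTES = r'(?=(?:[^"]*"[^"]*")*[^"]*$)'


def split_outside_quotes(text, delimiter):
    """Split on a literal delimiter wherever it is not inside double quotes."""
    return re.split(re.escape(delimiter) + _OUTSIDE_QUOTES, text)


def split_outside_quotes_scan(text, delimiter):
    """Same result for balanced quotes, using one left-to-right pass."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    parts = []
    start = 0            # index where the current field begins
    i = 0
    in_quotes = False
    while i < len(text):
        if text[i] == '"':
            in_quotes = not in_quotes   # toggle on every quote character
            i += 1
        elif not in_quotes and text.startswith(delimiter, i):
            parts.append(text[start:i])
            i += len(delimiter)
            start = i
        else:
            i += 1
    parts.append(text[start:])          # last field
    return parts


x = 'Coord="GLOB"AL   Axis=X   Type="Y   ZR"   Color="Gray Dark"   Alt="Q   Z"qz   Loc=End'
for part in split_outside_quotes(x, '   '):
    print(part)
```

With the 3-space delimiter, both helpers return the six strings listed in the question. Because of re.escape, a delimiter such as '||' is treated literally rather than as a regex alternation.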


Source: https://stackoverflow.com/questions/32117330/python-split-string-by-a-multi-character-delimiter-unless-inside-quotes
