I have a string which is like this:
>>> s = '1,",2, ",,4,,,\',7, \',8,,10,'
>>> s
'1,",2, ",,4,,,\',7, \',8,,10,'
A modified version of this (which handles only whitespace) can do the trick (the quotes are stripped):
>>> import re
>>> s = '1,",2, ",,4,,,\',7, \',8,,10,'
>>> tokens = [t for t in re.split(r",?\"(.*?)\",?|,?'(.*?)',?|,", s) if t is not None ]
>>> tokens
['1', ',2, ', '', '4', '', '', ',7, ', '8', '', '10', '']
And if you would like to keep the quote characters:
>>> tokens = [t for t in re.split(r",?(\".*?\"),?|,?('.*?'),?|,", s) if t is not None ]
>>> tokens
['1', '",2, "', '', '4', '', '', "',7, '", '8', '', '10', '']
If you want to use a custom delimiter, replace every occurrence of , in the regexp with your own delimiter (escaped if it is a regex metacharacter such as | or .).
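As a rough sketch of that substitution, the helper below builds the pattern from an arbitrary delimiter. The names split_quoted, delimiter and keep_quotes are made up for this example and are not part of the re module. It assumes the delimiter should be taken literally, so it goes through re.escape, and it wraps the delimiter in (?:...) so that the trailing ? applies to the whole delimiter even when it is longer than one character.

```python
import re

def split_quoted(text, delimiter=",", keep_quotes=False):
    """Split text on delimiter, ignoring delimiters inside '...' or "..."."""
    # Hypothetical helper for illustration; escape so "|" or "." are literal.
    d = re.escape(delimiter)
    if keep_quotes:
        pattern = rf"""(?:{d})?(".*?")(?:{d})?|(?:{d})?('.*?')(?:{d})?|{d}"""
    else:
        pattern = rf"""(?:{d})?"(.*?)"(?:{d})?|(?:{d})?'(.*?)'(?:{d})?|{d}"""
    # re.split yields None for a group that took no part in a match.
    return [t for t in re.split(pattern, text) if t is not None]

print(split_quoted('a|"b|c"|d', delimiter="|"))  # ['a', 'b|c', 'd']
```

With the default delimiter this gives the same list as the one-liner above for the sample string s.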
Explanation:
| = match alternatives e.g. ( |X) = space or X
.* = anything
.*? = anything, but as few characters as possible (non-greedy)
x? = x or nothing
() = capture the content of a matched pattern
We have 3 alternatives:
1 "text" -> ".*?" -> due to escaping rules becomes -> \".*?\"
2 'text' -> '.*?'
3 delimiter -> ,
Since we want to capture the content of the text inside the quotes, we use ():
1 \"(.*?)\" (to keep the quotes, use (\".*?\"))
2 '(.*?)' (to keep the quotes, use ('.*?'))
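The groups matter because of how re.split treats them: whatever a group captures is put into the result list, and a group that did not take part in a match shows up as None. That is why the list comprehensions above filter on "is not None". A minimal illustration on a toy string (not the original data):

```python
import re

# Without a group, the separators are discarded.
plain = re.split(r"a|b", "xaybz")

# With one group per alternative, every split adds one slot per group;
# the group that did not take part in the match is reported as None.
grouped = re.split(r"(a)|(b)", "xaybz")

print(plain)    # ['x', 'y', 'z']
print(grouped)  # ['x', 'a', None, 'y', None, 'b', 'z']
```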
Finally, we don't want the split function to report an empty field when a
delimiter precedes or follows a quoted field, so we also match that
optional delimiter:
1 ,?\"(.*?)\",?
2 ,?'(.*?)',?
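To see what the optional delimiter buys, here is a small comparison on a shortened example. It handles only double quotes to keep the patterns short, and the helper name fields is made up for this sketch.

```python
import re

s = 'a,"b,c",d'

def fields(pattern, text):
    # Drop the None placeholders left by groups that did not match.
    return [t for t in re.split(pattern, text) if t is not None]

# The delimiters around the quoted field are split on separately,
# which leaves empty strings on both sides of it.
without_optional = fields(r'"(.*?)"|,', s)

# The neighbouring delimiters are consumed together with the quoted field.
with_optional = fields(r',?"(.*?)",?|,', s)

print(without_optional)  # ['a', '', 'b,c', '', 'd']
print(with_optional)     # ['a', 'b,c', 'd']
```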
Once we join the three alternatives with the | operator, we get this regexp:
r",?\"(.*?)\",?|,?'(.*?)',?|,"