I am parsing a file that has lines such as
type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")
And I want to split
Let me add a non-regex solution:
line = 'type("book") title("golden apples") pages(10-35 70 200-234) comments("good read")'
count = 0  # Bracket counter: depth of currently open parentheses
last_break = 0  # Index just past the last break
parts = []
for j, char in enumerate(line):
    # Compare with ==, not "is": identity checks on strings and ints are unreliable
    if char == '(':
        count += 1
    elif char == ')':
        count -= 1
    elif char == ' ' and count == 0:
        # A space outside all parentheses ends the current part
        parts.append(line[last_break:j])
        last_break = j + 1
parts.append(line[last_break:])  # Add last element
parts = tuple(p for p in parts if p)  # Convert to tuple and remove empty
for p in parts:
    print(p)
In general, there are certain things you cannot do with regular expressions (matching arbitrarily nested parentheses with Python's re module, for example), and some features carry serious performance penalties (especially lookahead and lookbehind), so a regex is not always the best solution for a given problem.
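To make the nesting point concrete, here is a minimal sketch comparing a simple regex with the counter approach. The names `split_top_level` and `FLAT_FIELD` and the nested sample line are made up for illustration; the question's data may never contain nested parentheses, in which case the regex is adequate.

```python
import re

def split_top_level(line):
    """Split on spaces that are not inside parentheses (counter approach)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(line):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ' ' and depth == 0:
            parts.append(line[start:i])
            start = i + 1
    parts.append(line[start:])
    return [p for p in parts if p]

# Hypothetical pattern: a word followed by one non-nested (...) group
FLAT_FIELD = re.compile(r'\w+\([^()]*\)')

flat = 'type("book") pages(10-35 70 200-234)'
nested = 'type("book") pages(10-35 (est.) 70)'  # made-up nested input

print(FLAT_FIELD.findall(flat))    # both fields are found
print(FLAT_FIELD.findall(nested))  # the nested "pages" field is silently dropped
print(split_top_level(nested))     # the counter keeps the nested field intact
```

The regex does not fail loudly on the nested line; it just returns fewer fields, which is the kind of problem that is easy to miss in a parser.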
Also, I thought I'd mention the pyparsing module, which can be used to create custom text parsers.