Question
This code almost does what I need it to:
for line in all_lines:
    s = line.split('>')
Except that it removes all the '>' delimiters.
So,
<html><head>
turns into
['<html', '<head']
Is there a way to use the split() method but keep the delimiter instead of removing it, so that the result is:
['<html>', '<head>']
Answer 1:
d = ">"
for line in all_lines:
    s = [e+d for e in line.split(d) if e]
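To make the behaviour of this comprehension easier to check, here is a minimal sketch that wraps it in a function. The name split_keep_delimiter is hypothetical and not part of the original answer. Note that the delimiter is appended to every non-empty piece, so trailing text that never ended in '>' gains one.

```python
def split_keep_delimiter(line, d=">"):
    # Same list comprehension as in the answer, wrapped in a function for reuse.
    # Empty pieces (e.g. the one after a final delimiter) are dropped by "if e".
    return [e + d for e in line.split(d) if e]


print(split_keep_delimiter("<html><head>"))  # ['<html>', '<head>']
print(split_keep_delimiter("<p>text"))       # ['<p>', 'text>']  ('>' added to 'text')
```

If every line is known to end with the delimiter, as in the question's example, this side effect never shows up.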
Answer 2:
If you are parsing HTML with splits, you are most likely doing it wrong, except if you are writing a one-shot script aimed at a fixed and secure content file. If it is supposed to work on any HTML input, how will you handle something like <a title='growth > 8%' href='#something'>?
Anyway, the following works for me:
>>> import re
>>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
['<body>', '<table>', '<tr>', '<td>']
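For input that is not fixed and trusted, one possible alternative is the standard library's html.parser module, which understands quoted attribute values. The sketch below is an illustration rather than part of the original answer; the class name TagCollector is hypothetical, and it only records start tags.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


collector = TagCollector()
collector.feed("<p><a title='growth > 8%' href='#something'>link</a></p>")
print(collector.tags)
# [('p', {}), ('a', {'title': 'growth > 8%', 'href': '#something'})]
```

The '>' inside the title attribute stays part of the attribute value, which a plain split on '>' cannot guarantee.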
Answer 3:
How about this:
import re
s = '<html><head>'
re.findall('[^>]+>', s)
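For reference, here is what that call returns, together with a small variation. The second pattern, with an optional '>', is an assumption added for illustration and is not in the original answer; it keeps trailing text that has no closing delimiter, which the original pattern silently drops.

```python
import re

# Pattern from the answer: one or more non-'>' characters followed by '>'.
print(re.findall('[^>]+>', '<html><head>'))  # ['<html>', '<head>']
print(re.findall('[^>]+>', '<p>text'))       # ['<p>']  (trailing 'text' is dropped)

# Variation: making the final '>' optional keeps the trailing piece as well.
print(re.findall('[^>]+>?', '<p>text'))      # ['<p>', 'text']
```

Both patterns require at least one non-'>' character per match, so a '>' that directly follows another '>' is skipped.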
Answer 4:
Just split it, then for each element in the array/list (apart from the last one) add a trailing ">" to it.
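A minimal sketch of that idea follows. The function name split_keep is hypothetical. Every piece except the last gets the delimiter back, and the last piece is kept only if it is non-empty, so joining the result reproduces the input.

```python
def split_keep(line, d=">"):
    parts = line.split(d)
    # Every piece except the last was followed by a delimiter in the input.
    result = [p + d for p in parts[:-1]]
    # The last piece is the text after the final delimiter; keep it if non-empty.
    if parts[-1]:
        result.append(parts[-1])
    return result


print(split_keep("<html><head>"))  # ['<html>', '<head>']
print(split_keep("<p>text"))       # ['<p>', 'text']
```

Unlike filtering out empty pieces, this keeps consecutive delimiters intact: "a>>b" becomes ['a>', '>', 'b'].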
Source: https://stackoverflow.com/questions/7866128/python-split-without-removing-the-delimiter