Python split() without removing the delimiter [duplicate]

六眼飞鱼酱① submitted on 2019-11-26 02:24:48

Question


This code almost does what I need it to:

for line in all_lines:
    s = line.split('>')

Except it removes all the '>' delimiters.

So,

<html><head>

Turns into

['<html', '<head']

Is there a way to use the split() method but keep the delimiter, instead of removing it?

With these results:

['<html>', '<head>']

Answer 1:


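d = ">"
for line in all_lines:
    # Split on the delimiter, drop empty pieces, then re-append the delimiter to each piece.
    # Note: any text after the last delimiter (e.g. a trailing newline) also gets one appended.
    s = [e + d for e in line.split(d) if e]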



Answer 2:


If you are parsing HTML with splits, you are most likely doing it wrong, except if you are writing a one-shot script aimed at a fixed and secure content file. If it is supposed to work on any HTML input, how will you handle something like <a title='growth > 8%' href='#something'>?

Anyway, the following works for me:

>>> import re
>>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
['<body>', '<table>', '<tr>', '<td>']
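To make the caveat at the start of this answer concrete, here is a minimal sketch comparing a plain split with the standard library's `html.parser`. The names `StartTagCollector` and `start_tags` are hypothetical helpers made up for this illustration, and the sketch only collects start tags, so treat it as a demonstration rather than a complete solution.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records the raw text of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the source text of the tag just opened
        self.tags.append(self.get_starttag_text())


def start_tags(markup):
    """Return the raw start tags found in markup (illustrative only)."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


tricky = "<a title='growth > 8%' href='#something'>"

# Splitting on '>' cuts the tag in two at the '>' inside the attribute value
naive = [e + ">" for e in tricky.split(">") if e]

# The parser understands quoted attribute values and keeps the tag whole
parsed = start_tags(tricky)

print(naive)
print(parsed)
```

The split-based version returns two fragments for what is really a single tag, while the parser returns the tag intact.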



Answer 3:


How about this:

import re
s = '<html><head>'
# Match each run of non-'>' characters together with the '>' that ends it
re.findall('[^>]+>', s)  # ['<html>', '<head>']



Answer 4:


Just split it, then for each element in the array/list (apart from the last one) add a trailing ">" to it.
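A minimal sketch of that suggestion might look like the following. The function name `split_keep_delimiter` is made up for this example, and dropping an empty final piece is an assumption about the desired output rather than something the answer specifies.

```python
def split_keep_delimiter(text, delimiter=">"):
    """Split text on delimiter, keeping the delimiter at the end of each piece."""
    parts = text.split(delimiter)
    # Every piece except the last one was followed by the delimiter in the input
    kept = [part + delimiter for part in parts[:-1]]
    # The last piece had no delimiter after it; keep it only if it is non-empty
    if parts[-1]:
        kept.append(parts[-1])
    return kept


print(split_keep_delimiter("<html><head>"))
```

Unlike appending the delimiter to every piece, this leaves any text after the final ">" untouched.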



Source: https://stackoverflow.com/questions/7866128/python-split-without-removing-the-delimiter
