Python - regex - Splitting string before word

Submitted by 岁酱吖の on 2019-12-02 06:27:59

Question


I am trying to split a string in Python before a specific word. For example, I would like to split the following string before "path:".

  • split string before "path:"
  • input: "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
  • output: ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I have tried

rx = re.compile("(:?[^:]+)")
rx.findall(line)

This does not split the string where I want. The trouble is that the values after "path:" are not known in advance, so I cannot specify the whole word in the pattern. Does anyone know how to do this?


Answer 1:


Using a regular expression to split your string seems like overkill: the string split() method may be all you need.

Anyway, if you really do need a regular expression to split your string, use re.split(), which splits a string wherever the pattern matches.

Also, use a regular expression that matches the split point itself:

>>> import re
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

The (?=...) group is a lookahead assertion: the expression matches a space (note the space at the start of the expression) that is followed by the string 'path:', without consuming what follows the space. Only the space is removed by the split, so 'path:' stays at the start of the next piece.
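
To make the role of the lookahead concrete, here is a minimal sketch that compares splitting with and without it. Using `\s+` instead of a single space is my own assumption (it tolerates tabs or repeated spaces between entries) and is not part of the original answer.

```python
import re

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# With the lookahead: only the whitespace is consumed, so "path:" is kept
# at the start of the following piece.
kept = re.split(r"\s+(?=path:)", line)

# Without the lookahead: "path:" is part of the match and is thrown away.
consumed = re.split(r"\s+path:", line)

print(kept)
print(consumed)
```

The first call returns both entries with their "path:" prefix intact, while the second loses the prefix on every piece after the first.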




Answer 2:


You could do ["path:"+s for s in line.split("path:")[1:]] instead of using a regex. (Note that we skip the first element of the split result: it is the empty string that precedes the first "path:", so it has no content to prefix.)
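
A runnable sketch of this approach, under two assumptions of mine: the input starts with "path:" (otherwise the `[1:]` slice would silently drop any leading text), and trailing spaces on each piece are unwanted, hence the added `strip()`.

```python
line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# str.split removes the separator, leaving an empty first element because
# the string begins with "path:".
raw = line.split("path:")

# Skip the empty first element, trim the space left before the next
# "path:", and put the prefix back.
pieces = ["path:" + s.strip() for s in raw[1:]]

print(raw)
print(pieces)
```

Without the `strip()`, every piece except the last would keep the trailing space that separated it from the next "path:".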




Answer 3:


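in_str = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
in_list = in_str.split('path:')
# Joining with ",path:" restores the keyword; [1:] drops the leading comma.
# Note that the result is a single string, not a list.
print(",path:".join(in_list)[1:])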



Answer 4:


This can be done without regular expressions. Given a string:

s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."

We can temporarily replace the desired word with a placeholder. The placeholder is a single character that must not otherwise occur in the string, and we split on it:

word, placeholder = "path:", "|"
s = s.replace(word, placeholder).split(placeholder)
s
# ['', 'bte00250 Alanine, aspartate ... ', 'bte00330 Arginine and ...']

Now that the string is split, we can rejoin the original word to each sub-string using a list comprehension:

["".join([word, i]) for i in s if i]
# ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']
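
As a hedged variation on the steps above, the sketch below wraps them in a small helper. The name `split_before` is hypothetical. It splits on the word directly rather than going through a placeholder, which should give the same pieces whenever the placeholder does not occur in the text, and it additionally trims whitespace and keeps any text that precedes the first occurrence of the word.

```python
def split_before(text, word):
    """Split text so that each piece after the first begins with word."""
    # The first element is whatever precedes the first occurrence of word
    # (an empty string when text starts with word).
    head, *rest = text.split(word)
    # Re-attach the word to every following chunk, trimming the spaces
    # that separated the chunks.
    pieces = [word + chunk.strip() for chunk in rest]
    # Keep leading text only if there is any.
    if head.strip():
        pieces.insert(0, head.strip())
    return pieces


s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."
result = split_before(s, "path:")
print(result)
```

Splitting on the word itself avoids having to choose a placeholder character that is guaranteed not to appear in the input.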


Source: https://stackoverflow.com/questions/6709067/python-regex-splitting-string-before-word
