How to extract a couple of marked strings from a line (Python)

Submitted by ε祈祈猫儿з on 2019-12-10 12:19:37

Question


My Friends,

I spent quite some time on this one... but cannot yet figure out a better way to do it. I am coding in Python, by the way.

So, here is a line of text in a file I am working with, for example:

">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."

How can I extract the two strings "ZP_01631227.1" and "Nodularia spumigena CCY9414" from the line?

The pair of "|" characters and the pair of square brackets act as markers: the strings I want are the ones between each pair...

I could probably loop over all the characters in the line and do it the hard way, but that takes so much time... Is there a Python library or some other smart way to do this nicely?

Thanks to all!


Answer 1:


>>> with open("file") as f:
...     for line in f:
...         if "|" in line and "[" in line:
...             whatiwant_1 = line.split("|")[1]                # text between the first two '|'
...             whatiwant_2 = line.split("[")[1].split("]")[0]  # text inside the first [...]
...             print(whatiwant_1, whatiwant_2)
...
ZP_01631227.1 Nodularia spumigena CCY9414
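If you only need to handle one string at a time, the same split-on-markers idea can also be written with str.partition, which splits at the first occurrence of a separator and returns an empty separator when it is missing, so lines without both markers are easy to reject. This is a minimal sketch, not the answerer's code; extract_fields is a hypothetical helper name, and it assumes each line is laid out like the example in the question.

```python
def extract_fields(line):
    """Return (accession, organism) from a line like the question's example, or None.

    Hypothetical helper. str.partition(sep) returns (before, sep, after),
    with sep == '' when the separator does not occur.
    """
    _, bar1, rest = line.partition("|")
    accession, bar2, rest = rest.partition("|")
    if not bar1 or not bar2:
        return None  # fewer than two '|' markers
    _, lbr, rest = rest.partition("[")
    organism, rbr, _ = rest.partition("]")
    if not lbr or not rbr:
        return None  # no complete [...] pair after the second '|'
    return accession, organism


line = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(extract_fields(line))  # ('ZP_01631227.1', 'Nodularia spumigena CCY9414')
```

Returning None instead of raising IndexError (which line.split("|")[1] would do on a line with no "|") lets the caller simply skip lines that do not match.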



Answer 2:


One concise alternative is a regular expression (for some reason they have a bad rep in the Python community, but they do provide conciseness and power for simple text handling):

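import re
s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
# \|(.*?)\|   capture the text between the first pair of '|' characters
# .*?         lazily skip whatever comes next
# \[(.*?)\]   capture the text inside the first [...] that follows
mo = re.search(r'\|(.*?)\|.*?\[(.*?)\]', s)
if mo:
    thefirst, thesecond = mo.groups()
    print(thefirst, thesecond)  # ZP_01631227.1 Nodularia spumigena CCY9414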
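When many lines need parsing, the pattern can be compiled once and given named groups so that the code documents itself. This is a minimal sketch, assuming every header looks like the example line; HEADER_RE and parse_headers are hypothetical names. The negated character classes [^|]* and [^\]]* stop each capture at its closing marker rather than relying on lazy matching.

```python
import re

# Assumption: the accession sits between the first two '|' characters and the
# organism is the first [...] after them, as in the question's example line.
HEADER_RE = re.compile(r"\|(?P<accession>[^|]*)\|.*?\[(?P<organism>[^\]]*)\]")


def parse_headers(lines):
    """Yield (accession, organism) for each matching line; skip the rest."""
    for line in lines:
        mo = HEADER_RE.search(line)
        if mo:
            yield mo.group("accession"), mo.group("organism")


sample = [
    ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]...",
    "a line without the markers",
]
for accession, organism in parse_headers(sample):
    print(accession, organism)  # ZP_01631227.1 Nodularia spumigena CCY9414
```

Because parse_headers accepts any iterable of strings, an open file object can be passed to it directly, for example inside a "with open(...) as f:" block.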


Source: https://stackoverflow.com/questions/2573698/how-to-extract-a-couple-marked-strings-from-a-line-python
