Question
My Friends,
I have spent quite some time on this one but cannot figure out a better way to do it. I am coding in Python, by the way.
So, here is a line of text in a file I am working with, for example:
">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
How can I extract the two strings "ZP_01631227.1" and "Nodularia spumigena CCY9414" from the line?
The "|" pair and the brackets act as markers, so we know the strings we want sit between them.
I guess I could loop over all the characters in the line and do it the hard way, but that takes so much time. Is there a Python library or some other smart way to do it nicely?
Thanks to all!
Answer 1:
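>>> for line in open("file"):
...     if "|" in line:
...         # the text between the first two pipes
...         whatiwant_1 = line.split("|")[1]
...     if "[" in line:
...         # the text between the first "[" and the "]" that follows it
...         whatiwant_2 = line.split("[")[1].split("]")[0]
...
>>> print(whatiwant_1, whatiwant_2)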
ZP_01631227.1 Nodularia spumigena CCY9414
Answer 2:
One concise alternative is a regular expression (for some reason they have a bad rep in the Python community, but they do provide conciseness and power for simple text handling):
import re
s = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
# group 1: text between the first two pipes; group 2: text inside the brackets
mo = re.search(r'\|(.*?)\|.*\[(.*?)\]', s)
if mo:
    thefirst, thesecond = mo.groups()
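If the same extraction is needed for many lines, the regular-expression idea can be wrapped in a small function. The following is only a sketch: the names `HEADER_RE` and `extract_marked` are made up for illustration, and the pattern assumes the first field sits between the first two pipes and the second inside the first pair of brackets after them. It uses negated character classes instead of `.*?` so that each group cannot run past its closing marker.

```python
import re

# Hypothetical names; the pattern assumes one "|...|" pair followed by "[...]".
HEADER_RE = re.compile(r'\|([^|]*)\|[^\[]*\[([^\]]*)\]')

def extract_marked(line):
    """Return (first, second) as a tuple, or None when the markers are missing."""
    mo = HEADER_RE.search(line)
    return mo.groups() if mo else None

line = ">ref|ZP_01631227.1| 3-dehydroquinate synthase [Nodularia spumigena CCY9414]..."
print(extract_marked(line))
# ('ZP_01631227.1', 'Nodularia spumigena CCY9414')
```

Returning None for lines without the markers lets the caller skip them explicitly rather than reuse values left over from an earlier line.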
Source: https://stackoverflow.com/questions/2573698/how-to-extract-a-couple-marked-strings-from-a-line-python