Question
I want to search a document for a keyword and then check whether that keyword is within 5 lines of another keyword. If it is, I want to print the line and the following 50 lines.
In this example, I am searching a document for the word "carrying", and I want to make sure that "carrying" is within 5 lines of the words "Financial Assets:". My code finds and prints the lines when I only search for "carrying", but when I add the search for "Financial Assets:" it does not find anything (even though I know the phrase is in the document).
import urllib2

data = []
html = urllib2.urlopen("ftp://ftp.sec.gov/edgar/data/1001627/0000950116-97-001247.txt")
searchlines = html.readlines()
for m, line in enumerate(searchlines):
    line = line.lower()
    if "carrying" in line and "Financial Assets:" in searchlines[m-5:m+5]:
        for l in searchlines[m-5:m+50]:
            data.append(l)
print ''.join(data)
Any help would be much appreciated.
Answer 1:
Instead of
"Financial Assets:" in searchlines[m-5:m+5]
you need to have:
any("Financial Assets:" in line2 for line2 in searchlines[m-5:m+5])
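Your original code checks whether the list contains an element that is exactly equal to "Financial Assets:", instead of looking for it as a substring of each line. Since every line returned by readlines() keeps its trailing newline, that exact comparison fails even for a line that holds nothing but the phrase.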
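To make the difference concrete, here is a minimal sketch that works on an in-memory list of lines instead of the FTP download. The helper names `keyword_near` and `collect_matches` are hypothetical, not from the question. Two choices here are assumptions that go beyond the one-line fix: the window start is clamped at 0, because for m < 5 a negative slice start counts from the end of the list and can produce an empty or wrong window, and the output starts at the matching line and runs for 50 more lines, as the question describes, rather than starting 5 lines earlier.

```python
def keyword_near(lines, index, keyword, radius=5):
    """Return True if `keyword` is a substring of any line within
    `radius` lines of lines[index] (hypothetical helper)."""
    # Assumption: clamp the start so a negative index does not wrap around.
    start = max(index - radius, 0)
    window = lines[start:index + radius + 1]
    # Substring test per line, not equality against whole list elements.
    return any(keyword in candidate for candidate in window)


def collect_matches(lines, primary, secondary, radius=5, following=50):
    """Collect each line containing `primary` (case-insensitive) that has
    `secondary` nearby, plus the `following` lines after it."""
    data = []
    for m, line in enumerate(lines):
        if primary in line.lower() and keyword_near(lines, m, secondary, radius):
            data.extend(lines[m:m + following + 1])
    return data


sample = [
    "Notes to the statements\n",
    "Financial Assets:\n",
    "\n",
    "The carrying amount approximates fair value.\n",
    "Other disclosures\n",
]
result = collect_matches(sample, "carrying", "Financial Assets:")
```

With the real document, `sample` would be replaced by the `searchlines` list, and the result could be printed with `''.join(result)`.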
Answer 2:
The expression
"carrying" in line
searches for the string at any position inside the line. However, the expression
"Financial Assets:" in searchlines[m-5:m+5]
is searching for an exact match (i.e. a line that is exactly "Financial Assets:") in that sublist. You need to change this second part to something like
"Financial Assets:" in " ".join(searchlines[m-5:m+5])
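The join-based check can be sketched as a small function. The name `window_contains` is hypothetical, and clamping the window start at 0 is an added assumption (a negative slice start would count from the end of the list). Note that because each line still ends with a newline, the joined text will not match a phrase that is split across two lines.

```python
def window_contains(lines, index, keyword, radius=5):
    """Return True if `keyword` appears in the text formed by joining the
    lines within `radius` of lines[index] (hypothetical helper)."""
    # Assumption: clamp the start so small indexes do not wrap around.
    start = max(index - radius, 0)
    text = " ".join(lines[start:index + radius + 1])
    return keyword in text


report = [
    "Financial Assets:\n",
    "Cash and equivalents\n",
    "The carrying amount approximates fair value.\n",
]
found = window_contains(report, 2, "Financial Assets:")
```

Joining builds one string per window, which is simple to read; for very large windows, a per-line substring test avoids creating the intermediate string.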
Source: https://stackoverflow.com/questions/5825055/how-can-i-search-within-a-document-for-a-keyword-and-then-subsequent-key-words-w