Question
I have a text file of URLs, about 14000. Below are a couple of examples:
http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123
I have loaded the text file into a Python list and I am trying to get all the URLs with CONTENT_ITEM_ID separated off into a list of their own. What would be the best way to do this in Python?
Cheers
Answer 1:
Here's another alternative to Graeme's, using the newer list comprehension syntax:
list2 = [line for line in file if 'CONTENT_ITEM_ID' in line]
Which you prefer is a matter of taste!
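For reference, here is the comprehension as a self-contained, runnable sketch (the sample list below stands in for the 14000-line file, whose real contents aren't shown here):

```python
# A few sample URLs standing in for the full 14000-URL file.
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
]

# Keep only the lines that contain the CONTENT_ITEM_ID parameter.
content_urls = [u for u in urls if 'CONTENT_ITEM_ID' in u]
print(content_urls)
```

When reading straight from a file object, each line keeps its trailing newline, so a `line.strip()` inside the comprehension is often useful.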
Answer 2:
I liked @bobince's answer (+1), but will up the ante.
Since you have a rather large starting set, you may wish to avoid loading the entire filtered list into memory. Unless you need the whole list for something else, you can use a generator expression to perform the same task, producing the filtered items one at a time as they're requested:
for filtered_url in (line for line in file if 'CONTENT_ITEM_ID' in line):
    do_something_with_filtered_url(filtered_url)
Answer 3:
list2 = filter(lambda x: x.find('CONTENT_ITEM_ID') != -1, list1)
filter calls the function (first parameter) on each element of list1 (second parameter). If the function returns a true value (non-zero), the element is copied to the output list.
The lambda simply creates a temporary unnamed function. This just avoids having to define a named function and then pass it, like this:
def look_for_content_item_id(elem):
    if elem.find('CONTENT_ITEM_ID') == -1:
        return 0
    return 1

list2 = filter(look_for_content_item_id, list1)
Answer 4:
For completeness: you can also use itertools.ifilter. It is like filter, but doesn't build up a list.
from itertools import ifilter

for line in ifilter(lambda line: 'CONTENT_ITEM_ID' in line, urls):
    do_something(line)
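Note that ifilter exists only in Python 2; it was removed in Python 3, where the built-in filter is itself lazy and returns an iterator. A Python 3 sketch of the same streaming approach (with the sample list standing in for the real file):

```python
# Sample URLs standing in for the full file.
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
]

# In Python 3, filter() yields matching items lazily, one at a time,
# so no intermediate list is ever built.
matches = filter(lambda line: 'CONTENT_ITEM_ID' in line, urls)
for line in matches:
    print(line)
```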
Source: https://stackoverflow.com/questions/258390/python-filter-remove-urls-from-a-list