Question
I have a text file of URLs, about 14000. Below are a couple of examples:
http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123
I have loaded the text file into a Python list and I am trying to get all the URLs with CONTENT_ITEM_ID separated off into a list of their own. What would be the best way to do this in Python?
Cheers
Answer 1:
Here's another alternative to Graeme's, using the newer list comprehension syntax:
list2 = [line for line in file if 'CONTENT_ITEM_ID' in line]
Which you prefer is a matter of taste!
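For reference, here is the comprehension as a self-contained, runnable sketch (the sample list below stands in for the 14000-line file, whose real contents aren't shown here):

```python
# A few sample URLs standing in for the full 14000-URL file.
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
]

# Keep only the lines that contain the CONTENT_ITEM_ID parameter.
content_urls = [u for u in urls if 'CONTENT_ITEM_ID' in u]
print(content_urls)
```

When reading straight from a file object, each line keeps its trailing newline, so a `line.strip()` inside the comprehension is often useful.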
Answer 2:
I liked @bobince's answer (+1), but will up the ante.
Since you have a rather large starting set, you may wish to avoid loading the entire filtered list into memory. Unless you need the whole list for something else, you can use a generator expression to perform the same task, producing the filtered items one at a time as they're requested:
for filtered_url in (line for line in file if 'CONTENT_ITEM_ID' in line):
    do_something_with_filtered_url(filtered_url)
Answer 3:
list2 = filter(lambda x: x.find('CONTENT_ITEM_ID') != -1, list1)
filter calls the function (first parameter) on each element of list1 (second parameter). If the function returns a true value (non-zero), the element is copied to the output list.
The lambda simply creates a temporary unnamed function. This just avoids having to define a named function and then pass it, like this:
def look_for_content_item_id(elem):
    if elem.find('CONTENT_ITEM_ID') == -1:
        return 0
    return 1

list2 = filter(look_for_content_item_id, list1)
Answer 4:
For completeness: you can also use itertools.ifilter. It is like filter, but doesn't build up a list.
from itertools import ifilter

for line in ifilter(lambda line: 'CONTENT_ITEM_ID' in line, urls):
    do_something(line)
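Note that ifilter exists only in Python 2; it was removed in Python 3, where the built-in filter is itself lazy and returns an iterator. A Python 3 sketch of the same streaming approach (with the sample list standing in for the real file):

```python
# Sample URLs standing in for the full file.
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
]

# In Python 3, filter() yields matching items lazily, one at a time,
# so no intermediate list is ever built.
matches = filter(lambda line: 'CONTENT_ITEM_ID' in line, urls)
for line in matches:
    print(line)
```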
Source: https://stackoverflow.com/questions/258390/python-filter-remove-urls-from-a-list