Python filter/remove URLs from a list

Submitted by 元气小坏坏 on 2021-02-07 03:27:01

Question


I have a text file of about 14,000 URLs. Below are a few examples:

http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123

I have loaded the text file into a Python list and I am trying to get all the URLs with CONTENT_ITEM_ID separated off into a list of their own. What would be the best way to do this in Python?

Cheers


Answer 1:


Here's another alternative to Graeme's, using the newer list comprehension syntax:

list2 = [line for line in file if 'CONTENT_ITEM_ID' in line]

Which you prefer is a matter of taste!
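The comprehension above keeps only the matching URLs. If you also want to hold on to the rest (the "remove" side of the question title) without walking the list twice, a plain loop can sort each URL into one of two lists in a single pass. This is a minimal sketch using the sample URLs from the question; the names content_urls and other_urls are just illustrative.

```python
# Sample URLs from the question; in practice these come from the text file.
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=11",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123",
]

content_urls = []  # URLs that contain CONTENT_ITEM_ID
other_urls = []    # everything else

# One pass over the list: each URL goes into exactly one of the two lists.
for url in urls:
    if "CONTENT_ITEM_ID" in url:
        content_urls.append(url)
    else:
        other_urls.append(url)

print(len(content_urls), len(other_urls))  # 3 2
```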




Answer 2:


I liked @bobince's answer (+1), but will up the ante.

Since you have a rather large starting set, you may wish to avoid holding everything in memory at once. Unless you need the whole list for something else, you could use a Python generator expression, which yields the matching URLs one at a time as they are requested instead of building a filtered list up front (and if file is an open file object, its lines are read lazily too):

for filtered_url in (line for line in file if 'CONTENT_ITEM_ID' in line):
    do_something_with_filtered_url(filtered_url)
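To take the generator idea one step further: if the URLs are still in the text file, you don't need to load all 14,000 lines into a list first, because an open file object can be iterated line by line. A minimal sketch, assuming one URL per line; the function name iter_content_urls and the file name urls.txt are hypothetical, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import io


def iter_content_urls(lines):
    """Lazily yield the lines that contain CONTENT_ITEM_ID, newline stripped.

    `lines` can be any iterable of strings: a list, or an open file object,
    which Python reads one line at a time as it is iterated.
    """
    return (line.rstrip("\n") for line in lines if "CONTENT_ITEM_ID" in line)


# io.StringIO stands in for a real file so this example is self-contained.
# With a real file (the name "urls.txt" is hypothetical) you would consume
# the generator inside the with block, while the file is still open:
#     with open("urls.txt") as f:
#         for url in iter_content_urls(f):
#             ...
sample_file = io.StringIO(
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123\n"
    "http://www.domainname.com/images?IMAGE_ID=10\n"
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123\n"
)

for url in iter_content_urls(sample_file):
    print(url)
```

Nothing is read or filtered until the for loop asks for the next item, so memory use stays roughly constant regardless of file size.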



Answer 3:


list2 = list(filter(lambda x: x.find('CONTENT_ITEM_ID') != -1, list1))

filter calls the function (first parameter) on each element of list1 (second parameter). If the function returns a true value, the element is kept. In Python 2, filter returns a list; in Python 3 it returns a lazy iterator, so wrap it in list() if you need an actual list.

The lambda creates a temporary, unnamed function. It just saves you from defining a named function and passing it in, like this:

def look_for_content_item_id(elem):
    # True if the URL contains the substring CONTENT_ITEM_ID, False otherwise
    return elem.find('CONTENT_ITEM_ID') != -1

list2 = list(filter(look_for_content_item_id, list1))
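One caveat that applies to every answer here: the substring test also accepts URLs where the text appears somewhere other than as a query parameter name, such as in the path or inside a longer parameter name. If that matters for your data, a stricter predicate (a swapped-in technique, not part of the answers above) can parse the query string with the standard library's urllib.parse and be passed to filter() just like look_for_content_item_id. A sketch under those assumptions; has_content_item_id is a hypothetical name and the last two sample URLs are invented to show the difference.

```python
from urllib.parse import parse_qs, urlsplit


def has_content_item_id(url):
    """Return True if CONTENT_ITEM_ID is the name of a query parameter in url."""
    # urlsplit() isolates the query string ("CONTENT_ITEM_ID=100&param2=123");
    # parse_qs() turns it into a dict of {parameter name: [values]}.
    # keep_blank_values=True means "?CONTENT_ITEM_ID=" still counts as present.
    params = parse_qs(urlsplit(url.strip()).query, keep_blank_values=True)
    return "CONTENT_ITEM_ID" in params


urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    # Hypothetical URLs that a plain substring test would wrongly accept:
    "http://www.domainname.com/CONTENT_ITEM_ID/page",
    "http://www.domainname.com/pagename?OLD_CONTENT_ITEM_ID=5",
]

# In Python 3, filter() returns a lazy iterator, so list() materialises it.
content_urls = list(filter(has_content_item_id, urls))
print(content_urls)
# ['http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123']
```

Parsing every URL is slower than a substring check, so for 14,000 well-formed URLs the simpler test may be perfectly adequate.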



Answer 4:


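For completeness: you can also use ifilter. It works like filter, but doesn't build up a list. Note that itertools.ifilter exists only in Python 2; it was removed in Python 3 because the built-in filter already returns a lazy iterator.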

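# Python 2:
#   from itertools import ifilter
#   for line in ifilter(lambda line: 'CONTENT_ITEM_ID' in line, urls):

# Python 3: the built-in filter() is already lazy
for line in filter(lambda line: 'CONTENT_ITEM_ID' in line, urls):
    do_something(line)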
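Python 2's itertools also had ifilterfalse, the complement of ifilter; in Python 3 it is itertools.filterfalse. Pairing it with filter gives both halves of the question title: the URLs to keep and the ones to remove. A sketch using the sample URLs from the question; is_content_url is a hypothetical helper name.

```python
from itertools import filterfalse

# Sample URLs from the question.
urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=11",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123",
]


def is_content_url(url):
    """Predicate shared by both calls below (hypothetical helper)."""
    return "CONTENT_ITEM_ID" in url


# filter() keeps the items for which the predicate is true;
# filterfalse() keeps the items for which it is false.
# Both are lazy in Python 3, so list() is used to collect the results.
kept = list(filter(is_content_url, urls))
removed = list(filterfalse(is_content_url, urls))

print(len(kept), len(removed))  # 3 2
```

Each call scans the input separately; if you only need one side, skip the list() and iterate the lazy result directly.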


Source: https://stackoverflow.com/questions/258390/python-filter-remove-urls-from-a-list
