I would like to do something like this:

    list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/',
                    'http://www.google.cn/', 'http
To do it exactly your way? You could use the for...else structure:

    for url in list_of_urls:
        for url_dict in urls:
            if url_dict['url'] == url:
                url_dict['nbr'] += 1
                break
        else:
            urls.append(dict(url=url, nbr=1))
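
In case for...else is unfamiliar: the else branch runs only when the inner loop finishes without hitting break, i.e. when no matching entry was found. Here is a self-contained sketch of the same loop. The count_urls wrapper, the sample list and the empty starting value of urls are my assumptions, since the question does not show them.

```python
def count_urls(list_of_urls):
    """Count occurrences while keeping the list-of-dicts shape."""
    urls = []  # assumption: starts empty (not shown in the question)
    for url in list_of_urls:
        for url_dict in urls:
            if url_dict['url'] == url:
                url_dict['nbr'] += 1
                break  # found it, so the else branch is skipped
        else:
            # reached only when the inner loop ended without a break
            urls.append(dict(url=url, nbr=1))
    return urls


# Hypothetical sample data, shortened from the question's list
sample = ['http://www.google.fr/', 'http://www.google.fr/',
          'http://www.google.cn/']
print(count_urls(sample))
```

Every new url triggers a scan of the whole urls list, so the work grows with the number of distinct urls already stored.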
But it is quite inelegant. Do you really have to store the visited urls as a list? If you store them in a dict, indexed by url string, for example, it would be way cleaner:

    urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}
    for url in list_of_urls:
        if url in urls:
            urls[url]['nbr'] += 1
        else:
            urls[url] = dict(url=url, nbr=1)

A few things to note in that second example:

- Using a dict for urls removes the need for going through the whole urls list when testing for one single url. This approach will be faster: a dict lookup takes roughly constant time, whereas scanning the list takes time proportional to its length.
- Using dict( ) instead of braces makes your code shorter, since the keys need no quotes.
- list_of_urls, urls and url as variable names make the code quite hard to parse. It's better to find something clearer, such as urls_to_visit, urls_already_visited and current_url. I know, it's longer. But it's clearer.

And of course I'm assuming that dict(url='http://www.google.fr', nbr=1) is a simplification of your own data structure, because otherwise, urls could simply be:

    urls = {'http://www.google.fr': 1}
    for url in list_of_urls:
        if url in urls:
            urls[url] += 1
        else:
            urls[url] = 1

This gets very elegant with a defaultdict, which supplies 0 for any url that has not been seen yet:

    import collections

    urls = collections.defaultdict(int)  # missing keys start at int() == 0
    for url in list_of_urls:
        urls[url] += 1
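
As a side note, and going slightly beyond the approach above: the standard library also offers collections.Counter, which does this counting in a single call. The sketch below uses it, then rebuilds the original list-of-dicts shape in case other code still expects it. The sample_urls list and the urls_as_list name are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data, shortened from the question's list
sample_urls = ['http://www.google.fr/', 'http://www.google.fr/',
               'http://www.google.cn/']

# Counter is a dict subclass that maps each url to its number of occurrences
counts = Counter(sample_urls)

# If the list-of-dicts shape from the question is still required,
# it can be rebuilt once at the end instead of being maintained in the loop
urls_as_list = [dict(url=url, nbr=nbr) for url, nbr in counts.items()]

print(counts['http://www.google.fr/'])
print(urls_as_list)
```

Like defaultdict(int), a Counter returns 0 when asked about a url it has never seen, so no membership test is needed before reading a count.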