Extract image links from the webpage using Python

走了就别回头了 2021-01-07 00:39

I want to get all of the pictures on this page (the NBA team logos): http://www.cbssports.com/nba/draft/mock-draft

However, my code gives a bit more than that. It

3 Answers
  •  感动是毒
    2021-01-07 01:18

    I know this can be "traumatic", but for automatically generated pages like this one, where you just want to grab the damn images and never come back, a quick-n-dirty regular expression matching the desired pattern tends to be my choice (not depending on Beautiful Soup is a great advantage):

    import re
    import urllib.request
    
    url = 'http://www.cbssports.com/nba/draft/mock-draft'
    # read() returns bytes, so decode before matching with a text pattern
    source = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
    
    ## the loop below just prints each link;
    ## if you want to actually download, set the flag below to True
    actually_download = False
    
    ## every image name is an abbreviation made of capital letters, so...
    pattern = r'http://sports\.cbsimg\.net/images/nba/logos/30x30/[A-Z]+\.png'
    for link in re.findall(pattern, source):
        print(link)
        if actually_download:
            # save under the last path component, e.g. an abbreviation plus .png
            filename = link.split('/')[-1]
            urllib.request.urlretrieve(link, filename)
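
If the same logo may show up more than once in the page source, it can help to wrap the matching in a small function that drops repeats. The following is only a rough sketch: extract_logo_links is a name I made up, and sample_html is an invented snippet standing in for the real page, so it runs without any network access. The dots in the pattern are escaped so they match literally, and [A-Z]+ requires at least one capital letter.

```python
import re

# Same logo URL pattern as in the answer, with the dots escaped.
LOGO_PATTERN = re.compile(
    r'http://sports\.cbsimg\.net/images/nba/logos/30x30/[A-Z]+\.png'
)

def extract_logo_links(html):
    """Return the logo URLs found in html, without duplicates, in page order."""
    seen = set()
    links = []
    for link in LOGO_PATTERN.findall(html):
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links

# Invented snippet standing in for the real page source (an assumption).
sample_html = '''
<img src="http://sports.cbsimg.net/images/nba/logos/30x30/BOS.png">
<img src="http://sports.cbsimg.net/images/nba/logos/30x30/LAL.png">
<img src="http://sports.cbsimg.net/images/nba/logos/30x30/BOS.png">
<img src="http://sports.cbsimg.net/images/nba/logos/90x90/BOS.png">
<img src="http://sports.cbsimg.net/images/nba/ads/banner.png">
'''

for link in extract_logo_links(sample_html):
    print(link)
```

On the invented snippet this prints the 30x30 BOS and LAL links once each; the 90x90 logo and the banner are skipped because they do not fit the pattern.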
    

    Hope this helps!
