Parsing a web page in Python using Beautiful Soup

名媛妹妹 2021-02-15 10:26

I'm having trouble getting data from a website. The page source is here:

view-source:http://release24.pl/wpis/23714/%22La+mer+a+boire%22+%282011         


        
2 Answers
  •  無奈伤痛
    2021-02-15 11:03

    This will get you the list you want. You'll still have to write some code to strip the trailing '....'s and to convert the character strings.

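        # Python 3: urllib2 was split into urllib.request and urllib.error
        from urllib.error import HTTPError, URLError
        from urllib.request import urlopen

        from bs4 import BeautifulSoup

        URL = "http://release24.pl/wpis/23714/%22La+mer+a+boire%22+%282011%29+FRENCH.DVDRip.XviD-AYMO"

        try:
            web_page = urlopen(URL).read()
        except HTTPError as err:  # must come first: HTTPError is a subclass of URLError
            print("HTTP error:", err.code)
        except URLError as err:
            print("URL error:", err.reason)
        else:
            soup = BeautifulSoup(web_page, "html.parser")
            LIST = []
            # Keep only paragraphs that contain both a span.i and a span.vi
            for p in soup.find_all("p"):
                s = p.find("span", class_="i")
                t = p.find("span", class_="vi")
                if s and t:
                    LIST.append([s.string, t.string])
            print(LIST)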
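    A minimal offline sketch of the same parsing loop, including the clean-up the answer leaves to the reader. It assumes (hypothetically) that each entry is a `<p>` holding a `span` with class `i` and a `span` with class `vi`, which is what the code above looks for; `SAMPLE_HTML`, `clean` and `extract_pairs` are illustrative names, not part of the original answer, and the sample markup is invented. "Converting the character strings" is read here as turning Beautiful Soup's `NavigableString` objects into plain `str`, which is only one plausible interpretation.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup (not taken from the real page): each entry is a <p>
# with a span of class "i" and a span of class "vi", as the answer expects.
SAMPLE_HTML = """
<p><span class="i">Size....</span><span class="vi">700 MB</span></p>
<p><span class="i">Format....</span><span class="vi">XviD</span></p>
<p>No spans here</p>
"""


def clean(text):
    """Return a plain str with trailing dots and whitespace removed."""
    if text is None:  # .string is None when a tag has several children
        return None
    # str() turns a bs4 NavigableString into an ordinary Python string
    return re.sub(r"\.+\s*$", "", str(text)).strip()


def extract_pairs(html):
    """Collect [span.i, span.vi] text pairs from every <p> that has both."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for p in soup.find_all("p"):
        s = p.find("span", class_="i")
        t = p.find("span", class_="vi")
        if s and t:
            pairs.append([clean(s.string), clean(t.string)])
    return pairs


print(extract_pairs(SAMPLE_HTML))
# → [['Size', '700 MB'], ['Format', 'XviD']]
```

    The third `<p>` is skipped because it lacks the spans. Against the live page you would pass the downloaded `web_page` bytes to `extract_pairs` instead of `SAMPLE_HTML`.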
