Beautiful Soup and extracting a div and its contents by ID

死守一世寂寞 2020-11-30 19:54
soup.find("tagName", { "id" : "articlebody" })

Why does this NOT return the <div id="articlebody"> ... </div> tags?
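
For reference, here is a minimal sketch of an ID lookup that should succeed on well-formed input. It assumes Beautiful Soup 4 (the `bs4` package) with the built-in `html.parser`, and the markup is made up for illustration, so it may not reflect the page in the question. Note that the first argument is a real tag name ("div") rather than a placeholder such as "tagName".

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<html><body>
  <div id="header">Site title</div>
  <div id="articlebody"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Look up the <div> whose id attribute is "articlebody"
article = soup.find("div", {"id": "articlebody"})

# Keyword form: matches any tag with that id, not only <div>
same_article = soup.find(id="articlebody")

# The returned Tag carries its contents, so children can be searched directly
paragraphs = [p.get_text() for p in article.find_all("p")]
print(article is same_article)
print(paragraphs)
```

If `find` returns `None` on a real page even though `soup.prettify()` shows the element, comparing the result under a different parser may be a reasonable next step.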
13 Answers
  •  离开以前
    2020-11-30 20:17

    I think there is a problem when the 'div' tags are too deeply nested. I am trying to parse some contacts from a Facebook HTML file, and BeautifulSoup is not able to find "div" tags with class "fcontent".

    This happens with other classes as well. When I search for divs in general, it returns only those that are not so deeply nested.

    The HTML source can be any Facebook page showing the friends list of one of your friends (not your own friends list). If someone can test it and give some advice, I would really appreciate it.

    This is my code, where I just try to print the number of "div" tags with class "fcontent":

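    from bs4 import BeautifulSoup

    # Parse the saved page; the with-block closes the file afterwards
    with open('/Users/myUserName/Desktop/contacts.html') as f:
        # 'html.parser' is an assumed parser choice, not part of the original post
        soup = BeautifulSoup(f, 'html.parser')
    # Collect every <div> whose class is "fcontent" (avoid shadowing the built-in list)
    fcontent_divs = soup.find_all('div', attrs={'class': 'fcontent'})
    print(len(fcontent_divs))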
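
One way to check whether nesting depth alone is responsible is to run the same query on synthetic markup. The sketch below assumes Beautiful Soup 4 (`bs4`) with the built-in `html.parser`; the `nested_contact` helper and the contact names are hypothetical and not taken from any real Facebook page.

```python
from bs4 import BeautifulSoup


def nested_contact(name, depth):
    """Wrap one hypothetical contact entry in `depth` plain <div> layers."""
    return "<div>" * depth + f'<div class="fcontent">{name}</div>' + "</div>" * depth


# Made-up input: three entries at increasing nesting depths
html = "<html><body>" + "".join(
    nested_contact(name, depth)
    for name, depth in [("Alice", 1), ("Bob", 20), ("Carol", 60)]
) + "</body></html>"

soup = BeautifulSoup(html, "html.parser")

# class_ is the keyword form of attrs={'class': 'fcontent'}
matches = soup.find_all("div", class_="fcontent")
names = [div.get_text() for div in matches]
print(len(matches), names)
```

If all three entries are found here but not on the saved page, the cause is more likely malformed markup or the parser in use than the depth of nesting itself.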
    
