Pass argument to findAll in bs4 in Python

Submitted by 浪尽此生 on 2020-01-07 04:07:06

Question


I need help using bs4 inside a function. If I pass the path for findAll (or find) into the function as an argument, the search does not work. Please see the sample below.

from bs4 import BeautifulSoup
data = '<h1 class="headline">Willkommen!</h1>' 

def check_text(path, value):

    soup = BeautifulSoup(''.join(data), "lxml")

    x1 = "h1", {"class":"headline"}
    x2 = path
    x3 = tuple(path)
    print type(x1), 'soup.findAll(x1)===', soup.findAll(x1)
    print type(x2), 'soup.findAll(x2)===', soup.findAll(x2)
    print type(x3), 'soup.findAll(x3)===', soup.findAll(x3)

    for i in soup.findAll(x1):
        print 'x1, text=', i.getText()

    for i in soup.findAll(x2):    
        print 'x2, text=', i.getText()

    for i in soup.findAll(x3):    
        print 'x3, text=', i.getText()    


check_text('"h1", {"class": "headline"}', 'Willkommen!')

The output is:

<type 'tuple'> soup.findAll(x1)=== [<h1 class="headline">Willkommen!</h1>]
<type 'str'> soup.findAll(x2)=== []
<type 'tuple'> soup.findAll(x3)=== []
x1, text= Willkommen!

Does anyone have a solution? Thanks.


Answer 1:


from bs4 import BeautifulSoup
data = '<h1 class="headline">Willkommen!</h1>' 

def check_text(path, value):

    soup = BeautifulSoup(''.join(data), "lxml")

    x1 = "h1", {"class":"headline"}
    print (type(x1), 'soup.findAll(x1)===', soup.findAll(x1))
    print (type(path), 'soup.findAll(path)===', soup.findAll(**path))

    for i in soup.findAll(x1):
        print ('x1, text=', i.getText())

    for i in soup.findAll(**path):    
        print ('path, text=', i.getText())


check_text({'name' : "h1", 'attrs': {"class": "headline"} }, 'Willkommen!')

Instead of passing the arguments as a single string, pass a dictionary. Its items can then be unpacked with ** and handed to findAll as keyword arguments.
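The same idea can be sketched more generally: a wrapper can forward whatever positional and keyword arguments it receives straight to find_all. The helper name find_texts and the use of the built-in "html.parser" backend are assumptions made for this illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

data = '<h1 class="headline">Willkommen!</h1>'


def find_texts(html, *args, **kwargs):
    """Hypothetical helper: forward all search arguments to find_all."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text() for tag in soup.find_all(*args, **kwargs)]


# Keyword arguments kept in a dict and unpacked with **
query = {"name": "h1", "attrs": {"class": "headline"}}
by_kwargs = find_texts(data, **query)

# Positional arguments kept in a tuple and unpacked with *
by_args = find_texts(data, *("h1", {"class": "headline"}))

print(by_kwargs, by_args)
```

Either form keeps the search criteria as real Python objects, so nothing has to be parsed back out of a string.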




Answer 2:


The findAll method takes a tag name as its first parameter, not a path. It returns all tags with that name that are descendants of the tag on which it is called. This is the only way it is intended to be used, i.e. it is not meant to receive a path. Check the documentation for more details.

Now, soup.findAll(path) will look for tags whose name is path. Since path = '"h1", {"class": "headline"}', soup.findAll(path) will look for <'"h1", {"class": "headline"}'> tags in the HTML string, which do not exist.
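A small check makes this behaviour visible. This is only a sketch; it uses the built-in "html.parser" backend instead of lxml, which is an assumption chosen to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<h1 class="headline">Willkommen!</h1>', "html.parser")

# The whole string is treated as one tag name, so nothing matches.
path = '"h1", {"class": "headline"}'
as_string = soup.find_all(path)

# A plain tag name matches the heading.
as_name = soup.find_all("h1")

print(len(as_string), len(as_name))
```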

So basically, there's no such thing as a "path". Still, the syntax you're using makes me think that you want the tags whose class attribute is equal to "headline". The way to specify attributes for the findAll method is to pass them as a dictionary to the attrs argument. What you probably mean to do is:

soup.findAll('h1', attrs={'class': "headline"}, text="Willkommen!")
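As a rough sketch of how the asker's check_text function could be reshaped around this call: the tag name, the attributes, and the expected text are passed as separate arguments. The signature below is an assumption for illustration. It uses string=, the newer name for the text= argument, and the "html.parser" backend. Note that a plain string filter is compared against the tag's text exactly, so case and punctuation matter.

```python
from bs4 import BeautifulSoup

data = '<h1 class="headline">Willkommen!</h1>'


def check_text(name, attrs, value, html=data):
    """Return True if a tag matching name/attrs has exactly `value` as its text."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all(name, attrs=attrs, string=value)
    return len(matches) > 0


exact = check_text("h1", {"class": "headline"}, "Willkommen!")
misspelled = check_text("h1", {"class": "headline"}, "wilkommen")

print(exact, misspelled)
```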


Source: https://stackoverflow.com/questions/45028533/pass-argument-to-findall-in-bs4-in-python
