Test if an attribute is present in a tag in BeautifulSoup

天涯浪人 2020-12-07 22:01

I would like to get all the

6 Answers
  • 2020-12-07 22:35

    You don't need any lambdas to filter by attribute; you can simply pass some_attribute=True to find or find_all.

    script_tags = soup.find_all('script', some_attribute=True)
    
    # or
    
    script_tags = soup.find_all('script', {"some-data-attribute": True})
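
    As a minimal sketch of the two forms above, the following runs them against a small piece of sample markup. The HTML string and the attribute names in it (src, data-module) are made up for illustration; the dict form is used for data-module because a hyphenated name cannot be written as a Python keyword argument.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<script src="/static/app.js"></script>
<script data-module="cart">var x = 1;</script>
<script>var y = 2;</script>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword form: works when the attribute name is a valid Python identifier.
with_src = soup.find_all("script", src=True)

# Dict form: needed for names such as data-module that contain a hyphen.
with_data = soup.find_all("script", {"data-module": True})

print([tag["src"] for tag in with_src])
print([tag["data-module"] for tag in with_data])
```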
    

    Here are more examples with other approaches as well:

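    import bs4
    
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = bs4.BeautifulSoup(html, "html.parser")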
    
    # Find all with a specific attribute
    
    tags = soup.find_all(src=True)
    tags = soup.select("[src]")
    
    # Find all meta with either name or http-equiv attribute.
    
    soup.select("meta[name],meta[http-equiv]")
    
    # Find any tag with a name or src attribute.
    
    soup.select("[name], [src]")
    
    # Find the first script with a src attribute.
    
    tag = soup.find('script', src=True)
    tag = soup.select_one("script[src]")
    
    # Find all tags with a name attribute beginning with foo
    # or a src beginning with /path.
    # Values that are not plain identifiers (such as /path) must be quoted.
    soup.select('[name^="foo"], [src^="/path"]')
    
    # Find all tags with a name attribute that contains foo
    # or a src that contains whatever.
    soup.select('[name*="foo"], [src*="whatever"]')
    
    # Find all tags with a name attribute that ends with foo
    # or a src that ends with whatever.
    soup.select('[name$="foo"], [src$="whatever"]')
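
    To make the prefix, substring, and suffix selectors concrete, here is a small sketch run against invented markup. The tag names and attribute values are assumptions chosen only so that each selector matches something; the attribute values are quoted in every selector, which is the safest form when a value contains characters such as / or a dot.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<meta name="foobar" content="a">
<meta name="barfoo" content="b">
<img src="/path/logo.png">
<img src="/other/photo.jpg">
"""

soup = BeautifulSoup(html, "html.parser")

# ^= matches the start of the attribute value.
starts = soup.select('[name^="foo"], [src^="/path"]')

# *= matches anywhere inside the attribute value.
contains = soup.select('[name*="foo"]')

# $= matches the end of the attribute value.
ends = soup.select('[name$="foo"], [src$=".jpg"]')

for label, tags in (("starts", starts), ("contains", contains), ("ends", ends)):
    print(label, [tag.get("name") or tag.get("src") for tag in tags])
```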
    

    You can also use regular expressions with find or find_all:

    import re
    # starting with
    soup.find_all("script", src=re.compile("^whatever"))
    # contains
    soup.find_all("script", src=re.compile("whatever"))
    # ends with 
    soup.find_all("script", src=re.compile("whatever$"))
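
    A runnable sketch of the regular-expression filters, again on made-up markup (the URLs are placeholders). Beautiful Soup applies the pattern with its search() method, which is why the "contains" case needs no anchors. Tags that lack the attribute entirely are never matched by a pattern.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<script src="https://cdn.example.com/lib.min.js"></script>
<script src="/static/app.js"></script>
<script>var inline = true;</script>
"""

soup = BeautifulSoup(html, "html.parser")

# Starts with "https://".
external = soup.find_all("script", src=re.compile(r"^https://"))

# Contains "static" anywhere in the value.
static = soup.find_all("script", src=re.compile(r"static"))

# Ends with ".min.js" (dots escaped so they match literally).
minified = soup.find_all("script", src=re.compile(r"\.min\.js$"))

print(len(external), len(static), len(minified))
```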
    
  • 2020-12-07 22:35

    By using the pprint module you can examine the contents of an element.

    from pprint import pprint
    
    pprint(vars(element))
    

    Using this on a bs4 element will print something similar to this:

    {'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']},
     'can_be_empty_element': False,
     'contents': [u'\n\t\t\t\tNESNA\n\t'],
     'hidden': False,
     'name': u'span',
     'namespace': None,
     'next_element': u'\n\t\t\t\tNESNA\n\t',
     'next_sibling': u'\n',
     'parent': <h1 class="pie-compoundheader" itemprop="name">\n<span class="pie-description">Bedside table</span>\n<span class="pie-productname size-3 name global-name">\n\t\t\t\tNESNA\n\t</span>\n</h1>,
     'parser_class': <class 'bs4.BeautifulSoup'>,
     'prefix': None,
     'previous_element': u'\n',
     'previous_sibling': u'\n'}
    

    To access an attribute, let's say the class list, use the following:

    class_list = element.attrs.get('class', [])
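
    For illustration, a short sketch of what attrs.get returns for different attributes. The markup is a cut-down, hypothetical version of the span shown in the pprint output; note that class comes back as a list because it is a multi-valued attribute, while id is a plain string and a missing attribute yields None (or the default you pass).

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely based on the span printed above.
html = '<span class="pie-productname size-3" id="title">NESNA</span>'
element = BeautifulSoup(html, "html.parser").span

class_list = element.attrs.get("class", [])  # list of class names
element_id = element.attrs.get("id")         # plain string
missing = element.attrs.get("title")         # None: attribute not present

has_size_class = "size-3" in class_list

print(class_list, element_id, missing, has_size_class)
```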
    

    You can filter elements using this approach:

    for script in soup.find_all('script'):
        if 'for' in script.attrs:
            # Has a 'for' attribute (even if its value is empty)
            pass
        elif "myClass" in script.attrs.get('class', []):
            # Has class "myClass"
            pass
        else:
            # Do something else
            pass
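
    A note on presence versus truthiness when checking attrs by hand: a bare attribute such as async is stored with an empty string as its value when parsed with html.parser, so a truthiness test on attrs.get(...) treats it as absent. The sketch below, on invented markup, contrasts the two checks.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first script has a bare (valueless) async attribute.
html = '<script async src="/a.js"></script><script src="/b.js"></script>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("script")

value = first.attrs.get("async")          # empty string, not None
truthy_check = bool(value)                # False: would wrongly report "absent"
presence_check = "async" in first.attrs   # True: the attribute is there

print(repr(value), truthy_check, presence_check)
print(first.has_attr("async"), second.has_attr("async"))
```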
    
  • 2020-12-07 22:37

    If you only need to get tag(s) with attribute(s), you can use a lambda:

    soup = bs4.BeautifulSoup(YOUR_CONTENT)
    
    • Tags with attribute
    tags = soup.find_all(lambda tag: 'src' in tag.attrs)
    

    OR

    tags = soup.find_all(lambda tag: tag.has_attr('src'))
    
    • Specific tag with attribute
    tag = soup.find(lambda tag: tag.name == 'script' and 'src' in tag.attrs)
    
    • Etc ...

    Thought it might be useful.
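
    A function filter is most useful when the condition combines several tests. As a rough sketch on made-up markup, the named function below (script_with_src_but_no_type is a hypothetical helper, not part of Beautiful Soup) selects script tags that have src but lack type.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<script src="/a.js" type="module"></script>
<script src="/b.js"></script>
<img src="/c.png">
<script>var inline = 1;</script>
"""

soup = BeautifulSoup(html, "html.parser")


def script_with_src_but_no_type(tag):
    """Return True for <script> tags that have src and no type attribute."""
    return tag.name == "script" and tag.has_attr("src") and not tag.has_attr("type")


matches = soup.find_all(script_with_src_but_no_type)

# The lambda form from the answer: any tag at all that carries src.
any_src = soup.find_all(lambda tag: "src" in tag.attrs)

print([tag["src"] for tag in matches])
print([tag.name for tag in any_src])
```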

  • 2020-12-07 22:40

    For future reference, has_key has been deprecated in BeautifulSoup 4. Now you need to use has_attr:

    scriptTags = outputDoc.findAll('script')
    for script in scriptTags:
        if script.has_attr('some_attribute'):
            do_something()
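
    A small self-contained sketch of the has_attr check, using invented markup. It also shows why the check matters: indexing a tag with a missing attribute raises KeyError, whereas get returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one script with src, one without.
soup = BeautifulSoup('<script src="/a.js"></script><script></script>', "html.parser")

sources = []
fallbacks = []
for script in soup.find_all("script"):
    if script.has_attr("src"):
        # Safe to index: presence was checked first.
        sources.append(script["src"])
    else:
        # script["src"] would raise KeyError here; get() returns None instead.
        fallbacks.append(script.get("src"))

print(sources, fallbacks)
```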
    
  • 2020-12-07 22:42

    You can check whether an attribute is present by passing it as a filter:

    scriptTags = outputDoc.findAll('script', some_attribute=True)
    for script in scriptTags:
        do_something()
    
  • 2020-12-07 22:43

    If I understand correctly, you just want all the script tags, and then want to check for some attributes in them?

    scriptTags = outputDoc.findAll('script')
    for script in scriptTags:
        if script.has_attr('some_attribute'):
            do_something()        
    