How to get text from span tag in BeautifulSoup

萌比男神i 2020-11-30 12:30

I have links that look like this:

4 Answers
  • 2020-11-30 12:57

    You can search for the span tag directly with BeautifulSoup, or narrow the search by adding other attributes such as class or title.

    # BeautifulSoup 4 (pip install beautifulsoup4), Python 3
    from bs4 import BeautifulSoup as BSHTML
    
    htmlText = """<div class="systemRequirementsMainBox">
    <div class="systemRequirementsRamContent">
    <span title="000 Plus Minimum RAM Requirement">1 GB</span> </div></div>"""
    
    soup = BSHTML(htmlText, "html.parser")
    spans = soup.find_all('span')
    # spans = soup.find_all('span', attrs={'class': 'your-class-name'})  # or span by class name
    # spans = soup.find_all('span', attrs={'title': '000 Plus Minimum RAM Requirement'})  # or span with a title
    for span in spans:
        print(span.text)
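    For comparison, here is the same idea with the current bs4 package under Python 3, as a minimal sketch: it filters by class and by exact title, and uses get_text(strip=True) to trim the whitespace that .text keeps. The class="ram" attribute and the padded " 1 GB " text in the sample markup are hypothetical, added only so both filters and the stripping have something to act on.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question; class="ram" and the padding are hypothetical
html_text = """<div class="systemRequirementsMainBox">
<div class="systemRequirementsRamContent">
<span class="ram" title="000 Plus Minimum RAM Requirement"> 1 GB </span></div></div>"""

soup = BeautifulSoup(html_text, "html.parser")

# Filter by class (class_ avoids clashing with the Python keyword "class")
by_class = soup.find_all("span", class_="ram")

# Filter by the exact title value through the attrs dict
by_title = soup.find_all("span", attrs={"title": "000 Plus Minimum RAM Requirement"})

# get_text(strip=True) removes surrounding whitespace; .text keeps it
texts = [span.get_text(strip=True) for span in by_class]
print(texts)                   # ['1 GB']
print(repr(by_title[0].text))  # ' 1 GB '
```

    Both filters return the same Tag object here, so pick whichever attribute is most stable on the page you are scraping.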
    
  • 2020-11-30 13:18

    You could solve this with just a couple lines of gazpacho:

    from gazpacho import Soup
    
    html = """\
    <div class="systemRequirementsMainBox">
    <div class="systemRequirementsRamContent">
    <span title="000 Plus Minimum RAM Requirement">1 GB</span> </div>
    """
    
    soup = Soup(html)
    # attribute values are matched partially by default, so the shorter title still matches
    soup.find("span", {"title": "Minimum RAM Requirement"}).text
    # '1 GB'
    
  • 2020-11-30 13:20

    You can use a CSS selector to pull out the span you want by its title text:

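    from bs4 import BeautifulSoup
    
    # "html.parser" ships with Python; "lxml" also works if it is installed
    soup = BeautifulSoup("""<div class="systemRequirementsMainBox">
    <div class="systemRequirementsRamContent">
    <span title="000 Plus Minimum RAM Requirement">1 GB</span> </div></div>""", "html.parser")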
    
    print(soup.select_one("span[title*=RAM]").text)
    

    This finds the span whose title attribute contains "RAM"; it is equivalent to the Python check if "RAM" in span["title"].
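    To see that equivalence concretely, here is a small sketch comparing the CSS substring selector with a plain Python loop over the spans. The second span in the sample markup is hypothetical, added so that there is a non-matching element to filter out.

```python
from bs4 import BeautifulSoup

html = """<div class="systemRequirementsRamContent">
<span title="000 Plus Minimum RAM Requirement">1 GB</span>
<span title="000 Plus Minimum Direct X Requirement">DX 9</span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# CSS: [title*=RAM] matches when the title attribute contains "RAM" anywhere
css_match = soup.select_one("span[title*=RAM]")

# Plain-Python equivalent: the first span whose title contains "RAM"
loop_match = next(
    span for span in soup.find_all("span") if "RAM" in span.get("title", "")
)

print(css_match is loop_match)  # True
print(css_match.text)           # 1 GB
```

    Using span.get("title", "") rather than span["title"] keeps the loop from raising KeyError on spans that have no title at all.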

    Or use find with re.compile:

    import re
    print(soup.find("span", title=re.compile("RAM")).text)
    

    To get all of the requirements data from the live page:

    import requests
    from bs4 import BeautifulSoup
    
    r = requests.get("http://www.game-debate.com/games/index.php?g_id=21580&game=000%20Plus").content
    
    soup = BeautifulSoup(r, "lxml")
    # the RAM requirement sits in its own box
    cont = soup.select_one("div.systemRequirementsRamContent")
    ram = cont.select_one("span")
    print(ram["title"], ram.text)
    # the remaining requirements are spans inside the smaller boxes
    for span in soup.select("div.systemRequirementsSmallerBox.sysReqGameSmallBox span"):
        print(span["title"], span.text)
    

    Which will give you:

    000 Plus Minimum RAM Requirement 1 GB
    000 Plus Minimum Operating System Requirement Win Xp 32
    000 Plus Minimum Direct X Requirement DX 9
    000 Plus Minimum Hard Disk Drive Space Requirement 500 MB
    000 Plus GD Adjusted Operating System Requirement Win Xp 32
    000 Plus GD Adjusted Direct X Requirement DX 9
    000 Plus GD Adjusted Hard Disk Drive Space Requirement 500 MB
    000 Plus Recommended Operating System Requirement Win Xp 32
    000 Plus Recommended Hard Disk Drive Space Requirement 500 MB
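    If the live page is unavailable, or you just want to test the selectors, the same loop can be run against a saved snippet. The markup below is a hypothetical, trimmed-down reconstruction that only reuses the class names and a few values from the code and output above; the real page's structure may differ. Collecting the pairs into a dict keyed by title makes individual values easy to look up afterwards.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing only the class names from the answer above
html = """
<div class="systemRequirementsRamContent">
  <span title="000 Plus Minimum RAM Requirement">1 GB</span>
</div>
<div class="systemRequirementsSmallerBox sysReqGameSmallBox">
  <span title="000 Plus Minimum Operating System Requirement">Win Xp 32</span>
  <span title="000 Plus Minimum Direct X Requirement">DX 9</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Map each span's title attribute to its visible text
requirements = {}
ram = soup.select_one("div.systemRequirementsRamContent span")
requirements[ram["title"]] = ram.get_text(strip=True)
for span in soup.select("div.systemRequirementsSmallerBox.sysReqGameSmallBox span"):
    requirements[span["title"]] = span.get_text(strip=True)

for title, value in requirements.items():
    print(title, value)
```

    The chained selector div.systemRequirementsSmallerBox.sysReqGameSmallBox only matches divs carrying both classes, which is why the RAM box is handled separately.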
    
  • 2020-11-30 13:23

    You can use .contents[0] after iterating over all the tags in the folder.
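    A minimal sketch of that approach: .contents is the list of a tag's child nodes, so contents[0] is the first child, which for a span holding only text is that text. Unlike .text, it does not join the text of nested tags, and it raises IndexError on an empty tag, hence the guard below. The "Nested" and "Empty" spans in the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

html = """<span title="000 Plus Minimum RAM Requirement">1 GB</span>
<span title="Nested">Win <b>Xp</b> 32</span>
<span title="Empty"></span>"""

soup = BeautifulSoup(html, "html.parser")

results = {}
for span in soup.find_all("span"):
    # contents[0] is only the first child node; guard against empty tags
    results[span["title"]] = str(span.contents[0]) if span.contents else None

print(results)
```

    For the nested span, contents[0] yields only "Win ", while .text would give "Win Xp 32", so .text or get_text() is usually the safer choice when a span may contain markup.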
