Extracting particular data with BeautifulSoup with span tags

梦想的初衷 submitted on 2020-07-20 05:42:28

Question


I have this structure.

<div id="one" class="tab-pane active">
 <div class="item-content">
     <a href="/mobilni-uredjaji.4403.html">
         <div class="item-primary">
                 <div class="sticker-small">
                       <span class=""></span>
                  </div>
                  <div class="sticker-small-lte">
                      <span class="lte"></span>
                  </div>

                 <div class="item-photo">
                          <img src="/upload/images/thumbs/devices/SAMG935F/SAMG935F_image001_220x230.png" alt="Samsung Galaxy S7 edge">
                 </div>


                <div class="item-labels">
                   <span class="item-manufacturer">Samsung</span>
                   <span class="item-title">Galaxy S7 edge</span>
                </div>
      </div>

       <div class="item-secondary">
                <span class="item-price">94.000<span class="currency">currency!</span></span>
                <span class="item-package"> uz! Sigurica 1500</span>
                <span class="item-installments">na! 24 rate_po!</span>
                <span class="value">100 currency!</span>
               <span class="item-available">device_is_available_on_webshop_list!</span>
      </div>

    </a>
 </div>
 <div class="item-content">
     <a href="/mobilni-uredjaji.4403.html">
         <div class="item-primary">
                 <div class="sticker-small">
                       <span class=""></span>
                  </div>
                  <div class="sticker-small-lte">
                      <span class="lte"></span>
                  </div>

                 <div class="item-photo">
                          <img src="/upload/images/thumbs/devices/SAMG935F/SAMG935F_image001_220x230.png" alt="Samsung Galaxy S7 edge">
                 </div>


                <div class="item-labels">
                   <span class="item-manufacturer">Samsung</span>
                   <span class="item-title">Galaxy S7 edge</span>
                </div>
      </div>

       <div class="item-secondary">
                <span class="item-price">94.000<span class="currency">currency!</span></span>
                <span class="item-package"> uz! Sigurica 1500</span>
                <span class="item-installments">na! 24 rate_po!</span>
                <span class="value">100 currency!</span>
               <span class="item-available">device_is_available_on_webshop_list!</span>
      </div>

    </a>
 </div>
 
 -----same structure ----
 
</div> 

My target is the div with class="item-labels", which holds the following data:

  • Samsung
  • Galaxy S7 edge

and the div with class="item-secondary", specifically the value 94.000 inside the span tag with class="item-price".

I need my output to be exactly:

  • Samsung
  • Galaxy S7 edge
  • 94000

So far, with this code I am getting the first two values but not the price inside the span. I am kind of stuck here because I haven't done any scraping in quite a while. Any hints? The code is:

from bs4 import BeautifulSoup
import re
import pymysql
import MySQLdb
import urllib

#rl = "http://www.vipmobile.rs/mobilni-uredjaji.2631.html#tarifgroup-1|devicetype-1|minprice-0|maxprice-124800|brand-0|model-0"
url = "file:///C:/Users/zika/Desktop/one.html"



html = urllib.urlopen(url)
page = html.read()
#print(page)
# db = MySQLdb.connect(host = 'localhost',
                     # user = 'root',
                     # passwd = '123456',
                     # db = 'lyrics')
soup = BeautifulSoup(page, 'html.parser')

#mobData = soup.find("div", {"class": "bxslider items"}).find_all("div", {"class": "item-content"})
#for mobMan in soup.find("div", {"class": "tab-pane active"}).findAll("span")

labelData =  soup.find("div", {"class": "tab-pane active"}).find_all("div", {"class": "item-content"})
labelPrice = soup.find("div", {"class": "tab-pane active"}).find_all("span", class_="item-price")

 
for label in labelData:
    print(label.contents[1].find("div", {"class": "item-labels"}).getText())

for price in labelPrice:
    print(price.getText())

input("\n\nPress the enter key to exit!")

Answer 1:


You could try this:

from bs4 import BeautifulSoup

# source is a string holding the HTML structure from the question
soup = BeautifulSoup(source, "html.parser")
# find() returns only the first matching div, so each list covers the first item only
div1 = soup.find("div", {"class": "item-labels"}).find_all("span", {"class": "item-manufacturer"})
div2 = soup.find("div", {"class": "item-labels"}).find_all("span", {"class": "item-title"})
div3 = soup.find("div", {"class": "item-secondary"}).find_all("span", {"class": "item-price"})
for i, j, k in zip(div1, div2, div3):
    print(i.text)
    print(j.text)
    # strip the nested currency label from the price text
    print(k.text.replace("currency!", ""))

Note: as source in the above code I used the structure you provided in your post.

This will give the following output:

Samsung
Galaxy S7 edge
94.000
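
The question asks for 94000 rather than 94.000, and the page contains several item-content blocks. A minimal sketch of one way to handle both is shown here: it loops over every item and reads only the price span's own text nodes, so the nested currency span is skipped. The names SAMPLE_HTML and extract_items are hypothetical, the sample is a trimmed copy of the structure from the question, and treating "." as a thousands separator is an assumption.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample: a trimmed copy of the structure from the question.
SAMPLE_HTML = """
<div id="one" class="tab-pane active">
 <div class="item-content">
  <a href="/mobilni-uredjaji.4403.html">
   <div class="item-primary">
    <div class="item-labels">
     <span class="item-manufacturer">Samsung</span>
     <span class="item-title">Galaxy S7 edge</span>
    </div>
   </div>
   <div class="item-secondary">
    <span class="item-price">94.000<span class="currency">currency!</span></span>
    <span class="item-package"> uz! Sigurica 1500</span>
   </div>
  </a>
 </div>
</div>
"""


def extract_items(html):
    """Return one (manufacturer, title, price) tuple per item-content div."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="item-content"):
        manufacturer = item.find("span", class_="item-manufacturer")
        title = item.find("span", class_="item-title")
        price = item.find("span", class_="item-price")
        if manufacturer is None or title is None or price is None:
            continue  # skip items that lack one of the three fields
        # Keep only the span's direct text nodes; the nested
        # <span class="currency"> is a Tag and is therefore ignored.
        own_text = "".join(
            child for child in price.children
            if isinstance(child, NavigableString)
        )
        # Assumption: "." is a thousands separator, so it is dropped.
        price_digits = own_text.strip().replace(".", "")
        rows.append((
            manufacturer.get_text(strip=True),
            title.get_text(strip=True),
            price_digits,
        ))
    return rows


for row in extract_items(SAMPLE_HTML):
    for field in row:
        print(field)
```

Searching inside each item-content div keeps the manufacturer, title and price of one device together, which avoids pairing values from different items when a field is missing.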


Source: https://stackoverflow.com/questions/40003342/extracting-particular-data-with-beautifulsoup-with-span-tags
