How can I check if either xpath exists and then return the value if text is present?

Question

I'm having trouble with the second r.html.xpath request. When there is a special deal on an item, the price XPath changes from

//*[@id="priceblock_ourprice"]

to

//*[@id="priceblock_dealprice"]

This causes the script to fail because the expected XPath matches nothing. How can I handle this second XPath, which only shows up occasionally? I would like to check whether either XPath exists and, if so, return its text, or otherwise return N/A. The first URL searched has the ourprice XPath and the second has the dealprice XPath. What am I missing here?

from requests_html import HTMLSession
import pandas as pd

urls = [
    'http://amazon.com/dp/B01KZ6V00W',
    'http://amazon.com/dp/B089FBPFHS',
]

def getPrice(url):
    s = HTMLSession()
    r = s.get(url)
    r.html.render(sleep=1,timeout=20)
    product = {
        'title': str(r.html.xpath('//*[@id="productTitle"]', first=True).text),
        'price': str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text),
        'details': str(r.html.xpath('//*[@id="detailBulletsWrapper_feature_div"]', first=True).text)
    }
    res = {}
    for key in list(product):
        res[key] = product[key].replace('\n',' ')

    print(res)
    return res

prices = []
for url in urls:
    prices.append(getPrice(url))


df = pd.DataFrame(prices)
print(df.head(15))
df.to_csv("testfile.csv",index=False)
print(len(prices))

Traceback:

  'price': str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text),
AttributeError: 'NoneType' object has no attribute 'text'

Answer 1:

Why not use try and except to check whether the value exists? You get the error because no element matches the XPath, so xpath(..., first=True) returns None, and None has no text attribute.

I don't have requests_html installed, so I will show the idea using the selenium module instead.

from time import sleep

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

urls = ['http://amazon.com/dp/B01KZ6V00W', 'http://amazon.com/dp/B089FBPFHS']

# Absolute XPaths copied from the page; they break if the page layout changes.
TITLE_XPATH = "/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[1]/div/h1/span"
OLD_PRICE_XPATH = "/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[10]/div[1]/div/table/tbody/tr[1]/td[2]/span[1]"
PRICE_XPATH = "/html/body/div[2]/div[2]/div[7]/div[5]/div[1]/div[5]/div/div/div/div/div/form/div/div/div/div/div[1]/div/span[1]"

# Named "driver" so the selenium.webdriver module is not shadowed.
driver = webdriver.Chrome()


def getPrice(url):
    driver.get(url)

    sleep(5)  # give the page time to load

    # find_element(By.XPATH, ...) is the Selenium 4 form of find_element_by_xpath.
    title = driver.find_element(By.XPATH, TITLE_XPATH).text
    old_price = ""

    try:
        # The old-price element is only present when a deal is running.
        old_price = driver.find_element(By.XPATH, OLD_PRICE_XPATH).text
        price = driver.find_element(By.XPATH, PRICE_XPATH).text
        if old_price[1:] == price[1:]:
            deal_type = "normal"
        else:
            deal_type = "deal"
    except NoSuchElementException:
        # No deal element on the page: treat the product as a normal listing.
        price = driver.find_element(By.XPATH, PRICE_XPATH).text
        deal_type = "normal"

    print(old_price)
    print(title)
    print(price)
    print(deal_type)

    return price

prices = []

for url in urls:
    prices.append(getPrice(url))

driver.quit()

print(prices)

df = pd.DataFrame(prices)
print(df.head(15))
df.to_csv("testfile.csv", index=False)
print(len(prices))

Let me explain:

The imports at the top bring in the necessary modules, selenium and pandas. The next line stores the URLs. After that, webdriver.Chrome() starts a Chrome browser session.

Then, in getPrice, we open the URL by calling get(url) on that browser session.

Next, we read the title from its XPath.

The try block checks whether the XPath that shows the deal exists. If it does, it reads the old and new prices and marks the product as a deal when they differ. If the XPath for a deal does NOT exist, the lookup raises an exception, so execution moves on to the except block and the product is marked as a normal one.

It then prints the price, title and deal type.

Finally, it runs the function for every URL, and saves it to a CSV file.

I hope this helps. I explained the code so that you can port it to requests_html.
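
For reference, the same try and except idea might look roughly like this in requests_html. This is only a sketch that I have not run against Amazon: get_price is a made-up helper name, and it assumes html behaves like r.html, whose xpath(selector, first=True) returns a single element or None.

```python
def get_price(html, default="N/A"):
    """Return the regular price text, else the deal price text, else `default`.

    `html` is assumed to behave like requests_html's `r.html`:
    `html.xpath(selector, first=True)` returns one element or None.
    """
    for element_id in ("priceblock_ourprice", "priceblock_dealprice"):
        try:
            # .text raises AttributeError when xpath() returned None
            return html.xpath(f'//*[@id="{element_id}"]', first=True).text
        except AttributeError:
            continue  # this id is not on the page, try the next one
    return default
```

Inside the question's getPrice, the 'price' entry would then become 'price': get_price(r.html).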

Answer 2:

# xpath() without first=True returns a list, which is empty (falsy) when nothing matches
if r.html.xpath('//*[@id="priceblock_ourprice"]'):
    productprice = str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text)
elif r.html.xpath('//*[@id="priceblock_dealprice"]'):
    productprice = str(r.html.xpath('//*[@id="priceblock_dealprice"]', first=True).text)
else:
    productprice = 'N/A'

product = {
        'title': str(r.html.xpath('//*[@id="productTitle"]', first=True).text),
        'price': productprice,
        'details': str(r.html.xpath('//*[@id="detailBulletsWrapper_feature_div"]', first=True).text)
    }

Something like that. I am not exactly sure if the syntax is totally correct.
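
A slightly more general variant of the same check, as a minimal sketch rather than tested scraping code: first_text and PRICE_XPATHS are names made up for illustration, and html is assumed to behave like r.html from requests_html, where xpath(selector, first=True) returns one element or None.

```python
# Candidate XPaths in order of preference (both ids come from the question).
PRICE_XPATHS = [
    '//*[@id="priceblock_ourprice"]',
    '//*[@id="priceblock_dealprice"]',
]


def first_text(html, xpaths, default="N/A"):
    """Return the text of the first XPath that matches and has text, else `default`."""
    for xp in xpaths:
        element = html.xpath(xp, first=True)  # None when nothing matches
        if element is not None and element.text:
            # Flatten newlines the same way the question's getPrice does.
            return element.text.replace("\n", " ").strip()
    return default
```

In the product dict this would be used as 'price': first_text(r.html, PRICE_XPATHS). Since both ids are plain attribute tests, a single XPath 1.0 expression such as //*[@id="priceblock_ourprice" or @id="priceblock_dealprice"] should also match either element, though I have not tried that against the live page.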

Source: https://stackoverflow.com/questions/65313719/how-can-i-check-if-either-xpath-exists-and-then-return-the-value-if-text-is-pres

Tags

python

pandas

xpath

web-scraping

python-requests