Scrapy does not find text in XPath or CSS

Submitted by 我与影子孤独终老i on 2021-01-07 02:16:15

Question


I've been at this one for a few days, and no matter what I try, I cannot get Scrapy to extract the text that is in one element.

To spare you all the code, here are the important pieces. The setup does grab everything else off the page, just not this text.

from scrapy.selector import Selector
start_url = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html"

#BASIC ITEM AND SPIDER YADA, SPARE YOU THE DETAILS

hxs = Selector(response)
response_css = response.css("body")

desc_data = hxs.xpath('//*[@id="DETAILS_TRUNC_TEXT"]//text()').extract()
desc_data2 = response_css.css('#DETAILS_TRUNC_TEXT::text').extract()

Both return empty lists. Yes, I found the XPath and CSS selector via Chrome, and the rest of my selectors work just fine: I am able to find other data on the site. Please help me find out why this one isn't working.


Answer 1:


To get the data you need a browser simulator such as Selenium, which can capture the dynamically generated content. You also need a delay so that the webpage can finish loading its content. This is one way to do it:

from selenium import webdriver
from scrapy import Selector
import time

driver = webdriver.Chrome()
URL = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html"
driver.get(URL)

# Without this pause the lists come back empty, because the page takes some time to load this content.
time.sleep(5)

sel = Selector(text=driver.page_source)
item = sel.css('#DETAILS_TRUNC_TEXT::text').extract()  # the CSS selector now matches
item_ano = sel.xpath('//*[@id="DETAILS_TRUNC_TEXT"]//text()').extract()  # the XPath now matches too
print(item, item_ano)
driver.quit()
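
A fixed five-second sleep can be too short on a slow connection and wasteful on a fast one. As a rough sketch of an alternative (not part of the original answer), the helper below polls a condition until it holds or a timeout expires. The name wait_until and its timeout and interval parameters are hypothetical, chosen only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition becoming truthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause briefly so the loop does not spin at full speed.
        time.sleep(interval)
```

With the driver from the answer above, the sleep could then be replaced by something like wait_until(lambda: Selector(text=driver.page_source).css('#DETAILS_TRUNC_TEXT::text').extract()), which returns the extracted list as soon as it is non-empty. Selenium also provides its own WebDriverWait class for this purpose, which is probably the better choice in real code.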



Answer 2:


I tried your XPath and CSS selectors in the Scrapy shell and also got nothing.

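Then I ran the view(response) command, which opens the downloaded response in a browser, and found that the site is dynamic.

[Screenshot of the downloaded response, not included here]

You can see that the details under Overview do not show up in the response Scrapy receives. That is why you get nothing no matter what you try.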
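
The same check can be made without opening a browser. The following is a minimal sketch using only the Python standard library: it reports whether an element with a given id exists in an HTML string and what text it contains. The names IdTextCollector and text_in_element are hypothetical, and the sample HTML strings in the note afterwards are made up for illustration, not copied from the TripAdvisor page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class IdTextCollector(HTMLParser):
    """Collects the text nodes found inside the element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0       # greater than 0 while inside the target element
        self.found = False   # True once the target element has been seen
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits inside the target element.
        if self.depth > 0 and data.strip():
            self.texts.append(data.strip())


def text_in_element(html, element_id):
    """Return (found, texts) for the element with the given id."""
    parser = IdTextCollector(element_id)
    parser.feed(html)
    parser.close()
    return parser.found, parser.texts
```

For a made-up static page such as '<div id="DETAILS_TRUNC_TEXT"></div>', text_in_element(html, "DETAILS_TRUNC_TEXT") reports that the element exists but holds no text, which is the situation where both selectors return empty lists. Running the same function on response.text inside the Scrapy shell would show whether the text is present in the HTML the server actually sent.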

Solutions: try Selenium (see the solution that SIM provided in the previous answer) or Splash.

Good Luck. :)



Source: https://stackoverflow.com/questions/48756707/scrapy-does-not-find-text-in-xpath-or-css
