Scrapy does not find text in Xpath or Css

问题

I've been at this one for a few days, and no matter how I try, I cannot get scrapy to abstract text that is in one element.

to spare you all the code, here are the important pieces. The setup does grab everything else off the page, just not this text.

from scrapy.selector import Selector
start_url = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html"

#BASIC ITEM AND SPIDER YADA, SPARE YOU THE DETAILS

hxs = Selector(response)
response_css = response.css("body")

desc_data = hxs.xpath('//*[@id="DETAILS_TRUNC_TEXT"]//text()').extract()
desc_data2 = response_css.css('#DETAILS_TRUNC_TEXT::text').extract()

both return empty lists. Yes, I found the xpath and css selector via chrome, but the rest of them work just fine as I'm able to find other data on the site. Please help me find out why this isn't working.

回答1:

To get the data you need to use any browser simulator like selenium so that It can catch the response of dynamically generated content. You need to put some delay to let the webpage load it's content fully. This is how you can go:

from selenium import webdriver
from scrapy import Selector
import time

driver = webdriver.Chrome()
URL = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html"
driver.get(URL)

time.sleep(5) #If you take out this line you won't get anything because the content of that page take some time to get loaded.

sel = Selector(text=driver.page_source)
item = sel.css('#DETAILS_TRUNC_TEXT::text').extract() #It is working
item_ano = sel.xpath('//*[@id="DETAILS_TRUNC_TEXT"]//text()').extract() #It is also working
print(item, item_ano)
driver.quit()