splash-js-render

Get content inside of script tag

做~自己de王妃 submitted on 2021-02-19 03:57:22
Question: Hello everyone, I'm trying to fetch the content inside a script tag on this page: http://www.teknosa.com/urunler/145051447/samsung-hm1500-bluetooth-kulaklik. This is the script tag whose content I want to read (quoted as far as it was captured):

$.Teknosa.ProductDetail = {"ProductComputedIndex":145051447,"ProductName":"SAMSUNG HM1500 BLUETOOTH KULAKLIK","ProductSeoName":"samsung-hm1500-bluetooth-kulaklik","ProductBarcode":"8808993790425","ProductPriceInclTax":79.9,"ProductDiscountedPriceInclTax":null,"ProductStockQuantity"
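Since the script tag assigns a JSON object literal to `$.Teknosa.ProductDetail`, one common approach is to grab that object with a regular expression and parse it with `json.loads`. A minimal sketch, using a shortened sample of the quoted markup (the live page's structure may have changed since the question was asked):

```python
import json
import re

# Sample fragment modeled on the script tag quoted in the question.
html = '''<script>
$.Teknosa.ProductDetail = {"ProductComputedIndex":145051447,"ProductName":"SAMSUNG HM1500 BLUETOOTH KULAKLIK","ProductPriceInclTax":79.9};
</script>'''

# Capture everything between "$.Teknosa.ProductDetail = " and the closing
# "};", then parse the captured object literal as JSON.
match = re.search(r'\$\.Teknosa\.ProductDetail\s*=\s*(\{.*?\});', html, re.DOTALL)
if match:
    product = json.loads(match.group(1))
    print(product["ProductName"])          # SAMSUNG HM1500 BLUETOOTH KULAKLIK
    print(product["ProductPriceInclTax"])  # 79.9
```

In a Scrapy spider, `html` would instead come from `response.text` (or from an XPath/CSS selection of the relevant script node); the regex-plus-`json.loads` step stays the same.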

Scrapy Splash click button doesn't work

喜欢而已 submitted on 2021-02-07 09:10:48
Question: What I'm trying to do: on avito.ru (a Russian real estate site), a person's phone number is hidden until you click on it. I want to collect the phone number using Scrapy + Splash. Example URL: https://www.avito.ru/moskva/kvartiry/2-k_kvartira_84_m_412_et._992361048 — after you click the button, a pop-up is displayed and the phone number becomes visible. I'm using the Splash execute API with the following Lua script (quoted as far as it was captured):

function main(splash)
  splash:go(splash.args.url)
  splash:wait(10)
  splash:runjs("document.getElementsByClassName('item-phone
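Rather than reading the node with `runjs`, Splash can perform an actual click via `splash:select` and `element:mouse_click`, which triggers the site's click handlers and opens the pop-up. A minimal sketch of such a script, held as a Python string for use with scrapy-splash; the CSS class name is an assumption based on the question's truncated snippet, and the live avito.ru markup may differ:

```python
# Hypothetical Lua script for Splash's /execute endpoint: load the page,
# click the "show phone" button, wait for the pop-up, return the HTML.
# The '.item-phone-button' selector is an assumption, not confirmed markup.
LUA_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(10))
  -- mouse_click dispatches a real click, unlike merely reading the node
  local button = splash:select('.item-phone-button')
  if button then
    button:mouse_click()
    assert(splash:wait(3))  -- give the pop-up time to render
  end
  return {html = splash:html()}
end
"""
```

In the spider this would be sent as `SplashRequest(url, self.parse_phone, endpoint='execute', args={'lua_source': LUA_SCRIPT})`, with the phone number then extracted from the returned HTML in the callback.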

Scrapy CrawlSpider + Splash: how to follow links through linkextractor?

青春壹個敷衍的年華 submitted on 2020-01-12 07:42:04
Question: I have the following code, which is partially working (quoted as far as it was captured):

class ThreadSpider(CrawlSpider):
    name = 'thread'
    allowed_domains = ['bbs.example.com']
    start_urls = ['http://bbs.example.com/diy']

    rules = (
        Rule(LinkExtractor(
                allow=(),
                restrict_xpaths=("//a[contains(text(), 'Next Page')]")),
             callback='parse_item',
             process_request='start_requests',
             follow=True),
    )

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse_item, args={'wait': 0.5})

    def parse_item(self,
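One problem visible in the snippet: `process_request='start_requests'` points the Rule at the spider's start-request generator, but `process_request` must name a method that takes an extracted request and returns a (possibly modified) request. The usual workaround is a method that tags each extracted request with Splash metadata, since scrapy-splash also understands plain requests carrying `meta['splash']`. A minimal sketch of that transformation, using plain dicts as stand-ins for `scrapy.Request` objects so it runs without a Scrapy install; in the real spider, the body would set `request.meta['splash']` and return the request:

```python
# Sketch of a Rule.process_request hook (names assumed): route each
# link-extracted request through Splash by attaching meta['splash'].
def use_splash(request):
    """Tag a crawl request so the Splash middleware renders it."""
    request['meta'] = {'splash': {'endpoint': 'render.html',
                                  'args': {'wait': 0.5}}}
    return request

# A dict stands in for the Request the LinkExtractor would produce.
extracted = {'url': 'http://bbs.example.com/diy?page=2', 'meta': {}}
routed = use_splash(extracted)
print(routed['meta']['splash']['endpoint'])  # render.html
```

The Rule would then use `process_request='use_splash'`. A commonly reported caveat: `CrawlSpider._requests_to_follow` only extracts links from `HtmlResponse` objects, so with Splash's response types you may additionally need to override that method for link following to work at all.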

Using docker, scrapy splash on Heroku

对着背影说爱祢 submitted on 2020-01-10 15:39:33
Question: I have a Scrapy spider that uses Splash, running in Docker on localhost:8050, to render JavaScript before scraping. I am trying to run this on Heroku, but I have no idea how to configure Heroku to start Docker and run Splash before running my web: scrapy crawl abc dyno. Any guidance is greatly appreciated!

Answer 1: From what I gather, you're expecting:

- a Splash instance running on Heroku via a Docker container
- your web application (the Scrapy spider) running in a Heroku dyno

Splash instance: Ensure you can have
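A hedged deployment sketch along those lines, with placeholder app names: deploy Splash as its own Heroku app built from a Dockerfile based on the `scrapinghub/splash` image, then point the spider app at it through an environment variable instead of hard-coding localhost:8050. One caveat: Heroku routes traffic to the port it assigns via `$PORT`, so the container must start Splash on that port rather than the default 8050.

```shell
# App 1: Splash, deployed as a Docker container (placeholder app name)
heroku create my-splash
heroku container:push web --app my-splash      # Dockerfile FROM scrapinghub/splash
heroku container:release web --app my-splash

# App 2: the Scrapy spider, told where the Splash instance lives
heroku config:set SPLASH_URL=https://my-splash.herokuapp.com --app my-scraper
```

The spider's scrapy-splash settings would then read `SPLASH_URL` from the environment, so the same code runs unchanged against a local Docker Splash and the Heroku-hosted one.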