Puppeteer

How to use xpath in chrome headless+puppeteer evaluate()?

北慕城南 submitted on 2019-11-28 10:33:15
How can I use $x() to evaluate an XPath expression inside page.evaluate()? Since page is not available in that context, I tried calling $x() directly (as I would in Chrome DevTools), but no luck: the script times out. $x() is not a standard JavaScript method for selecting elements by XPath; it is only a helper in Chrome DevTools. The documentation says so: "Note: This API is only available from within the console itself. You cannot access the Command Line API from scripts on the page." And page.evaluate() counts here as a "script on the page". You have two options: Use document
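One standard alternative to $x() inside page.evaluate() is the DOM's own document.evaluate(). The sketch below is a minimal illustration under stated assumptions, not the original answer's code: textByXPath is a hypothetical helper name, and page is assumed to be a Puppeteer Page object.

```javascript
// Sketch: evaluate an XPath expression inside the page context with the
// standard DOM API (document.evaluate) instead of the DevTools-only $x().
// `textByXPath` is a hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function textByXPath(page, xpath) {
  return page.evaluate((expr) => {
    // This callback runs in the browser, where `document` and `XPathResult` exist.
    const result = document.evaluate(
      expr,
      document,
      null,
      XPathResult.FIRST_ORDERED_NODE_TYPE,
      null
    );
    const node = result.singleNodeValue;
    // Return a serializable value; DOM nodes cannot cross the evaluate() boundary.
    return node ? node.textContent : null;
  }, xpath);
}
```

The helper returns the text of the first matching node, or null when nothing matches. The XPath string is passed as an argument to page.evaluate() because the callback cannot see variables from the Node.js scope.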

Wait for text to appear when using puppeteer

我的梦境 submitted on 2019-11-28 07:22:46
Question: I wonder if there is a way, similar to Selenium, to wait for text to appear in a particular element. I've tried something like this, but it doesn't seem to wait: await page.waitForSelector('.count', {visible: true}); Answer 1: You can use waitForFunction. See https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagewaitforfunctionpagefunction-options-args Including @elena's solution for completeness of the answer: await page.waitForFunction('document.querySelector(".count").inner
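The waitForFunction approach can be sketched as a small helper. This is an illustration under assumptions rather than the quoted solution: waitForText is a hypothetical name, page is assumed to be a Puppeteer Page, and the check uses innerText with a substring match, which may not be what the original answer did.

```javascript
// Sketch: wait until the element matching `selector` contains `text`.
// `waitForText` is a hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function waitForText(page, selector, text, timeout = 30000) {
  return page.waitForFunction(
    (sel, expected) => {
      // Runs repeatedly in the browser until it returns a truthy value.
      const el = document.querySelector(sel);
      return el !== null && el.innerText.includes(expected);
    },
    { timeout }, // options object; extra arguments follow it
    selector,
    text
  );
}
```

A call such as await waitForText(page, '.count', '5') would poll until the text shows up or the timeout expires. The null check matters because the element may not exist yet when polling starts.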

Puppeteer: Click on element with text

馋奶兔 submitted on 2019-11-28 06:10:16
Is there any method (I didn't find one in the API) or solution to click on an element by its text? For example, I have this HTML: <div class="elements"> <button>Button text</button> <a href=#>Href text</a> <div>Div text</div> </div> And I want to click on the element that wraps the text (click on the button inside .elements), like: Page.click('Button text', '.elements') Any solution? You may use an XPath selector with page.$x(expression): const linkHandlers = await page.$x("//a[contains(text(), 'Some text')]"); if (linkHandlers.length > 0) { await linkHandlers[0].click(); } else { throw new Error("Link not found"); }

Headless browser detection

混江龙づ霸主 submitted on 2019-11-28 06:08:27
Question: Do you know any web apps, online tests, or online firewalls that try to detect whether a user is using Selenium, Puppeteer, PhantomJS, or any other headless browser? I've created my own Puppeteer online crawler. I've changed many things, such as the window.navigator object (user-agent, ~.webdriver etc.). Now I want to make sure that it is undetectable. Answer 1: There is a headless browser detection test which tests for the following: Does the User-Agent contain the string "HeadlessChrome"? Is navigator
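Two of the signals mentioned in this thread, the "HeadlessChrome" User-Agent string and navigator.webdriver, can be checked with a few lines. The sketch below covers only those two signals and is not the detection test the answer refers to; headlessSignals is a hypothetical name, and it takes a navigator-like object so it can be exercised outside a browser.

```javascript
// Sketch: report two common headless-browser signals.
// `headlessSignals` is a hypothetical helper; `nav` is any object shaped like
// the browser's `navigator` (only `userAgent` and `webdriver` are read).
function headlessSignals(nav) {
  return {
    // Headless Chrome advertises itself in the default User-Agent string.
    userAgentMentionsHeadless: /HeadlessChrome/.test(nav.userAgent || ''),
    // Automated browsers typically expose navigator.webdriver === true.
    webdriverFlagSet: nav.webdriver === true,
  };
}
```

To check a crawler, one could read navigator.userAgent and navigator.webdriver through page.evaluate() and pass the resulting plain object to this function. Real detection services are likely to look at more than these two properties.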

Trying to hide first footer/header on PDF generated with Puppeteer

六月ゝ 毕业季﹏ submitted on 2019-11-28 04:39:19
Question: I'm new to Node.js functions and to Puppeteer. Previously I was using wkhtmltopdf, but its options are currently very limited. My idea was to generate a PDF from HTML with a cover page first (an image at full A4 width/height). Since the footer is generated from index.js, there is no way to hide it on the FIRST page of the PDF. //Imports const puppeteer = require('puppeteer'); //Open browser async function startBrowser() { const browser = await puppeteer.launch({headless: true, args:[

Puppeteer: Get inner HTML

那年仲夏 submitted on 2019-11-28 02:44:17
Question: Does anybody know how to get the innerHTML or text of an element? Or, even better, how to click an element with a specific innerHTML? This is how it would work with normal JavaScript: var found = false; $(selector).each(function() { if (found) return; else if ($(this).text().replace(/[^0-9]/g, '') === '5') { $(this).trigger('click'); found = true; } }); Thanks in advance for any help! Answer 1: This is how I get innerHTML: page.$eval(selector, (element) => { return element.innerHTML }) Answer 2: This should
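The second half of the question, clicking the element whose text matches, can be approached with page.$$eval. The following is a rough sketch under assumptions, not one of the quoted answers: clickByDigits is a hypothetical name, page is assumed to be a Puppeteer Page, and the matching rule (strip everything except digits, then compare) is copied from the question's jQuery snippet.

```javascript
// Sketch: click the first element matching `selector` whose text, reduced to
// digits only, equals `wantedDigits`. `clickByDigits` is a hypothetical helper;
// `page` is assumed to be a Puppeteer Page.
async function clickByDigits(page, selector, wantedDigits) {
  return page.$$eval(
    selector,
    (elements, wanted) => {
      // Runs in the browser: `elements` is an array of the matching DOM elements.
      const hit = elements.find(
        (el) => el.textContent.replace(/[^0-9]/g, '') === wanted
      );
      if (hit) hit.click(); // DOM click(), not a simulated mouse click
      return Boolean(hit);
    },
    wantedDigits
  );
}
```

The helper resolves to true when an element was found and clicked, and false otherwise. Because it uses the DOM's click() method, it does not move the mouse or check visibility the way a click on an element handle does.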

Error while executing Chrome without headless on Heroku

大兔子大兔子 submitted on 2019-11-27 22:49:02
Question: I am currently working on a project where I need to build an application that opens a URL in a browser in order to use some functions on it. For that I used Puppeteer inside a Node.js script to open the browser on the server side, so I can use it like an API. Here's the code (Node.js): app.get('/do', (req, res) => { console.log("ok"); (async() => { var browser = await puppeteer.launch( { args: ['--no-sandbox','--disable-setuid-sandbox'], headless: false }); var page = await

Managing puppeteer for memory and performance

巧了我就是萌 submitted on 2019-11-27 22:23:58
Question: I'm using Puppeteer for scraping some pages, but I'm curious about how to manage this in production for a Node app. I'll be scraping up to 500,000 pages a day, but these scrape jobs will happen at random intervals, so it's not a single queue that I can plow through. What I'm wondering is: is it better to open a browser, go to the page, then close the browser between each job? I would assume that is a lot slower, but maybe it handles memory better? Or do I open one global browser when

Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout

China☆狼群 submitted on 2019-11-27 19:56:42
I'm using Puppeteer and Jest to run some front-end tests. My tests look as follows: describe("Profile Tab Exists and Clickable: /settings/user", () => { test(`Assert that you can click the profile tab`, async () => { await page.waitForSelector(PROFILE.TAB); await page.click(PROFILE.TAB); }, 30000); }); Sometimes, when I run the tests, everything works as expected. Other times, I get an error: Timeout - Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout. at node_modules/jest-jasmine2/build/queue_runner.js:68:21 at Timeout.callback [as _onTimeout] (node

puppeteer: wait N seconds before continuing next line

房东的猫 submitted on 2019-11-27 17:10:11
Question: In Puppeteer I would like to wait a defined time before going to the next line of code. I've tried to put a setTimeout in an evaluate function, but it seems to be simply ignored: console.log('before waiting'); await page.evaluate(async() => { setTimeout(function(){ console.log('waiting'); }, 4000) }); console.log('after waiting'); This code doesn't wait; it just prints "before waiting" and "after waiting". Do you know how to do this? Answer 1: You can use a little promise function, function delay(time) {
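The code in the question does not wait because the async callback returns as soon as setTimeout has been scheduled, so the promise that page.evaluate() awaits resolves immediately. A promise-based pause avoids this. The sketch below is a common pattern and may differ from the truncated answer; sleep is a hypothetical name chosen for this illustration.

```javascript
// Sketch: a promise that resolves after `ms` milliseconds, so `await` really pauses.
// `sleep` is a hypothetical helper name used only for this illustration.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage inside an async function:
async function demo() {
  console.log('before waiting');
  await sleep(50); // execution resumes here only after the timer fires
  console.log('after waiting');
}
```

This runs in the Node.js process rather than in the page, so it pauses the script without touching the browser context.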