Puppeteer | 易学教程

How to use xpath in chrome headless+puppeteer evaluate()?

How can I use $x() to evaluate an XPath expression inside page.evaluate()? Since page is not in the same context, I tried calling $x() directly (as I would in Chrome DevTools), but no cigar: the script times out. $x() is not a standard JavaScript method for selecting elements by XPath; it is only a helper in Chrome DevTools. The documentation says so: Note: This API is only available from within the console itself. You cannot access the Command Line API from scripts on the page. And page.evaluate() is treated here as a "script on the page". You have two options: Use document
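
The excerpt breaks off before naming the two options. One standard DOM route that does work inside page.evaluate() is document.evaluate(); the sketch below is an assumption about a workable approach, not a reconstruction of the truncated answer. The helper name textsByXPath is hypothetical, and page is assumed to be a Puppeteer Page.

```javascript
// Sketch: run an XPath query inside the page with the standard DOM method
// document.evaluate(), then return plain strings to Node.
// `textsByXPath` is a hypothetical helper name, not part of Puppeteer.
async function textsByXPath(page, expression) {
  return page.evaluate((xpath) => {
    // This callback runs in the browser, where `document` and `XPathResult` exist.
    const snapshot = document.evaluate(
      xpath,
      document, // context node: search the whole document
      null, // no namespace resolver
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
      null
    );
    const texts = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
      texts.push(snapshot.snapshotItem(i).textContent);
    }
    return texts; // DOM nodes cannot cross to Node, so return their text
  }, expression);
}
```

Usage would look like `const titles = await textsByXPath(page, '//h2');`. Returning text rather than nodes is deliberate: page.evaluate() can only pass serializable values back to the Node side.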

Wait for text to appear when using puppeteer

Question: I wonder if there's a way, similar to Selenium's, to wait for text to appear in a particular element. I've tried something like this, but it doesn't seem to wait: await page.waitForSelector('.count', {visible: true}); Answer 1: You can use waitForFunction. See https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagewaitforfunctionpagefunction-options-args Including @elena's solution for completeness of the answer: await page.waitForFunction('document.querySelector(".count").inner
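
The quoted solution is cut off, so the following is only a sketch of how page.waitForFunction() could be used to wait for text, not the original answer's exact code. The helper name waitForText and its parameters are hypothetical. It differs from the waitForSelector() attempt in that it polls the element's text, not just its presence.

```javascript
// Sketch: resolve once the element matched by `selector` contains `text`.
// `waitForText` is a hypothetical helper; `page` is a Puppeteer Page.
async function waitForText(page, selector, text, timeout = 30000) {
  return page.waitForFunction(
    // Runs repeatedly in the browser until it returns a truthy value.
    (sel, expected) => {
      const el = document.querySelector(sel);
      return el !== null && el.textContent.includes(expected);
    },
    { timeout }, // give up (and reject) after this many milliseconds
    selector,
    text
  );
}
```

For the example in the question this might be called as `await waitForText(page, '.count', '5');`.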

Puppeteer: Click on element with text

Is there any method (I didn't find one in the API) or other solution to click on an element by its text? For example, I have this HTML: <div class="elements"> <button>Button text</button> <a href=#>Href text</a> <div>Div text</div> </div> And I want to click on the element that wraps the text (click on the button inside .elements), like: Page.click('Button text', '.elements') Any solution? You may use an XPath selector with page.$x(expression): const linkHandlers = await page.$x("//a[contains(text(), 'Some text')]"); if (linkHandlers.length > 0) { await linkHandlers[0].click(); } else { throw new Error("Link not found"); }
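
The answer's snippet can be wrapped into a small reusable helper. This is a sketch under assumptions: clickByText and xpathLiteral are hypothetical names, and it relies on page.$x() as the answer does, so it assumes a Puppeteer version that still provides that method (newer releases may have deprecated or removed it).

```javascript
// Quote arbitrary text as an XPath string literal, handling embedded quotes.
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  // Text contains both quote kinds: stitch the pieces together with concat().
  const parts = text.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Click the first element of type `tag` whose text contains `text`.
// Hypothetical helper built on page.$x(), as used in the answer above.
async function clickByText(page, text, tag = '*') {
  const expression = `//${tag}[contains(text(), ${xpathLiteral(text)})]`;
  const handles = await page.$x(expression);
  if (handles.length === 0) {
    throw new Error(`No <${tag}> element containing "${text}" found`);
  }
  await handles[0].click();
}
```

For the HTML in the question, `await clickByText(page, 'Button text', 'button');` would target the button inside .elements.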

Headless browser detection

Question: Do you know of any web apps, online tests, or online firewalls that try to detect whether a user is using Selenium, Puppeteer, PhantomJS, or any other headless browser? I've created my own online crawler with Puppeteer. I've changed many different things, such as the window.navigator object (user-agent, ~.webdriver, etc.). Now I want to make sure that it is undetectable. Answer 1: There is a headless browser detection test which tests for the following: Does the User-Agent contain the string "HeadlessChrome"? Is navigator
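
As a rough illustration of what such a test looks at, here is a minimal sketch covering only the two signals named in this excerpt: the "HeadlessChrome" User-Agent string from the answer and the navigator.webdriver flag mentioned in the question. The function name headlessSignals is hypothetical, and the rest of the truncated checklist is deliberately not guessed at.

```javascript
// Sketch: report which of two well-known headless signals are present.
// `nav` is any object shaped like window.navigator (userAgent, webdriver).
function headlessSignals(nav) {
  const signals = [];
  if (/HeadlessChrome/.test(nav.userAgent || '')) {
    signals.push('user-agent'); // UA string advertises headless Chrome
  }
  if (nav.webdriver === true) {
    signals.push('webdriver'); // automation flag is set
  }
  return signals;
}
```

To check a crawler against it, the values could be read from the page with `await page.evaluate(() => ({ userAgent: navigator.userAgent, webdriver: navigator.webdriver }))` and passed to headlessSignals(). An empty result only means these two checks pass, not that the browser is undetectable.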

Trying to hide first footer/header on PDF generated with Puppeteer

Question: I'm new to Node.js functions and to Puppeteer. Previously I was using wkhtmltopdf, but its options are currently very poor. So my idea was to generate a PDF from HTML with a cover page first (an image with full A4 width/height). Since the footer is generated from index.js, there's no way to hide it on the FIRST page of the PDF. //Imports const puppeteer = require('puppeteer'); //Open browser async function startBrowser() { const browser = await puppeteer.launch({headless: true, args:[

Puppeteer: Get inner HTML

Question: Does anybody know how to get the innerHTML or text of an element? Or even better, how to click an element with a specific innerHTML? This is how it would work with normal JavaScript: var found = false; $(selector).each(function() { if (found) return; else if ($(this).text().replace(/[^0-9]/g, '') === '5') { $(this).trigger('click'); found = true; } }); Thanks in advance for any help! Answer 1: This is how I get innerHTML: page.$eval(selector, (element) => { return element.innerHTML }) Answer 2: This should
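
For the second half of the question (clicking the element whose digits equal '5'), the jQuery loop could be translated to Puppeteer roughly as follows. This is a sketch, not one of the quoted answers; the helper name clickFirstWithDigits is hypothetical, and it uses page.$$() plus page.evaluate() with an element handle as argument.

```javascript
// Sketch: click the first element matching `selector` whose text, with all
// non-digits stripped, equals `digits`. Returns true if something was clicked.
async function clickFirstWithDigits(page, selector, digits) {
  const handles = await page.$$(selector); // all matching element handles
  for (const handle of handles) {
    // Read the element's text inside the browser and bring it back as a string.
    const text = await page.evaluate((el) => el.textContent, handle);
    if (text.replace(/[^0-9]/g, '') === digits) {
      await handle.click();
      return true; // stop at the first match, like the `found` flag
    }
  }
  return false;
}
```

A call such as `await clickFirstWithDigits(page, '.item', '5');` would mirror the original loop; the '.item' selector is a placeholder.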

Error while executing chrome without headless on heroku

Question: I am currently working on a project where I need to build an application that opens a URL in a browser in order to use some functions on it. For that, I used Puppeteer inside a Node.js script to open the browser on the server side so I can use it like an API. Here's the code (Node.js): app.get('/do', (req, res) => { console.log("ok"); (async() => { var browser = await puppeteer.launch( { args: ['--no-sandbox','--disable-setuid-sandbox'], headless: false }); var page = await

Managing puppeteer for memory and performance

Question: I'm using Puppeteer to scrape some pages, but I'm curious about how to manage this in production for a Node app. I'll be scraping up to 500,000 pages in a day, but these scrape jobs will happen at random intervals, so it's not a single queue that I can plow through. What I'm wondering is: is it better to open a browser, go to the page, then close the browser between each job? I assume that would be a lot slower, but might it handle memory better? Or do I open one global browser when

Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout

I'm using puppeteer and jest to run some front end tests. My tests look as follows: describe("Profile Tab Exists and Clickable: /settings/user", () => { test(`Assert that you can click the profile tab`, async () => { await page.waitForSelector(PROFILE.TAB); await page.click(PROFILE.TAB); }, 30000); }); Sometimes, when I run the tests, everything works as expected. Other times, I get an error: Timeout - Async callback was not invoked within the 5000ms timeout specified by jest.setTimeout. at node_modules/jest-jasmine2/build/queue_runner.js:68:21 at Timeout.callback [as _onTimeout] (node

puppeteer: wait N seconds before continuing next line

Question: In Puppeteer, I would like to wait a defined time before going to the next line of code. I've tried putting a setTimeout in an evaluate function, but it seems to be simply ignored: console.log('before waiting'); await page.evaluate(async() => { setTimeout(function(){ console.log('waiting'); }, 4000) }); console.log('after waiting'); This code doesn't wait; it just writes "before waiting" and "after waiting". Do you know how to do this? Answer 1: You can use a little promise function, function delay(time) {
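
The answer is truncated right after the function signature. A common way to write such a promise-based delay is sketched below; it is an assumption about where the answer was heading, not a quote from it. The original attempt does not wait because the async callback passed to page.evaluate() only schedules the timer and returns immediately, so its promise resolves at once.

```javascript
// Sketch: a promise that resolves after `time` milliseconds.
function delay(time) {
  return new Promise((resolve) => setTimeout(resolve, time));
}

// Example of pausing between two steps; `demo` is a hypothetical name.
async function demo(time) {
  console.log('before waiting');
  await delay(time); // execution of this function pauses here
  console.log('after waiting');
}
```

Calling `await demo(4000);` would print the second message about four seconds after the first. The pause happens on the Node side, so no page.evaluate() call is needed.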