I'm using Puppeteer for web scraping and I have just noticed that sometimes the website I'm trying to scrape asks for a captcha due to the amount of visits I'm doing fro
You should use a combination of the following:
Disclaimer: Do not use anti-captcha plugins/services to misuse resources. Resources are expensive.
The basic idea is to use an anti-captcha service such as 2captcha to deal with reCAPTCHAs that keep appearing.
You can use the puppeteer-extra-plugin-recaptcha plugin by berstend.
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')
// add recaptcha plugin and provide it your 2captcha token
// 2captcha is the builtin solution provider but others work as well.
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
puppeteer.use(
  RecaptchaPlugin({
    provider: { id: '2captcha', token: 'XXXXXXX' },
    visualFeedback: true // colorize reCAPTCHAs (violet = detected, green = solved)
  })
)
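Rather than hardcoding the token as in the snippet above, it can be read from the environment. This is only a minimal sketch: `buildRecaptchaOptions` is a hypothetical helper and `TWOCAPTCHA_TOKEN` is an assumed variable name, not something the plugin reads on its own.

```javascript
// Hypothetical helper: build the plugin options from environment variables.
// TWOCAPTCHA_TOKEN is an assumed name; the plugin itself does not look it up.
function buildRecaptchaOptions(env = process.env) {
  const token = env.TWOCAPTCHA_TOKEN
  if (!token) {
    // Fail early instead of sending requests with an empty token
    throw new Error('TWOCAPTCHA_TOKEN is not set')
  }
  return {
    provider: { id: '2captcha', token },
    visualFeedback: true
  }
}

// Usage (assumes puppeteer-extra and the plugin are installed as above):
// puppeteer.use(RecaptchaPlugin(buildRecaptchaOptions()))
```

This keeps the token out of source control, and the script stops immediately if the variable is missing.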
Afterwards you can run the browser as usual. The plugin will pick up any reCAPTCHA on the page and attempt to solve it. You still have to find the submit button yourself, if one exists, because it varies from site to site.
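Since the submit button differs per site, one possible approach is a small helper that tries a list of candidate selectors and clicks the first one that matches. This is a sketch, not part of the plugin: `clickFirstMatch` is a hypothetical name and the selectors you pass are your own guesses for the target site. It relies only on Puppeteer's `page.$` and `ElementHandle.click`.

```javascript
// Hypothetical helper: click the first selector that matches an element.
// The selector list is an assumption you supply for each site.
async function clickFirstMatch(page, selectors) {
  for (const selector of selectors) {
    const handle = await page.$(selector) // resolves to null when nothing matches
    if (handle) {
      await handle.click()
      return selector // report which selector was used
    }
  }
  return null // no submit button found; the site may submit on its own
}
```

Once the captcha has been solved, you could call it with something like `await clickFirstMatch(page, ['#recaptcha-demo-submit', 'button[type="submit"]'])`, where both selectors are examples only.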
// puppeteer usage as normal
puppeteer.launch({ headless: true }).then(async browser => {
  const page = await browser.newPage()
  await page.goto('https://www.google.com/recaptcha/api2/demo')
  // That's it, a single line of code to solve reCAPTCHAs