web-scraping

Getting Request failed with status code 403 with axios get

為{幸葍}努か submitted on 2021-01-25 07:09:54

Question: I have set up my axios agent like this: `const agent = new https.Agent({ rejectUnauthorized: false });` and I send a GET request like this: `let data = await axios.get('https://www.skechers.com/en-us/', { httpsAgent: agent });`. With some URLs, however, the request fails with the error `Request failed with status code 403`. What could be causing this error? I have also tried setting headers as follows, but I still get the error: `let data = await axios.get(url, { httpsAgent: agent, headers: {`


Puppeteer does not activate button click, despite selecting button

折月煮酒 submitted on 2021-01-25 07:00:24

Question: I'm trying to automate a sign-in to a simple website that a scammer sent my friend. I can use Puppeteer to fill in the text inputs, but when I try to use it to click the button, all it does is trigger the button's color change (the one that happens when the mouse hovers over the button). I also tried pressing Enter while focused on the input fields, but that doesn't seem to work. When I use `document.buttonNode.click()` in the console, it works, but I can't seem to emulate that with Puppeteer. I also
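Since the click works from the console with `document.buttonNode.click()`, one workaround is to fire the DOM click from inside the page context via `page.evaluate` instead of Puppeteer's mouse-driven `page.click`. A hedged sketch (the `#login-button` selector is an assumption; substitute the real one):

```javascript
// Clicks an element by running document.querySelector(...).click()
// inside the page, mirroring what worked in the DevTools console.
async function clickButton(page, selector = '#login-button') {
  await page.waitForSelector(selector);    // ensure the element is in the DOM
  await page.evaluate(sel => {
    document.querySelector(sel).click();   // fire the DOM click directly
  }, selector);
}

// Usage with Puppeteer (assumes `npm install puppeteer`):
// const browser = await puppeteer.launch();
// const page = await browser.newPage();
// await page.goto('https://example.com/login');
// await clickButton(page, 'button[type=submit]');
```

The mouse-based `page.click` can land on an overlay or the wrong coordinates and only trigger the hover state; dispatching the click from page context sidesteps that.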

Scrapy - Set TCP Connect Timeout

若如初见. submitted on 2021-01-24 08:46:51

Question: I'm trying to scrape a website with Scrapy. However, the website is extremely slow at times, taking almost 15-20 seconds to respond to the first request in a browser. When I try to crawl the website with Scrapy, I sometimes keep getting a TCP timeout error, even though the website opens just fine in my browser. Here's the message: `2017-09-05 17:34:41 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://www.hosane.com/result/specialList> (failed 16 times): TCP`
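Scrapy's request timeout (covering connect and download) is controlled by the `DOWNLOAD_TIMEOUT` setting, with retry behaviour governed by the retry middleware. A hedged sketch of the relevant settings (the specific values are assumptions; for a server that takes 15-20 s to answer, a generous timeout plus a few retries is usually enough):

```python
# settings.py-style sketch for a slow-to-respond site.
DOWNLOAD_TIMEOUT = 60    # seconds to wait for each request (default is 180)
RETRY_ENABLED = True     # retry middleware is on by default
RETRY_TIMES = 5          # retry each failed request up to 5 times

# The timeout can also be set per request via meta:
# yield scrapy.Request(
#     "http://www.hosane.com/result/specialList",
#     meta={"download_timeout": 60},
#     callback=self.parse,
# )
```

Since the default `DOWNLOAD_TIMEOUT` is already 180 s, repeated failures despite these settings may indicate the server is refusing or silently dropping non-browser connections rather than just being slow.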

Retrieve citations of a journal paper using R

一笑奈何 submitted on 2021-01-24 07:15:20

Question: Using R, I want to obtain the list of articles that cite a given scientific journal paper. The only information I have is the title of the article, e.g. "Protein measurement with the folin phenol reagent". Can anyone help me by producing a reproducible example that I can use? Here is what I have tried so far. The R package fulltext seems useful, because it allows you to retrieve a list of IDs linked to an article. For instance, I can get the article's DOI: `library(fulltext) res1 <- ft_search`
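Once `ft_search` has resolved the DOI, retrieving the *citing* articles needs a citation index rather than fulltext itself. A hedged sketch, assuming the rOpenSci citecorp package (an OpenCitations client) is available; the accessor path into the `ft_search` result may differ across fulltext versions:

```r
library(fulltext)

# Find the paper's DOI from its title (Lowry et al. 1951 in this example).
res1 <- ft_search(query = "Protein measurement with the folin phenol reagent",
                  from = "crossref")
doi <- res1$crossref$data$doi[1]   # DOI of the best match (check it manually)

# Citing works from the OpenCitations COCI index (assumption: citecorp):
# library(citecorp)
# citing <- oc_coci_cites(doi)     # one row per citing article
# head(citing$citing)              # DOIs of the articles that cite the paper
```

Note that no single index is complete: OpenCitations, Crossref, and commercial databases each cover a different subset of the citation graph.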

Scraping string from a large number of URLs with Julia

╄→尐↘猪︶ㄣ submitted on 2021-01-24 06:58:50

Question: Happy New Year! I have just started to learn Julia, and the first mini-challenge I have set myself is to scrape data from a large list of URLs. I have about 50k URLs (which I successfully parsed from a JSON file with Julia using a regex) in a CSV file. I want to scrape each one and return a matched string ("/page/12345/view", where 12345 is any integer). I managed to do so using HTTP and Queryverse (although I had started with CSV and CSVFiles, but I am looking at packages for learning purposes), but the script
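The task described above can be sketched as fetching each URL with HTTP.jl and extracting the first `/page/<digits>/view` match. A hedged sketch (the file name `urls.csv` and column name `url` are assumptions):

```julia
using HTTP, CSV, DataFrames

# Fetch one URL and return the matched link, or `missing` on failure.
function scrape_link(url)
    try
        resp = HTTP.get(url; readtimeout = 10)
        m = match(r"/page/\d+/view", String(resp.body))
        return m === nothing ? missing : m.match
    catch
        return missing   # skip URLs that time out or error
    end
end

urls = CSV.read("urls.csv", DataFrame).url
results = [scrape_link(u) for u in urls]
```

For ~50k URLs, a sequential loop spends most of its time waiting on the network; `asyncmap(scrape_link, urls; ntasks = 50)` from Base fetches concurrently and is typically the first optimization to try.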