web-scraping | 易学教程

Automation of iTunes Connect with VBA

Question: I am trying to automate a report through VBA. I have worked with VBA before, but I am not able to log in to the iTunes Connect website through code. Someone told me that it is inside an IFrame, but I have no idea what that means. I am not even able to put my username into the input box of the login page. The login page is https://itunesconnect.apple.com/login, and my code is: Dim HTMLdoc As HTMLDocument Dim MyBrowser As InternetExplorer Sub check() Dim MyHTML_element As IHTMLElement Dim MyURL As String MyURL = "https://itunesconnect.apple.com/login" Set MyBrowser = New
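
If the login form is inside an iframe, the input box belongs to the frame's own document rather than the outer page, so code that searches the outer document will not find it; the frame's document has to be accessed instead, or the frame's URL loaded directly. The sketch below, in Python (the language of most questions on this page) and using only the standard library, finds iframe URLs in a page's HTML. It assumes the iframe is present in the static HTML; if a script inserts it after the page loads, only a real browser session will show it. The function and class names are hypothetical, and the markup used to try it is made up rather than taken from Apple's login page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IFRAME SRC=...> matches too
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(html, base_url):
    """Return the absolute URLs of all iframes found in `html`."""
    parser = IframeSrcParser()
    parser.feed(html)
    # Resolve relative src values against the page the HTML came from
    return [urljoin(base_url, src) for src in parser.sources]
```

For example, `iframe_urls(html, "https://itunesconnect.apple.com/login")` would resolve a hypothetical relative `src` such as `/auth/frame` against the login page URL. Whether loading a login frame directly is enough to sign in depends on the site's authentication flow, which is not known here.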

StaleElementReferenceException even after adding a wait while collecting data from Wikipedia using web scraping

Question: I am new to web scraping, so pardon any silly mistakes. I am working on a project that needs a list of movies as its data, which I am trying to collect from Wikipedia by web scraping. Here is my code: def MoviesList(years, driver): for year in years: driver.implicitly_wait(150) year.click() table = driver.find_element_by_xpath('/html/body/div[3]/div[3]/div[5]/div[1]/table[2]/tbody') movies = table.find_elements_by_xpath('tr/td[1]/i
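
A StaleElementReferenceException means the element reference points to a node that is no longer attached to the DOM, typically because `year.click()` loaded a new page, so every other element still held in `years` belongs to the old page. Waiting does not help, because the element is not missing but detached. Two common remedies are to collect the link targets up front (for example each element's `get_attribute("href")`) and visit them with `driver.get(...)`, or to re-locate the element right before using it. A minimal sketch of the second remedy, written as a library-agnostic helper so it runs without a browser; the helper name is hypothetical, and with Selenium you would pass `StaleElementReferenceException` (from `selenium.common.exceptions`) as `stale_exc`.

```python
def retry_on_stale(locate, action, stale_exc, attempts=3):
    """Re-locate an element and retry `action` when it raises `stale_exc`.

    locate:    zero-argument callable that finds the element afresh
               (with Selenium, e.g. a lambda calling driver.find_element).
    action:    callable applied to the freshly located element.
    stale_exc: exception class that signals "element went stale".
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        element = locate()  # find the element again on every attempt
        try:
            return action(element)
        except stale_exc:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last stale error
```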

WebScraping JavaScript-Rendered Content using Selenium in Python

Question: I am very new to web scraping and have been trying to use Selenium's functions to simulate a browser accessing the Texas public contracting webpage and then download embedded PDFs. The website is http://www.txsmartbuy.com/sp. So far, I've successfully used Selenium to select an option in the "Agency Name" dropdown menu and to click the search button. I've listed my Python code below. import os os.chdir("/Users/fsouza/Desktop") #Setting up directory from bs4 import BeautifulSoup
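
After Selenium has clicked the search button and the results have rendered, `driver.page_source` contains the JavaScript-generated HTML, which can then be parsed like a static page (with BeautifulSoup, which the question already imports, or with the standard library). A minimal standard-library sketch that collects PDF links from that HTML, assuming the PDFs are linked through ordinary `<a href="....pdf">` anchors; that is an assumption about txsmartbuy.com, not something verified here, and the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects href values of <a> tags whose path ends in .pdf."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Check only the path part, so a query string does not hide the extension
        if href and urlparse(href).path.lower().endswith(".pdf"):
            self.links.append(href)


def pdf_links(page_source, base_url):
    """Return absolute, de-duplicated PDF URLs found in rendered HTML."""
    parser = PdfLinkParser()
    parser.feed(page_source)
    # Resolve relative links, then drop duplicates while keeping page order
    return list(dict.fromkeys(urljoin(base_url, h) for h in parser.links))
```

In the Selenium script this would be called as `pdf_links(driver.page_source, "http://www.txsmartbuy.com/sp")`. Each URL could then be downloaded, for example with `requests.get(url).content` written to a file; if the site requires the browser's session cookies, they would have to be copied over from `driver.get_cookies()`.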

How to unit test a web scraping service with PHPUnit

Question: I am currently developing a project in PHP + Laravel that needs to scrape data from two different websites, using the Goutte scraping library. I have 10 integration tests, in which I use the Crawler object that Goutte's Client provides to get the specific data I want to scrape from each website. The tests work just fine (I even used the Infection library for mutation testing)... But I think there could be a way to unit test all the functions (therefore, the tests would
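
A common way to make scraping code unit-testable is to separate fetching (network access) from parsing (a pure function of the HTML), then feed saved HTML fixtures to the parser in tests while the integration tests keep covering the live sites. The sketch below shows the idea in Python with `unittest` rather than PHP; in the Laravel project the analogue would be building a Symfony DomCrawler `Crawler` from a saved HTML string instead of having Goutte's Client request the live page. `PriceScraper`, `parse_prices`, and the markup are hypothetical, and the regex stands in for whatever extraction the real Crawler code does.

```python
import re
import unittest


def parse_prices(html):
    """Pure parsing step: extract prices from markup like <span class="price">12.50</span>."""
    return [float(p) for p in re.findall(r'<span class="price">([\d.]+)</span>', html)]


class PriceScraper:
    """The fetch function is injected, so tests can replace the network."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable: url -> HTML string

    def prices(self, url):
        return parse_prices(self.fetch(url))


class PriceScraperTest(unittest.TestCase):
    # Saved copy of the relevant markup; no network access during the test
    FIXTURE = '<span class="price">12.50</span><span class="price">3</span>'

    def test_parses_fixture_without_network(self):
        requested = []

        def fake_fetch(url):
            requested.append(url)
            return self.FIXTURE

        scraper = PriceScraper(fake_fetch)
        self.assertEqual(scraper.prices("https://example.com/p"), [12.5, 3.0])
        self.assertEqual(requested, ["https://example.com/p"])
```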

How to scrape links from a webpage using javascript?

Question: I'm looking to scrape the links of posts shown in the Facebook feed. I noticed that post links have two things in common: in a link such as https://www.facebook.com/username/posts/1234567890, both https://www.facebook.com/ and /posts/ are always present. I used this code to get all links on the page, but I don't know how to grab only the links that contain both https://www.facebook.com/ and /posts/. var links = document.querySelectorAll("a[href^='https://www.facebook.com']"); for(var i = 0; i< links.length; i++){ console.log
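
In the browser, both conditions can go into one CSS selector, `a[href^='https://www.facebook.com/'][href*='/posts/']`, because `querySelectorAll` accepts chained attribute selectors (`^=` means "starts with", `*=` means "contains"). The same filter, sketched in Python for a list of already-collected hrefs; the function name is hypothetical.

```python
from urllib.parse import urlparse


def facebook_post_links(hrefs):
    """Keep only https links on www.facebook.com whose path contains /posts/."""
    result = []
    for href in hrefs:
        parsed = urlparse(href)
        if (parsed.scheme == "https"
                and parsed.netloc == "www.facebook.com"
                and "/posts/" in parsed.path):
            result.append(href)
    return result
```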

I cannot auto-login to Pastebin using requests + BeautifulSoup

Question: I am trying to auto-login to my Pastebin account using Python, but it fails and I don't know why. I copied the request headers exactly and double-checked them, but I still get an HTTP 400 response. Can somebody help me? This is my code: import requests from bs4 import BeautifulSoup import subprocess import os import sys from requests import Session # the actual program page = requests.get("https://pastebin.com/99qQTecB") parse = BeautifulSoup(page.content, 'html.parser') string = parse.find(
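
One common cause of a 400 response here is that login forms often carry per-visit hidden fields (such as a CSRF token) and cookies that copied headers cannot reproduce: the token has to be read from the login page, and the GET and the POST have to share one `requests.Session` so the cookies match. Whether Pastebin's form works exactly this way is an assumption. A minimal standard-library sketch that merges a form's hidden fields with the credentials; the function names and all field names are hypothetical and must be read from the real form.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here via handle_startendtag
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_login_payload(login_page_html, username, password, user_field, pass_field):
    """Merge the page's hidden fields (e.g. a CSRF token) with the credentials."""
    parser = HiddenInputParser()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    payload[user_field] = username
    payload[pass_field] = password
    return payload
```

Assumed usage with requests: `s = requests.Session()`, `page = s.get(login_url)`, then `s.post(action_url, data=build_login_payload(page.text, user, pw, user_field, pass_field))`, where `action_url` comes from the form's `action` attribute and the field names from its inputs' `name` attributes.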

Using if or try in a loop to check for an element on a page with Selenium

Question: I am trying to scrape agents' data here. I am able to get the links from the first page. I am using numbered loops because I know the total number of pages, but I want the loop to run as long as the "next" page option is there. I tried both "try" and "if not" but wasn't able to figure it out. Any help is welcome. Here is the code: from selenium import webdriver import time from selenium.common.exceptions import ElementNotVisibleException, NoSuchElementException from selenium.webdriver.common.by
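
Instead of a numbered loop, the loop can keep going until the "next" control is missing, which is where `try`/`except NoSuchElementException` fits. A minimal sketch with the page-specific parts passed in as callables so it runs without a browser; the names are hypothetical. With Selenium, `go_to_next_page` might try `driver.find_element(By.LINK_TEXT, "Next").click()` (the link text is an assumption about this site) and return `True`, returning `False` in the `except NoSuchElementException` branch.

```python
def scrape_all_pages(get_links, go_to_next_page, max_pages=1000):
    """Collect links page by page until there is no "next" page.

    get_links:       callable returning the links on the current page.
    go_to_next_page: callable that moves to the next page and returns True,
                     or returns False when no "next" control exists.
    max_pages:       safety limit so a site bug cannot cause an endless loop.
    """
    all_links = []
    for _ in range(max_pages):
        all_links.extend(get_links())
        if not go_to_next_page():
            break  # no "next" page: we just scraped the last one
    return all_links
```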