web-scraping

find() on Beautiful Soup in a loop returns TypeError

不打扰是莪最后的温柔 · Submitted on 2021-01-27 18:31:51

Question: I'm trying to scrape a table on an AJAX page with Beautiful Soup and print it out in table form with the texttable library.

```python
import BeautifulSoup
import urllib
import urllib2
import getpass
import cookielib
import texttable

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
...

def show_queue():
    url = 'https://www.animenfo.com/radio/nowplaying.php'
    values = {'ajax': 'true', 'mod': 'queue'}
    data = urllib.urlencode(values)
    f
```
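A likely cause (the full traceback is cut off above) is that find() returns None when nothing matches, and chaining another call onto that result fails. A minimal sketch of the defensive pattern, assuming the modern bs4 package (successor of the old BeautifulSoup module imported above) and using made-up markup:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td class="song">Song A</td></tr>
  <tr><th>header row, no td</th></tr>
  <tr><td class="song">Song B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
titles = []
for row in soup.find_all("tr"):
    cell = row.find("td", {"class": "song"})
    if cell is None:   # no match: find() returned None, don't chain on it
        continue
    titles.append(cell.get_text())
print(titles)  # ['Song A', 'Song B']
```

The same guard applies inside any loop over find_all(): check each intermediate result before calling methods on it.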

Python - HTTP Error 503: Service Unavailable

半城伤御伤魂 · Submitted on 2021-01-27 18:28:47

Question: I am trying to scrape data from Google and LinkedIn. Somehow it gives me this error:

```
*** httperror_seek_wrapper: HTTP Error 503: Service Unavailable
```

Can someone advise how I can solve this?

Answer 1: Google is simply detecting your query as automated. You would need a captcha solver to get unlimited results. The following links might be helpful:

https://support.google.com/websearch/answer/86640?hl=en

Bypassing a captcha using an OCR engine: http://www.debasish.in/2012/01/bypass-captcha-using-python
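For transient 503s (as opposed to a hard bot block, which only official APIs such as Google Custom Search solve properly), the standard client-side mitigation is exponential backoff with jitter between retries. A minimal sketch; the retry count and delay bounds are arbitrary illustrative choices:

```python
import random

def backoff_delays(retries=5, base=1.0, cap=60.0):
    """Yield a sleep time for each retry: exponential growth, capped,
    with jitter so many clients don't retry in lockstep."""
    for attempt in range(retries):
        yield min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)

delays = list(backoff_delays())
```

A fetch loop would sleep for each yielded delay after a 503 response and give up once the generator is exhausted.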

Scrapy Pipeline doesn't insert into MySQL

巧了我就是萌 · Submitted on 2021-01-27 17:55:39

Question: I'm trying to build a small app for a university project with Scrapy. The spider scrapes the items, but my pipeline does not insert the data into the MySQL database. To test whether the pipeline or my pymysql usage is at fault, I wrote a test script:

```python
#!/usr/bin/python3
import pymysql

str1 = "hey"
str2 = "there"
str3 = "little"
str4 = "script"

db = pymysql.connect("localhost", "root", "**********", "stromtarife")
cursor = db.cursor()
cursor.execute(
```
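The excerpt is cut off before the INSERT, but the most common reason a pipeline "silently" fails to insert is a missing db.commit() after cursor.execute(). A sketch of the pattern using sqlite3 from the standard library in place of pymysql (the table and column names are made up; the commit rule is the same for both drivers):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tarife (anbieter TEXT, preis TEXT)")

cursor = db.cursor()
cursor.execute("INSERT INTO tarife VALUES (?, ?)", ("hey", "there"))
db.commit()  # pymysql has autocommit off by default; without this the INSERT is discarded

rows = cursor.execute("SELECT * FROM tarife").fetchall()
print(rows)  # [('hey', 'there')]
```

In a Scrapy pipeline, the commit belongs in process_item() (or batched in close_spider()).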

Clicking links with Python BeautifulSoup

六眼飞鱼酱① · Submitted on 2021-01-27 17:47:44

Question: I'm new to Python (I come from a PHP/JavaScript background), but I wanted to write a quick script that crawls a website and all of its child pages, finds all a tags with href attributes, counts them, and then "clicks" each link. I can count all of the links, but I can't figure out how to "click" them and return the response codes.

```python
from bs4 import BeautifulSoup
import urllib2
import re

def getLinks(url):
    html_page = urllib2.urlopen(url)
    soup = BeautifulSoup(html_page,
```
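"Clicking" a link in a script just means issuing another GET request to its href and reading the status code (in Python 3, urllib.request.urlopen(url).status). The collection half can be done with the standard library alone; the markup below is invented for illustration:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

page = '<a href="/a">one</a><a name="x">no href</a><a href="/b">two</a>'
collector = LinkCollector()
collector.feed(page)
print(len(collector.links), collector.links)  # 2 ['/a', '/b']
```

Relative hrefs like these need to be joined to the base URL (urllib.parse.urljoin) before they can be fetched.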

Excel VBA - Web Scraping - Get value in HTML Table cell

荒凉一梦 · Submitted on 2021-01-27 16:45:25

Question: I am trying to create a macro that scrapes a cargo tracking website, and I have to create four such macros because each airline has a different website. I am new to VBA and web scraping. I put together code that works for one website, but when I tried to replicate it for another, I got stuck in the loop. I think it may be how I am referring to the element, but like I said, I am new to VBA and have no clue about HTML. I am trying to get the "notified" value in the highlighted line from the
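When a cell has no usable id, the usual approach is to walk the table rows, match the label cell, and read its neighbour. Sketched here in Python with a guessed table layout (the real page's HTML is not shown); the VBA equivalent loops over the document's getElementsByTagName("td") collection with the same comparison:

```python
import xml.etree.ElementTree as ET

# Guessed stand-in for the airline's status table.
table = ET.fromstring("""<table>
  <tr><td>Pieces</td><td>3</td></tr>
  <tr><td>Status</td><td>Notified</td></tr>
</table>""")

value = None
for row in table.findall(".//tr"):
    cells = row.findall("td")
    if len(cells) == 2 and cells[0].text == "Status":
        value = cells[1].text  # the cell next to the label
print(value)  # Notified
```

Matching on the label text makes the macro survive layout shuffles better than a hard-coded row index does.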

How can I loop over pages and get data from every page with selenium?

a 夏天 · Submitted on 2021-01-27 14:32:15

Question: I want to do a Google search and collect the links to all hits so that I can click those links and extract data from them afterwards. How can I get the link from every hit? I've tried several solutions, such as a for loop and a while True statement; I'll show some examples of the code below. I either get no data at all, or I get links from only one page. Can someone please help me figure out how to iterate over every page of the Google search and get all the links so
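A robust pattern with Selenium is to collect all hrefs on the current page first, then move to the next results page, and repeat until there is no "Next" control; navigating away while still iterating over the page's elements causes stale-element errors, which is why such loops often stop after one page. The skeleton, with the browser calls stubbed out by plain functions so only the control flow is shown:

```python
def collect_all_links(get_page_links, get_next_page, start):
    """Accumulate links page by page until get_next_page returns None.
    With Selenium, get_page_links would read each result anchor's href
    attribute and get_next_page would follow the 'Next' button."""
    links, page = [], start
    while page is not None:
        links.extend(get_page_links(page))
        page = get_next_page(page)
    return links

# Toy three-page "search result":
pages = {1: (["a", "b"], 2), 2: (["c"], 3), 3: (["d"], None)}
all_links = collect_all_links(lambda p: pages[p][0], lambda p: pages[p][1], 1)
print(all_links)  # ['a', 'b', 'c', 'd']
```

Only after the full list is collected does the script visit each link to extract data.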

How to click a button on a website using Puppeteer without any class, id, etc. assigned to it?

对着背影说爱祢 · Submitted on 2021-01-27 14:31:03

Question: I want to click a button on a website. The button has no id, class, etc., so I need a way to click it by the name that's on it. In this example I need to click by the name "Supreme®/The North Face® Leather Shoulder Bag". This is my code in Node.js:

```javascript
const puppeteer = require('puppeteer');

let scrape = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://www.supremenewyork.com/shop/all/bags');
```
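When an element has no id or class, selecting it by its visible text is the usual answer; in Puppeteer that is an XPath query such as page.$x('//a[text()="..."]') followed by a click on the returned handle. The same text-matching idea, sketched in Python on simplified stand-in markup (the real Supreme page differs):

```python
import xml.etree.ElementTree as ET

# Simplified stand-in markup; the real product page is more complex.
html = """<div>
  <a href="/shop/bags/1">Supreme/The North Face Leather Shoulder Bag</a>
  <a href="/shop/bags/2">Other Bag</a>
</div>"""
root = ET.fromstring(html)
# Select the anchor purely by its visible text, then read where it leads.
target = next(a for a in root.iter("a")
              if a.text == "Supreme/The North Face Leather Shoulder Bag")
print(target.get("href"))  # /shop/bags/1
```

Note that text matching must reproduce the visible string exactly, including special characters such as the ® marks in the real listing.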

VBA - web scraping using ng-click

故事扮演 · Submitted on 2021-01-27 14:23:25

Question: I am using Selenium and I would like to be able to click on the following:

```html
<a ng-click="download()">download</a>
```

This is an a tag, and I am not sure what the code to click an a tag that has ng-click in it would look like.

```vba
Dim d As WebDriver
Set d = New ChromeDriver
Const URL = "url of the website - not public"

With d
    .Start "Chrome"
    .get URL
    .Window.Maximize
    .FindElementById("Search").SendKeys "information to search"
    .Wait 1000
    .FindElementById("Submit").Click
    .Wait 1000
    'then I need to
```
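With no id or class, the ng-click attribute itself (or the link text) can serve as the locator; in SeleniumBasic that would be along the lines of .FindElementByXPath("//a[@ng-click='download()']").Click, though the exact call depends on the wrapper version. The attribute-matching idea, sketched in Python on the snippet from the question:

```python
import xml.etree.ElementTree as ET

html = '<div><a ng-click="download()">download</a><a href="#">other</a></div>'
root = ET.fromstring(html)
# Match the anchor by its ng-click attribute, as an XPath locator would.
link = root.find(".//a[@ng-click='download()']")
print(link.text)  # download
```

If several elements share the same ng-click value, the link text can be added to the predicate to disambiguate.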

Requests.get shows different HTML than Chrome's Developer Tools

我是研究僧i · Submitted on 2021-01-27 13:15:38

Question: I am working on a web scraping tool in Python (specifically a Jupyter notebook) that scrapes a few real estate pages and saves data like price, address, etc. It works just fine for one of the pages I picked, but when I try to scrape this page: sreality.cz (sorry, the page is in Czech, but the actual content is not that important now) using requests.get(), I get this result:

```html
<!doctype html>
<html lang="{{ html.lang }}" ng-app="sreality" ng-controller="MainCtrl">
<head>
<meta charset=
```
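The {{ html.lang }} binding and ng-app attribute show that requests is receiving the raw Angular template: the page is filled in by JavaScript after load, which requests never runs, while Chrome's developer tools show the DOM after rendering. The fixes are a real browser driver (Selenium, Playwright) or, often better, the JSON endpoint the page itself calls, visible in the devtools Network tab. A tiny heuristic for spotting such pages, run on invented sample strings:

```python
def looks_js_rendered(html: str) -> bool:
    """Un-rendered Angular templates leave '{{ ... }}' bindings and
    ng-* attributes in the HTML that requests receives."""
    return "{{" in html or "ng-app" in html

raw = '<html lang="{{ html.lang }}" ng-app="sreality"></html>'
rendered = '<html lang="cs"><head><title>Listing</title></head></html>'
print(looks_js_rendered(raw), looks_js_rendered(rendered))  # True False
```

The check is only a heuristic (other frameworks leave different fingerprints), but it quickly separates server-rendered pages from client-rendered ones.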

Why does parsing an XML document work with MSXML v3.0 but not with MSXML v6.0?

徘徊边缘 · Submitted on 2021-01-27 12:51:09

Question: I am working on a project that scrapes and collects data from many different sources around the internet, with a different method for each source depending on its characteristics. The most recent addition is a web API call which returns the following XML as a response:

```xml
<?xml version="1.0"?>
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0">
  <mRID>29b526a69b9445a7bb507ba446e3e8f9</mRID>
  <revisionNumber>1</revisionNumber>
  <type>A44</type>
  <sender
```
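Assuming the failure is in the XPath queries rather than in loading the document, the likely culprit is the default namespace on the root element: MSXML 6.0 requires binding a prefix via setProperty "SelectionNamespaces" and using it in every step (e.g. //doc:mRID), whereas MSXML 3.0's default selection language is laxer about unprefixed names. The same behaviour, reproduced with Python's ElementTree on the response above:

```python
import xml.etree.ElementTree as ET

xml = """<?xml version="1.0"?>
<Publication_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0">
  <mRID>29b526a69b9445a7bb507ba446e3e8f9</mRID>
  <revisionNumber>1</revisionNumber>
</Publication_MarketDocument>"""

root = ET.fromstring(xml)
unqualified = root.find("mRID")  # None: every element is in the default namespace
ns = {"doc": "urn:iec62325.351:tc57wg16:451-3:publicationdocument:7:0"}
qualified = root.find("doc:mRID", ns)
print(unqualified, qualified.text)
```

The prefix name ("doc" here) is arbitrary; what matters is that it maps to the exact namespace URI declared on the root element.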