scrape | 易学教程

How would I scrape this JSON info using PHP and MySQL?

Question: Here's the info I'm trying to break up into a database. I'm going to use it only for my own purposes, to analyse statistics and so on. I have been doing it manually in Excel, but I'd like to save myself some work in future. The URL is: http://fantasy.premierleague.com/web/api/elements/537/ Any idea how to scrape that info or easily convert it to Excel format? I know a bit of PHP and MySQL, but nothing about JSON and very little about scraping (I tried messing with SIMPLE_HTML_DOM). Answer 1: You…
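
For the JSON-to-Excel step, a minimal sketch in Python. It assumes the endpoint returns a single JSON object (or a list of objects) whose top-level keys should become spreadsheet columns; the sample field names in the test are hypothetical, and the URL above may no longer serve this data. The CSV text it produces opens directly in Excel, and the same rows could just as well be inserted into a MySQL table.

```python
import csv
import io
import json
import urllib.request
from typing import List, Optional


def fetch_json_text(url: str, timeout: float = 10.0) -> str:
    """Download a JSON document (assumed to be UTF-8 encoded)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def json_to_csv(json_text: str, columns: Optional[List[str]] = None) -> str:
    """Turn one JSON object, or a list of objects, into CSV text with a header row.

    Nested lists/dicts are written back as JSON strings so each record stays
    on a single row; keys not listed in `columns` are ignored.
    """
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]
    if columns is None:
        # Default to the keys of the first record, in their original order.
        columns = list(records[0].keys())
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (list, dict)) else value
            for key, value in record.items()
        })
    return out.getvalue()
```

If the endpoint still responds, `json_to_csv(fetch_json_text(url))` yields text that can be saved to a `.csv` file.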

Get data between two tags in Python

Question: <h3> <a href="article.jsp?tp=&arnumber=16"> Granular computing based <span class="snippet">data</span> <span class="snippet">mining</span> in the views of rough set and fuzzy set </a> </h3> Using Python, I want to get the text of the anchor tag, which should be "Granular computing based data mining in the views of rough set and fuzzy set". I tried using lxml: parser = etree.HTMLParser() tree = etree.parse(StringIO.StringIO(html), parser) xpath1 = "//h3/a/child::text() | //h3/a/span/child::text( …
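
The goal is the concatenated text of every node inside the `<a>`, including the text of the nested `<span>` elements. With lxml, the XPath 1.0 function `string()` does this in one step (for example `tree.xpath("string(//h3/a)")`), and `normalize-space()` also collapses the whitespace. A dependency-free sketch of the same idea with the standard library's `html.parser`; the class and function names are illustrative:

```python
from html.parser import HTMLParser


class H3LinkTextParser(HTMLParser):
    """Collect all text inside <a> elements that sit within an <h3>."""

    def __init__(self):
        super().__init__()
        self.h3_depth = 0
        self.a_depth = 0
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.h3_depth += 1
        elif tag == "a" and self.h3_depth:
            self.a_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.a_depth:
            self.a_depth -= 1
            if not self.a_depth:
                # Collapse the whitespace runs left around the nested <span> tags.
                self.results.append(" ".join("".join(self.current).split()))
                self.current = []
        elif tag == "h3" and self.h3_depth:
            self.h3_depth -= 1

    def handle_data(self, data):
        # Text nodes of the <a> and of any <span> inside it both land here.
        if self.a_depth:
            self.current.append(data)


def h3_link_texts(html):
    parser = H3LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.results
```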

Scrapy Crawl all websites in start_url even if redirect

Question: I am trying to crawl a long list of websites. Some of the websites in the start_url list redirect (301). I want Scrapy to crawl the redirect targets of the start_url list as if they were also on the allowed_domain list (which they are not). For example, example.com is on both my start_url list and my allowed domain list, and example.com redirects to foo.com. I want to crawl foo.com. DEBUG: Redirecting (301) to <GET http://www.foo.com/> from <GET http://www.example.com> I tried dynamically adding…
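
The bookkeeping described here (treat the target of a start-URL redirect as an allowed domain) can be sketched independently of Scrapy. This is not Scrapy's offsite middleware: `RedirectAwareFilter`, `register_redirect` and `is_allowed` are hypothetical names. In a Scrapy callback, the URL reached after redirects is `response.url`, which is where a spider could register the new host.

```python
from urllib.parse import urlsplit


def _strip_www(host):
    return host[4:] if host.startswith("www.") else host


def normalize_host(url):
    """Lower-case host of a URL, without a leading 'www.'."""
    return _strip_www((urlsplit(url).hostname or "").lower())


class RedirectAwareFilter:
    """Allow URLs on allowed domains (or their subdomains) and learn redirect targets."""

    def __init__(self, allowed_domains):
        self.allowed = {_strip_www(d.lower()) for d in allowed_domains}

    def is_allowed(self, url):
        host = normalize_host(url)
        return any(host == d or host.endswith("." + d) for d in self.allowed)

    def register_redirect(self, start_url, final_url):
        # Only trust redirects that begin on a domain that was already allowed.
        host = normalize_host(final_url)
        if host and self.is_allowed(start_url):
            self.allowed.add(host)
```

Restricting registration to redirects from already-allowed hosts keeps an arbitrary off-site link from widening the crawl.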

UDP Tracker Scraping: One Script Works, the Other Doesn't

Question: With this script, my tracker only updates seeders & leechers from the HTTP tracker, and only from the first tracker of my torrent. print("<tr><td class='desc'><b>" .T_("Torrent Stats"). ": </b></td><td valign='top' class='lista'>"); $seeders1 = $leechers1 = $downloaded1 = null; $tres = SQL_Query_exec("SELECT url FROM announce WHERE torrent=$id"); while ($trow = mysql_fetch_assoc($tres)) { $ann = $trow["url"]; $tracker = explode("/", $ann); $path = array_pop($tracker); $oldpath = $path; $path = preg_replace("/ …
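
UDP tracker URLs (`udp://host:port/...`) are not scraped with an HTTP request; they use the binary protocol from BitTorrent BEP 15. A connect exchange returns a 64-bit connection ID, then a scrape request lists 20-byte info hashes, and the reply carries seeders, completed and leechers for each hash. A Python sketch of the packet handling under those assumptions (IPv4 only, no retries; all function names are hypothetical):

```python
import random
import socket
import struct

PROTOCOL_ID = 0x41727101980  # magic constant for the connect request (BEP 15)
ACTION_CONNECT = 0
ACTION_SCRAPE = 2


def build_connect_request(transaction_id):
    return struct.pack(">QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)


def parse_connect_response(data, transaction_id):
    action, tid, connection_id = struct.unpack(">IIQ", data[:16])
    if action != ACTION_CONNECT or tid != transaction_id:
        raise ValueError("unexpected connect response")
    return connection_id


def build_scrape_request(connection_id, transaction_id, info_hashes):
    header = struct.pack(">QII", connection_id, ACTION_SCRAPE, transaction_id)
    return header + b"".join(info_hashes)  # each hash is 20 raw bytes


def parse_scrape_response(data, transaction_id):
    action, tid = struct.unpack(">II", data[:8])
    if action != ACTION_SCRAPE or tid != transaction_id:
        raise ValueError("unexpected scrape response")
    stats = []
    # One 12-byte block (seeders, completed, leechers) per requested hash.
    for offset in range(8, len(data) - 11, 12):
        seeders, completed, leechers = struct.unpack(">III", data[offset:offset + 12])
        stats.append({"seeders": seeders, "completed": completed, "leechers": leechers})
    return stats


def udp_scrape(host, port, info_hashes, timeout=5.0):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        tid = random.getrandbits(32)
        sock.sendto(build_connect_request(tid), (host, port))
        conn_id = parse_connect_response(sock.recv(2048), tid)
        tid = random.getrandbits(32)
        sock.sendto(build_scrape_request(conn_id, tid, info_hashes), (host, port))
        return parse_scrape_response(sock.recv(2048), tid)
```

Each tracker in the announce list would be scraped separately, and the counts combined however the site prefers.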

How to scrape multiple pages from one site

Question: I want to scrape multiple pages from one site. The URLs follow this pattern: https://www.example.com/S1-3-1.html https://www.example.com/S1-3-2.html https://www.example.com/S1-3-3.html https://www.example.com/S1-3-4.html https://www.example.com/S1-3-5.html. I tried three methods to scrape all of these pages at once, but every method only scrapes the first page. My code is below; I'd really appreciate it if anyone could check it and tell me what the problem is. ===============method 1==================== …
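
The three methods are cut off above, so this does not diagnose them. It only sketches the two pieces such a scraper needs: building the numbered URLs from the pattern, and collecting every page before returning. `fetch` is any function that downloads one URL (`fetch_with_urllib` is one standard-library option); the helper names are illustrative. Returning from inside the loop is one common way to end up with only the first page.

```python
import urllib.request

BASE = "https://www.example.com/S1-3-{}.html"


def page_urls(first, last):
    """Build the numbered URLs S1-3-<first>.html ... S1-3-<last>.html."""
    return [BASE.format(n) for n in range(first, last + 1)]


def fetch_all(urls, fetch):
    """Fetch every URL; the list is returned only after the loop has finished."""
    pages = []
    for url in urls:
        pages.append(fetch(url))
    return pages


def fetch_with_urllib(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be `fetch_all(page_urls(1, 5), fetch_with_urllib)`, then parsing each returned page.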

VBA scrape src instead of href

Question: I am using the code below, but for some reason it returns the value of 'src' instead of 'href'. Can anyone help, please? Sub bringfox(txt As String) Dim oHtml As HTMLDocument Dim oElement As Object Set oHtml = New HTMLDocument maintext2 = "https://www.jjfox.co.uk/cigars/show/all.html" With CreateObject("WINHTTP.WinHTTPRequest.5.1") .Open "GET", maintext2 & gr, False .send oHtml.body.innerHTML = .responseText End With counter = cnt 'oElement(i).Children(0).getAttribute ("href") Set oElement …
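
The VBA code is truncated, so the cause is not visible here. As a hedged illustration of keeping the two attributes apart, this Python sketch (standard-library `html.parser`; names and sample markup are illustrative) reads `href` only from `<a>` tags and `src` only from `<img>` tags. A link that wraps an image is a typical place where the two can get mixed up, because the anchor's first child is then the `<img>`.

```python
from html.parser import HTMLParser


class LinkAndImageParser(HTMLParser):
    """Record <a href> and <img src> values in separate lists."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hrefs.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.srcs.append(attributes["src"])


def links_and_images(html):
    parser = LinkAndImageParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs, parser.srcs
```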

How to scrape data using Ruby which is generated by a Javascript function?

Question: I am trying to scrape the data URL link for the latest date (the first row of the table) from this page. But it seems the content of the table is generated by a JavaScript function. I tried using Nokogiri to get it, but in vain, as Nokogiri cannot run JavaScript. Then I tried to get only the script part with Nokogiri, using: url = "http://www.sgx.com/wps/portal/sgxweb/home/marketinfo/historical_data/derivatives/daily_data" doc = Nokogiri::HTML(open(url)) js = doc.css("script").text …
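
Nokogiri only parses the HTML it downloads; it does not execute the scripts. When a table is built by JavaScript, the data URL often appears as a string literal in the script text, or is fetched by a separate request that shows up in the browser's developer tools (Network tab). A heuristic Python sketch of the first case, pulling URL-like string literals out of script text; the regex is an assumption about how the URLs are written, not a JavaScript parser, and the sample script in the test is hypothetical.

```python
import re

# Single- or double-quoted literals starting with http(s):// or "/".
QUOTED_URL = re.compile(r"""["']((?:https?://|/)[^"'\s]+)["']""")


def urls_in_script(script_text, suffix=None):
    """Return URL-like string literals found in inline JavaScript text."""
    found = QUOTED_URL.findall(script_text)
    if suffix is not None:
        # Optionally keep only e.g. ".zip" or ".csv" links.
        found = [u for u in found if u.endswith(suffix)]
    return found
```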

Scrape a particular area of site content With a Secure Login

Question: I am trying to scrape some particular text from a website that is behind a login. Here is a tutorial on doing this with cURL: http://www.digeratimarketing.co.uk/2008/12/16/curl-page-scraping-script/ But I am unable to apply it to my own cURL code. Here is my cURL script: $url = "http://aftabcurrency.com/login_script.php"; $ch = curl_init(); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_URL, $url); $cookie = 'cookies.txt'; $timeout = 30; curl_setopt ($ch, CURLOPT …
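
The cURL approach (post the login form, keep the session cookie, then request the protected page) maps onto Python's standard library roughly as below. A sketch only: the form field names passed in `fields` are hypothetical and must match the site's actual login form, and the helper names are illustrative.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """An opener that keeps cookies between requests, similar to a cURL cookie file."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(login_url, fields):
    """Build a POST request carrying the login form fields."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=data, method="POST")


def fetch_after_login(login_url, fields, page_url, timeout=30):
    opener, _ = make_session()
    # Cookies set by the login response are sent automatically on the next request.
    opener.open(login_request(login_url, fields), timeout=timeout).close()
    with opener.open(page_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The particular area of the page can then be cut out of the returned HTML with a parser.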

How to scrape all possible results from a search bar of a website

Question: This is my first web scraping task. I have been tasked with scraping this website, a site that contains the names of lawyers in Denmark. My difficulty is that I can only retrieve names matching the particular query I type into the search bar. Is there an online web tool I can use to scrape all the names the website contains? I have tried tools like Import.io, with no success so far. I am very confused about how all of this works. Answer 1: Please scroll down to UPDATE 2. The website…
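
When a site only exposes a search box, one common approach is to enumerate queries systematically and merge the results. The sketch below assumes the site accepts prefix queries and caps each result list at `max_results`; `search` is a hypothetical function returning the names for one query, and a capped result list triggers a longer prefix. Danish names may need `æ`, `ø` and `å` added to `alphabet`.

```python
import string
from collections import deque
from typing import Callable, Iterable, List


def harvest(
    search: Callable[[str], Iterable[str]],
    alphabet: str = string.ascii_lowercase,
    max_results: int = 100,
) -> List[str]:
    """Collect every name reachable through prefix searches, without duplicates."""
    seen = set()
    names: List[str] = []
    queue = deque(alphabet)  # start with one-letter queries
    while queue:
        query = queue.popleft()
        results = list(search(query))
        for name in results:
            if name not in seen:
                seen.add(name)
                names.append(name)
        # A full page of results is assumed to mean the list was cut off,
        # so refine the query by one more character.
        if len(results) >= max_results:
            queue.extend(query + ch for ch in alphabet)
    return names
```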

ElementNotVisibleException: Message: element not visible - Python3 Selenium

Question: I have been tasked with writing a parser that clicks an href link (one that looks like a button) on a website, and I am having some issues. Here's the HTML: https://pastebin.com/HDKLXpdJ Here's the source HTML: https://pastebin.com/PgT91kJs Python code: browser = webdriver.Chrome() ... try: element = WebDriverWait(browser, 20).until( EC.presence_of_element_located((By.ID, "reply-panel-reveal-btn"))) finally: elem = browser.find_element_by_xpath("//A[@id='reply-panel-reveal-btn']").click() I am getting…