goutte

How to use Goutte

Submitted by 限于喜欢 on 2019-12-03 17:28:51
Question. Issue: Cannot fully understand the Goutte web scraper. Request: Can someone please help me understand, or provide code to help me better understand, how to use the Goutte web scraper? I have read over the README.md. I am looking for more information than it provides, such as what options are available in Goutte and how to write them, or, when looking at forms, whether you search for the name= or the id= of the form. Webpage layout being scraped: Step 1: The webpage

How to use Goutte

Submitted by 萝らか妹 on 2019-12-03 06:11:59
Issue: Cannot fully understand the Goutte web scraper. Request: Can someone please help me understand, or provide code to help me better understand, how to use the Goutte web scraper? I have read over the README.md. I am looking for more information than it provides, such as what options are available in Goutte and how to write them, or, when looking at forms, whether you search for the name= or the id= of the form. Webpage layout being scraped: Step 1: The webpage has a form with a radio button to choose what kind of form to fill out (i.e. Name or License). It is
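On the name= versus id= point: an HTML form submits its controls by their name attribute, while an id is only useful for locating an element. Goutte builds on Symfony's DomCrawler, which likewise keys form fields by name, so a form can be found with any selector (its id included) but its values are set by field name. A minimal sketch using only PHP's built-in DOM extension shows which fields would be submitted; the markup, the ids, the names and the helper submittedFields() are all invented for illustration and are not taken from the page in the question.

```php
<?php
// Hypothetical markup: every id and name below is an assumption.
$html = <<<'HTML'
<form id="lookup-form" action="/search" method="post">
  <input type="radio" id="opt-name" name="searchType" value="Name" checked>
  <input type="radio" id="opt-license" name="searchType" value="License">
  <input type="text" id="txt-last" name="lastName" value="Smith">
  <input type="text" id="no-name-field" value="ignored">
</form>
HTML;

/**
 * Return the name => value pairs a browser would submit for the <input>
 * elements of the form with the given id (hypothetical helper).
 */
function submittedFields(string $html, string $formId): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about imperfect real-world markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $fields = [];
    foreach ($xpath->query("//form[@id='$formId']//input") as $input) {
        $name = $input->getAttribute('name');
        if ($name === '') {
            continue; // an input without a name is never submitted
        }
        $type = strtolower($input->getAttribute('type'));
        if (in_array($type, ['radio', 'checkbox'], true) && !$input->hasAttribute('checked')) {
            continue; // unchecked radios and checkboxes are skipped
        }
        $fields[$name] = $input->getAttribute('value');
    }
    return $fields;
}

print_r(submittedFields($html, 'lookup-form'));
```

The form is located by its id, but the resulting array is keyed by name, and the input that has only an id does not appear in it.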

How to crawl with PHP Goutte and Guzzle if data is loaded by JavaScript?

Submitted by 余生颓废 on 2019-11-30 20:35:00
Many times when crawling we run into problems where content that is rendered on the page is generated with JavaScript and therefore scrapy is unable to crawl for it (e.g. AJAX requests, jQuery). You want to have a look at phantomjs. There is this PHP implementation: http://jonnnnyw.github.io/php-phantomjs/ if you need to have it working with PHP, of course. You could read the page and then feed the contents to Guzzle, in order to use the nice functions that Guzzle gives you (like searching for contents, etc.). That would depend on your needs; maybe you can simply use the DOM, like this: How to get

Mink/Goutte: How to check a checkbox without an id or name attribute in Goutte?

Submitted by 冷暖自知 on 2019-11-29 17:24:30
I apologize in advance, but I am a complete beginner. I am trying to check a checkbox that has no id or name: <span class="ps-align-left"> <input type="checkbox" value="43899" style="background-color: rgb(252, 252, 252);"/> 43899 </span> I figured out how to do it with Selenium2Driver, using the "find" function like this: public function checkOption() { $this->getSession()->getPage()->find('css', '.ps-align-left>input')->check(); } This works fine, but when I run the test with the headless browser Goutte I get an error: /usr/bin/php5.6 /tmp/ide-behat.php --format PhpStormBehatFormatter /home/grzegorz/PhpstormProjects
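The error output is cut off above, so its cause is not confirmed here. One plausible explanation (an assumption): Goutte emulates a browser by building and submitting form data rather than driving a real one, and an input without a name contributes nothing to a submission, so there is nothing for check() to toggle. The element itself can still be located and read. The sketch below does that with PHP's built-in DOM extension and an XPath equivalent of the ".ps-align-left>input" selector, restricted to checkboxes; findCheckboxesByParentClass() is a hypothetical helper, not part of Mink or Goutte.

```php
<?php
// Markup copied from the question.
$html = '<span class="ps-align-left"> <input type="checkbox" value="43899" '
      . 'style="background-color: rgb(252, 252, 252);"/> 43899 </span>';

/**
 * Find checkboxes that are direct children of an element carrying the
 * given class (hypothetical helper).
 *
 * @return DOMElement[]
 */
function findCheckboxesByParentClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about fragment-style markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Padding the class list with spaces makes contains() match whole
    // class names only, not substrings of longer names.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]"
           . "/input[@type='checkbox']";

    return iterator_to_array($xpath->query($query));
}

$boxes = findCheckboxesByParentClass($html, 'ps-align-left');
echo count($boxes), ' checkbox(es), value: ', $boxes[0]->getAttribute('value'), PHP_EOL;
```

If the page can be changed, giving the input a name attribute is likely the simpler route for a headless driver; otherwise a JavaScript-capable driver such as the Selenium one already in use remains the fallback.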

Can Goutte/Guzzle be forced into UTF-8 mode?

Submitted by 隐身守侯 on 2019-11-29 00:13:10
I'm scraping from a UTF-8 site, using Goutte, which internally uses Guzzle. The site declares a meta tag of UTF-8, thus: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> However, the content type header is thus: Content-Type: text/html and not: Content-Type: text/html; charset=utf-8 Thus, when I scrape, Goutte does not spot that it is UTF-8, and grabs data incorrectly. The remote site is not under my control, so I can't fix the problem there! Here's a set of scripts to replicate the problem. First, the scraper: <?php require_once realpath(__DIR__ . '/..') . '/vendor/goutte
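The mismatch described here (no charset in the header, UTF-8 in the meta tag) can be worked around by resolving the charset yourself before the HTML is parsed. The following is a minimal sketch of that lookup order, not the fix from the original thread; resolveCharset() is a hypothetical helper, and the ISO-8859-1 default is an assumption based on the historical HTTP default for text types.

```php
<?php
/**
 * Pick a charset for an HTML response: prefer the Content-Type header,
 * fall back to a <meta> declaration in the body, then to a default
 * (hypothetical helper).
 */
function resolveCharset(string $contentTypeHeader, string $html, string $default = 'ISO-8859-1'): string
{
    // 1. charset parameter in the HTTP header, e.g. "text/html; charset=utf-8"
    if (preg_match('/charset\s*=\s*"?([\w-]+)/i', $contentTypeHeader, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or
    //    <meta http-equiv="Content-Type" content="text/html; charset=...">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 3. Nothing declared anywhere: use the assumed default.
    return $default;
}

$header = 'text/html'; // header value as described in the question
$body   = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>café</body></html>';

echo resolveCharset($header, $body), PHP_EOL;
```

The resolved value could then be handed to the parser explicitly; Symfony's DomCrawler, which Goutte uses, accepts a charset as the second argument of Crawler::addHtmlContent(). Whether that fits depends on the Goutte version in use.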

How can I scrape website content in PHP from a website that requires a cookie login?

Submitted by 梦想的初衷 on 2019-11-26 20:27:15
My problem is that it doesn't just require a basic cookie, but rather asks for a session cookie, and for randomly generated IDs. I think this means I need to use a web browser emulator with a cookie jar? I have tried to use Snoopy, Goutte and a couple of other web browser emulators, but as of yet I have not been able to find tutorials on how to receive cookies. I am getting a little desperate! Can anyone give me an example of how to accept cookies in Snoopy or Goutte? Thanks in advance! Object-Oriented answer We implement as much as possible of the previous answer in one class called Browser
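To make the "cookie jar" idea concrete: a jar remembers the name=value pairs from each Set-Cookie response header and replays them in a Cookie header on later requests. Goutte's client keeps such a jar for you across requests made on the same client instance, so the sketch below is only an illustration of the mechanism. It is not the Browser class mentioned above; SimpleCookieJar and the sample header values are invented, and cookie attributes (Path, Domain, Expires) are deliberately ignored.

```php
<?php
/**
 * Minimal in-memory cookie jar (hypothetical class): stores name=value
 * pairs from Set-Cookie response headers and replays them as a Cookie
 * request header.
 */
final class SimpleCookieJar
{
    /** @var array<string, string> */
    private array $cookies = [];

    /** @param string[] $headerLines raw response header lines */
    public function storeFromHeaders(array $headerLines): void
    {
        foreach ($headerLines as $line) {
            if (stripos($line, 'Set-Cookie:') !== 0) {
                continue; // not a cookie header
            }
            // Keep only the leading "name=value" pair, before any attributes.
            $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
            [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
            if ($name !== '') {
                $this->cookies[$name] = $value; // a later value overwrites an earlier one
            }
        }
    }

    public function toRequestHeader(): string
    {
        $pairs = [];
        foreach ($this->cookies as $name => $value) {
            $pairs[] = "$name=$value";
        }
        return 'Cookie: ' . implode('; ', $pairs);
    }
}

// Made-up response headers, as a login page might send them.
$jar = new SimpleCookieJar();
$jar->storeFromHeaders([
    'HTTP/1.1 200 OK',
    'Set-Cookie: PHPSESSID=abc123; path=/; HttpOnly',
    'Set-Cookie: token=xyz789; Secure',
]);
echo $jar->toRequestHeader(), PHP_EOL;
```

Randomly generated session IDs need no special handling under this scheme: whatever value the server issues is simply stored and sent back.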

How can I scrape website content in PHP from a website that requires a cookie login?

Submitted by 不羁岁月 on 2019-11-26 09:02:57
Question: My problem is that it doesn't just require a basic cookie, but rather asks for a session cookie, and for randomly generated IDs. I think this means I need to use a web browser emulator with a cookie jar? I have tried to use Snoopy, Goutte and a couple of other web browser emulators, but as of yet I have not been able to find tutorials on how to receive cookies. I am getting a little desperate! Can anyone give me an example of how to accept cookies in Snoopy or Goutte? Thanks in advance! Answer 1: