PHP: Do I need to use cookies in this cURL script?

Submitted by 南笙酒味 on 2019-12-25 01:55:17

Question


The following script:

<?php
$sDataFile = '<path>\journal-issue-ToC.htm';
$sURL = 'https://onlinelibrary.wiley.com/toc/14678624/2014/85/1';
$bHeader = false;
$sCAinfo = '<path>\cacert.pem';

$cURLhandle = curl_init();
$FilePointer = fopen($sDataFile, 'wb');

curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
curl_setopt($cURLhandle, CURLOPT_FILE, $FilePointer);
curl_setopt($cURLhandle, CURLOPT_HEADER, $bHeader);
curl_setopt($cURLhandle, CURLOPT_CAINFO, $sCAinfo);

curl_exec($cURLhandle);

curl_close($cURLhandle);
fclose($FilePointer);

saves the file "journal-issue-ToC.htm" containing only the following one line:

The URL has moved <a href="https://onlinelibrary.wiley.com/toc/14678624/2014/85/1?cookieSet=1">here</a>

If I open this file in a browser, it says "The URL has moved here", with the word "here" linked to the desired URL suffixed with "?cookieSet=1". If I click on that link, it takes me to the page I am attempting to save with cURL.

I thought that maybe I could simulate clicking on that link by suffixing the URL with "?cookieSet=1" and calling curl_exec() a second time. So I added three lines to the script to do that:

<?php
$sDataFile = '<path>\journal-issue-ToC-2.htm';
$sURL = 'https://onlinelibrary.wiley.com/toc/14678624/2014/85/1';
$bHeader = false;
$sCAinfo = '<path>\cacert.pem';

$cURLhandle = curl_init();
$FilePointer = fopen($sDataFile, 'wb');

curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
curl_setopt($cURLhandle, CURLOPT_FILE, $FilePointer);
curl_setopt($cURLhandle, CURLOPT_HEADER, $bHeader);
curl_setopt($cURLhandle, CURLOPT_CAINFO, $sCAinfo);

curl_exec($cURLhandle);

$sURL .= '?cookieSet=1';
curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
curl_exec($cURLhandle);

curl_close($cURLhandle);
fclose($FilePointer);

This script saves the file "journal-issue-ToC-2.htm" containing only the following two lines:

The URL has moved <a href="https://onlinelibrary.wiley.com/toc/14678624/2014/85/1?cookieSet=1">here</a>
The URL has moved <a href="http://onlinelibrary.wiley.com/action/cookieAbsent">here</a>

If I open this file in a browser, it says "The URL has moved here" twice, with the first word "here" linked to the desired URL suffixed as before and the second word "here" linked to the useless page "http://onlinelibrary.wiley.com/action/cookieAbsent".

I Googled php curl "The URL has moved here". Most of the results were in foreign languages and none gave any hint of the cause of this behavior or how to get past it to actually retrieving the desired page.

I wonder if the problem is that I need to do something with cookies in curl_setopt(). I haven't worked with cookies before and I've been reading about the options for them in curl_setopt() and feel a bit lost. Can someone explain what's going on in these scripts and what I need to change to get the scripts to work?

I'm running PHP 7.2.2 on IIS 7.5 under Windows 7 64 bit.


Answer 1:


Do I need to use cookies in this cURL script?

Yes


You have to set up cURL to store and update the cookies it receives from the website and to send them back with each request.

Furthermore, because the site serves content only when its cookies are sent back, you have to issue two requests. The first one only lets cURL receive and store the cookies. The second one, which sends the stored cookies back, gets the actual content.

In order to store cookies received and send them upon each request you need these lines:

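// Single quotes keep backslashes in Windows paths literal (no \t or \n escapes).
curl_setopt($cURLhandle, CURLOPT_COOKIEFILE, 'path_to\cookies.txt'); // file to read cookies from
curl_setopt($cURLhandle, CURLOPT_COOKIEJAR,  'path_to\cookies.txt'); // file to save cookies to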

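path_to\cookies.txt stands for the absolute path of the file that stores the cookies locally. If the file does not exist yet, cURL creates it when it writes the cookies out, which happens when the handle is closed or freed. The target directory must therefore be readable and writable.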
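
As a minimal sketch of the point about the cookie file, the snippet below resolves a directory to an absolute path and checks that it is writable before building the cookie-file name. The helper name cookieJarPath and the choice of the system temp directory are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: returns an absolute cookie-file path inside $dir,
// or throws if cURL would not be able to create or update a file there.
function cookieJarPath(string $dir, string $name = 'cookies.txt'): string
{
    $real = realpath($dir); // absolute path, or false if $dir does not exist
    if ($real === false || !is_dir($real) || !is_writable($real)) {
        throw new RuntimeException("Cookie directory is not writable: $dir");
    }
    return $real . DIRECTORY_SEPARATOR . $name;
}

// Assumption: the system temp directory is only an example location.
$sCookieFile = cookieJarPath(sys_get_temp_dir());
```

The resulting $sCookieFile would then be passed to both CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR in place of the placeholder path.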

Finally do two curl calls:

1) just load the home page https://onlinelibrary.wiley.com/

2) load the desired page https://onlinelibrary.wiley.com/toc/14678624/2014/85/1


Note that if you're going to fetch several pages you need step 1 only the first time.
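
The two steps above could be sketched roughly as follows, reusing the variable names from the question. The function names cookieOptions and fetchWithCookies, and the idea of discarding the first response body, are assumptions for illustration. Whether the home page really sets the cookies the site requires is the answer's claim and is not verified here.

```php
<?php
// Sketch only: options shared by both requests.
function cookieOptions(string $sCookieFile, string $sCAinfo): array
{
    return [
        CURLOPT_COOKIEFILE => $sCookieFile, // send cookies stored in this file
        CURLOPT_COOKIEJAR  => $sCookieFile, // save received cookies to this file
        CURLOPT_CAINFO     => $sCAinfo,     // CA bundle, as in the question
        CURLOPT_HEADER     => false,        // do not include response headers in the output
    ];
}

// Hypothetical helper: request $sWarmUpURL to collect cookies, then save $sURL to $sDataFile.
function fetchWithCookies(string $sWarmUpURL, string $sURL, string $sDataFile,
                          string $sCookieFile, string $sCAinfo): bool
{
    $cURLhandle = curl_init();
    curl_setopt_array($cURLhandle, cookieOptions($sCookieFile, $sCAinfo));

    // Request 1: only to receive cookies; the body is returned and discarded.
    curl_setopt($cURLhandle, CURLOPT_URL, $sWarmUpURL);
    curl_setopt($cURLhandle, CURLOPT_RETURNTRANSFER, true);
    if (curl_exec($cURLhandle) === false) {
        curl_close($cURLhandle);
        return false;
    }

    // Request 2: same handle, so the cookies from request 1 are sent back.
    $FilePointer = fopen($sDataFile, 'wb');
    if ($FilePointer === false) {
        curl_close($cURLhandle);
        return false;
    }
    curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
    // Set after CURLOPT_RETURNTRANSFER so that output goes to the file.
    curl_setopt($cURLhandle, CURLOPT_FILE, $FilePointer);
    $bOK = curl_exec($cURLhandle) !== false;

    curl_close($cURLhandle); // cookies are written to the jar when the handle is closed or freed
    fclose($FilePointer);
    return $bOK;
}

// Example call with the placeholders used in this thread (paths are elided in the source):
// fetchWithCookies('https://onlinelibrary.wiley.com/',
//                  'https://onlinelibrary.wiley.com/toc/14678624/2014/85/1',
//                  '<path>\journal-issue-ToC.htm', 'path_to\cookies.txt', '<path>\cacert.pem');
```

When fetching several pages, the warm-up request would only be needed once; later calls could skip it and rely on the cookies already saved in the cookie file.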



Source: https://stackoverflow.com/questions/55584173/php-do-i-need-to-use-cookies-in-this-curl-script
