Question
I'm trying to scrape a website that needs POST data to return the correct page: without POST data it returns only 15 results, and with POST data it returns all of them.
My current code looks like this:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://www.thisismyurl.com/awesome");
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, XXXXXX); // <- what goes here?
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$result = curl_exec($curl);
curl_close($curl);
I know my POST fields go where "XXXXXX" is. I just don't know where to find the field names and values, or how to structure them into the variable I pass there.
Any help would be greatly appreciated!
Answer 1:
If it's a simple form, just extract all the form fields and duplicate them in your script. If it's a dynamic form, for example one where JavaScript builds up the request and sends it via Ajax, you can sniff the data with developer tools (e.g. Firebug's Net tab or HttpFox in Firefox) and copy the POST data as it is sent.
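For the simple-form case, the extraction step can be automated instead of copying fields by hand. The sketch below is a hypothetical helper (`extract_form_fields` is not from this answer or any library) built on PHP's DOM extension (`DOMDocument`, `DOMXPath`). It assumes you have already fetched the page's HTML with a plain GET, and it only approximates what a browser submits: it skips disabled controls, buttons, file inputs and unchecked boxes, and keeps only the last value for repeated names such as `tags[]`.

```php
<?php
// Hypothetical helper (not from the answer above): collect the name/value
// pairs a browser would submit for the $formIndex-th <form> in a page.
// Simplified: repeated names such as "tags[]" keep only the last value.
function extract_form_fields(string $html, int $formIndex = 0): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; keep libxml's warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $form = $xpath->query('//form')->item($formIndex);
    if ($form === null) {
        return []; // no such form on the page
    }

    $fields = [];
    $controls = $xpath->query('.//input[@name] | .//select[@name] | .//textarea[@name]', $form);
    foreach ($controls as $el) {
        if ($el->hasAttribute('disabled')) {
            continue; // browsers do not submit disabled controls
        }
        $name = $el->getAttribute('name');
        switch ($el->nodeName) {
            case 'input':
                $type = strtolower($el->getAttribute('type') ?: 'text');
                if (in_array($type, ['submit', 'button', 'image', 'reset', 'file'], true)) {
                    continue 2; // buttons are sent only when clicked; files need an upload
                }
                $value = $el->getAttribute('value');
                if ($type === 'checkbox' || $type === 'radio') {
                    if (!$el->hasAttribute('checked')) {
                        continue 2; // unchecked boxes are not submitted
                    }
                    if (!$el->hasAttribute('value')) {
                        $value = 'on'; // browser default for a checked box without a value
                    }
                }
                $fields[$name] = $value;
                break;
            case 'textarea':
                $fields[$name] = $el->textContent;
                break;
            case 'select':
                // The option marked "selected", else the first option.
                $option = $xpath->query('.//option[@selected]', $el)->item(0)
                    ?? $xpath->query('.//option', $el)->item(0);
                if ($option !== null) {
                    $fields[$name] = $option->hasAttribute('value')
                        ? $option->getAttribute('value')
                        : $option->textContent;
                }
                break;
        }
    }
    return $fields;
}
```

Usage would then be something like `$postFields = http_build_query(extract_form_fields($html));`, overriding any values you want to change before encoding, and passing `$postFields` as `CURLOPT_POSTFIELDS`. Fields that JavaScript adds at submit time will not appear here, which is where the developer-tools approach comes in.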
Either way, once you know what fields/data are being sent, the rest should be (relatively) easy to duplicate/build.
Answer 2:
In case someone is looking for code to replace XXXXXX, this is the piece of code I use:
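$ch = curl_init();
$timeout = 5; // connection timeout in seconds

// Values to send, taken here from this script's own request
$name = $_REQUEST['name'] ?? '';
$pass = $_REQUEST['pass'] ?? '';

// Use the variables directly: '$name' in single quotes would send the literal text "$name"
$data = array('username' => $name, 'password' => $pass);
$postFields = http_build_query($data); // "username=...&password=..." (URL-encoded)

curl_setopt($ch, CURLOPT_URL, "superawsomesite.com");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$response = curl_exec($ch);
curl_close($ch);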
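Running the array through `http_build_query()` matters once values contain characters such as `&`, `=` or spaces. The sketch below uses made-up credentials (purely illustrative) to compare a hand-concatenated body with the encoded one, using `parse_str()` to show how PHP itself would decode each. Note also that passing the array directly to `CURLOPT_POSTFIELDS` is allowed, but per the PHP manual it makes cURL send `multipart/form-data` instead of `application/x-www-form-urlencoded`, which some servers treat differently from a normal form post.

```php
<?php
// Hypothetical credentials, chosen because they contain characters that
// have special meaning in an application/x-www-form-urlencoded body.
$data = ['username' => 'jane doe', 'password' => 'p&ss=word'];

// Gluing the pieces together by hand leaves "&" and "=" unescaped.
$naive = 'username=' . $data['username'] . '&password=' . $data['password'];

// http_build_query() URL-encodes every key and value (spaces become "+").
$encoded = http_build_query($data);

// parse_str() decodes a query string the way PHP parses form data,
// so it shows what a PHP server would receive in each case.
parse_str($naive, $fromNaive);
parse_str($encoded, $fromEncoded);

echo $encoded, PHP_EOL;           // username=jane+doe&password=p%26ss%3Dword
var_dump($fromNaive === $data);   // bool(false): password is "p", plus a stray "ss" field
var_dump($fromEncoded === $data); // bool(true)
```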
Source: https://stackoverflow.com/questions/8775161/how-can-scrape-website-via-php-that-requires-post-data