How can I scrape a website via PHP that requires POST data?

…衆ロ難τιáo~ submitted on 2019-12-01 09:27:36

Question


I'm trying to scrape a website that takes in POST data to return the correct page (sans POST it returns 15 results, with POST data it returns all results).

Currently my code looks like this:

$curl = curl_init();
curl_setopt($curl,CURLOPT_URL,"http://www.thisismyurl.com/awesome");
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, XXXXXX);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result= curl_exec($curl);

I know that I need to put my POST fields in the place filled with "XXXXXX"; I just don't know where to find the POST fields/values or how to structure them into the variable that I pass in there.

Any help would be greatly appreciated!


Answer 1:


If it's a simple form, then just extract all the form fields and duplicate them in your script. If it's a dynamic form, such as JavaScript building up a request and sending it via AJAX, then you can sniff the data using developer tools (e.g. Firefox's Firebug Net tab, HTTPfox, etc.) and extract the POST data as it gets sent over.

Either way, once you know what fields/data are being sent, the rest should be (relatively) easy to duplicate/build.
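
For the simple-form case, the extraction step might look roughly like the sketch below. It is only an illustration: the helper name extractFormFields, the sample markup, and the field names token and limit are all made up here, and it only reads <input> elements from the first form (select, textarea, and unchecked checkboxes would need extra handling).

```php
<?php
// Hypothetical helper: collect name => value pairs from the
// <input> elements of the first <form> in an HTML string.
function extractFormFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; keep libxml warnings quiet.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    $form = $doc->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return $fields; // no form on the page
    }
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Sample markup standing in for the page that holds the form (an assumption).
$html = '<html><body><form method="post" action="/awesome">'
      . '<input type="hidden" name="token" value="abc123">'
      . '<input type="text" name="limit" value="all">'
      . '</form></body></html>';

$fields = extractFormFields($html);
// URL-encode the pairs into a string suitable for CURLOPT_POSTFIELDS.
$postBody = http_build_query($fields);
echo $postBody, "\n"; // token=abc123&limit=all
```

In practice $html would come from a first GET request to the page containing the form, and you would overwrite whichever values you want to change before posting.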




Answer 2:


Someone may be looking for code to replace XXXXXX. I use the following piece of code.

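// Read the credentials from the incoming request.
$name = $_REQUEST['name'];
$pass = $_REQUEST['pass'];

// Pass the variables themselves: '$name' in single quotes is not
// interpolated and would be sent as the literal text $name.
$data = http_build_query(array('username' => $name, 'password' => $pass));

$timeout = 5; // connection timeout in seconds
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "superawsomesite.com");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$result = curl_exec($ch); // response body, or false on failure
curl_close($ch);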
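
To see what actually ends up in CURLOPT_POSTFIELDS, here is a small sketch of what http_build_query produces. The sample values are invented for illustration. Note that passing the encoded string makes cURL send the body as application/x-www-form-urlencoded, while passing the array directly would make it send multipart/form-data, which some sites handle differently.

```php
<?php
// Made-up sample values, including characters that need encoding.
$fields = array('username' => 'alice', 'password' => 'p&ss word');

// Build the URL-encoded request body.
$encoded = http_build_query($fields);
echo $encoded, "\n"; // username=alice&password=p%26ss+word

// Decoding the string gives back the original pairs, which is
// what the receiving server sees in its POST data.
parse_str($encoded, $decoded);
var_dump($decoded === $fields); // bool(true)
```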


Source: https://stackoverflow.com/questions/8775161/how-can-scrape-website-via-php-that-requires-post-data
