Question
I have been using file_get_contents to grab the contents of a site for years.
Recently, they updated their URL to HTTPS and file_get_contents stopped working.
I've read previous questions and tried marked solutions, but nothing has worked.
For example, I tried this, and it returned the following:
openssl: yes
http wrapper: yes
https wrapper: yes
wrappers: array ( 0 => 'https', 1 => 'ftps', 2 => 'compress.zlib', 3 => 'compress.bzip2', 4 => 'php', 5 => 'file', 6 => 'data', 7 => 'http', 8 => 'ftp', 9 => 'zip', )
So then I tried this solution with file_get_contents, to no avail.
I then tried this solution with cURL to ignore encryption altogether, also to no avail.
No matter which solution I try, nothing is returned.
I have not added extension=php_openssl.dll and allow_url_include = On to php.ini as per this, because this particular site is on a shared host and the hosting company does not allow the php.ini file to be edited, although they may already be enabled by default.
I tried other HTTPS sites, and some work and some do not, and I'm not sure why.
I tried from a different Server (and different IP) on the same web host, and it also did not work with the target HTTPS site.
How can I debug and fix this?
UPDATE:
phpinfo shows:
curl
cURL support enabled
cURL Information libcurl/7.36.0 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5 libssh2/1.8.0
openssl
OpenSSL support enabled
OpenSSL Version OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Answer 1:
FINAL ANSWER
If your ISP will not upgrade OpenSSL to a version that supports TLS 1.2, you should seriously consider another ISP. You should test your server with the "SSL SERVER TEST" link below. Your server likely has SSL security vulnerabilities.
The server you are trying to connect to supports only TLS 1.2 and TLS 1.1.
It does not support TLS 1.0, SSL 3, or SSL 2.
When an SSL request is made, as part of the SSL protocol, curl presents a list of ciphers to the host server. The server then picks which cipher suite to use based on the list presented by curl.
The host you are trying to connect to supports these cipher suites:
TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)
TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (0x9f)
TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 (0x9e)
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (0xc028)
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (0xc014)
TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 (0x6b)
TLS_DHE_RSA_WITH_AES_256_CBC_SHA (0x39)
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (0xc027)
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (0xc013)
TLS_DHE_RSA_WITH_AES_128_CBC_SHA256 (0x67)
TLS_DHE_RSA_WITH_AES_128_CBC_SHA (0x33)
TLS_RSA_WITH_AES_256_GCM_SHA384 (0x9d)
TLS_RSA_WITH_AES_128_GCM_SHA256 (0x9c)
TLS_RSA_WITH_AES_256_CBC_SHA256 (0x3d)
TLS_RSA_WITH_AES_256_CBC_SHA (0x35)
TLS_RSA_WITH_AES_128_CBC_SHA256 (0x3c)
TLS_RSA_WITH_AES_128_CBC_SHA (0x2f)
Your OpenSSL build was released in July 2008. TLS 1.2 was published the following month, August 2008, and OpenSSL did not gain TLS 1.1 or TLS 1.2 support until the 1.0.1 series, so the best a 0.9.8 build can offer is TLS 1.0.
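To check this on your own host without reading through phpinfo(), here is a minimal sketch. It assumes curl is linked against OpenSSL and tests whether the reported version is at least 1.0.1, the release series that introduced TLS 1.2 support. The helper name opensslSupportsTls12() is hypothetical, not part of PHP or curl.

```php
<?php
// Hypothetical helper (not a PHP or curl built-in): decides whether an SSL
// library string, as reported by curl, names an OpenSSL release that is new
// enough to speak TLS 1.2 (OpenSSL 1.0.1 or later).
function opensslSupportsTls12($sslVersion)
{
    // curl reports the library as e.g. "OpenSSL/0.9.8b"
    if (!preg_match('~^OpenSSL/(\d+\.\d+\.\d+)~', $sslVersion, $m)) {
        return null; // another TLS backend (NSS, GnuTLS, ...): cannot tell
    }
    return version_compare($m[1], '1.0.1', '>=');
}

// Report what the local curl extension is linked against, if it is loaded.
if (function_exists('curl_version')) {
    $v = curl_version();
    var_dump($v['ssl_version'], opensslSupportsTls12($v['ssl_version']));
}
```

With the "OpenSSL/0.9.8b" string from the question's phpinfo output, the helper returns false.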
POSSIBLE TEMPORARY FIX until you upgrade
I do not have a high level of confidence this will work for you
You should test your own server's SSL with something like this SSL SERVER TEST
If your server supports TLS1.1 then you can try the following. I cannot test this because I do not have the same version of curl as you on the old server with your version of openSSL.
Use the curl option CURLOPT_SSL_CIPHER_LIST to restrict the cipher list, and CURLOPT_SSLVERSION to keep the connection from using anything other than TLS 1.1:
curl_setopt($ch, CURLOPT_SSL_CIPHER_LIST, 'TLSv1');
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_1);
If not then try:
curl_setopt($ch, CURLOPT_SSL_CIPHER_LIST, 'DEFAULT');
curl_setopt($ch, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_1);
BOTTOM LINE
For more reasons than this issue, you need to upgrade your openSSL.
-------------------------------------------------------------------------
PREVIOUS TROUBLESHOOTING BELOW THIS POINT
The first thing I do is turn off javascript in the Browser. If I can retrieve the page with a browser without javascript, I KNOW I can get it with PHP.
I build the request to look exactly like it does in the browser. I go to the Network tab of the Inspector, choose Edit on the request header, then copy it and paste it into my code.
$request = array();
$request[] = 'Host: example.com';
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36';
$request[] = 'DNT: 1';
$request[] = 'Origin: https://example.com';
$request[] = 'Referer: https://example.com/entry/login';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';
Initialize curl
$url = 'https://example.com/entry/login';
$ch = curl_init($url);
Add the request parameters
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
Tell curl to include the headers
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, true);
Return the response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
Follow redirects
Redirects may be a trap. You may have to NOT follow them and analyze the response instead. Often the redirects are there to set cookies.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIESESSION , true );
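If you choose not to follow redirects (CURLOPT_FOLLOWLOCATION set to false, with CURLOPT_HEADER left on), the response headers show where the server wanted to send you and which cookies it tried to set. Below is a minimal sketch of that analysis. The helper name inspectRedirect() and the sample header text are made up for illustration.

```php
<?php
// Hypothetical helper: pulls the redirect target and any cookies out of a raw
// response header block, such as the one curl returns with CURLOPT_HEADER on.
function inspectRedirect($headerBlock)
{
    $result = array('location' => null, 'cookies' => array());
    foreach (preg_split('/\r\n|\n/', $headerBlock) as $line) {
        if (stripos($line, 'Location:') === 0) {
            $result['location'] = trim(substr($line, strlen('Location:')));
        } elseif (stripos($line, 'Set-Cookie:') === 0) {
            $result['cookies'][] = trim(substr($line, strlen('Set-Cookie:')));
        }
    }
    return $result;
}

// Made-up sample headers, standing in for a real 302 response.
$sample = "HTTP/1.1 302 Found\r\n"
        . "Location: https://example.com/entry/login\r\n"
        . "Set-Cookie: session=abc123; Path=/\r\n\r\n";
print_r(inspectRedirect($sample));
```

If the target page only works once those cookies are sent back, that explains why a plain request returns nothing useful.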
Let curl handle compression
curl_setopt($ch, CURLOPT_ENCODING,"");
Set timeout parameters
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
Make the Request and get Response
The following will get everything you need to know about the requests. The $info will also have all the redirect headers too. If redirects were made the $responseHeader will have all the response headers.
UPDATE: New Fully Tested Code
This may not matter because this also works on my machine:
echo file_get_contents($url);
If curl fails, this code should give you a reason WHY it failed.
Change the url. This one belongs to a client.
<?php
header('content-type: text/plain');
$url = 'https://amxemr.com';
$ch = curl_init($url);
// Skip certificate verification while troubleshooting
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING,"");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HEADER, true);
$data = curl_exec($ch);
if (curl_errno($ch)){
    // curl_error() gives the reason the request failed
    echo 'Retreive Base Page Error: ' . curl_error($ch);
}
else {
    $info = rawurldecode(var_export(curl_getinfo($ch),true));
    // Split the response headers (which carry the cookies) from the body
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $responseHeader= substr($data,0,$skip);
    $data= substr($data,$skip);
    echo "HEADER: $responseHeader\n";
    echo "\n\nINFO: $info\n\nDATA: $data";
}
?>
If the above did not work run phpinfo()
<?php
phpinfo();
?>
There should be a curl section and openSSL.
--------------------------------------------------------------------
UPDATE TWO
Good News
I know the problem and I was able to replicate the errors you got.
Retreive Base Page Error:
Unknown SSL protocol error in connection to www.xxxx.com:443
NOTE xxx was the site from the link you gave me, you can delete that message now.
Funny thing, I have one server I do not update. And by luck, it had the same version of openSSL from July 2008.
You need to upgrade your OpenSSL. file_get_contents() also failed on this server. The same request worked with a February 2013 build of OpenSSL as well as a June 2014 build.
I cannot say whether or not anything else needs to be upgraded like the functions that use openSSL may (or may not) need to be upgraded.
I go with the adage if it ain't broke don't fix it. I do believe some upgrades are actually down grades. I'm still on XP. But it's broke and you need to fix it.
At least it's not a shot in the dark fix. I am confident you have to upgrade. It was a methodical troubleshooting procedure that was able to duplicate your errors. You can go back to using file_get_contents() too.
Answer 2:
Use curl. With curl you can easily fetch any page over HTTPS.
Note these lines:
curl_setopt($ch, CURLOPT_SSLVERSION, 4);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
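The literal 4 passed to CURLOPT_SSLVERSION is one of libcurl's CURL_SSLVERSION_* values, and it corresponds to CURL_SSLVERSION_TLSv1_0, i.e. TLS 1.0. The sketch below spells out the mapping so the magic number is easier to read. The helper name sslVersionName() is hypothetical. Where your PHP build defines the named constants, using them directly is clearer than the integer.

```php
<?php
// Values taken from libcurl's CURL_SSLVERSION_* enum.
// sslVersionName() is a hypothetical helper, not part of PHP or curl.
function sslVersionName($value)
{
    $names = array(
        0 => 'CURL_SSLVERSION_DEFAULT',
        1 => 'CURL_SSLVERSION_TLSv1',
        2 => 'CURL_SSLVERSION_SSLv2',
        3 => 'CURL_SSLVERSION_SSLv3',
        4 => 'CURL_SSLVERSION_TLSv1_0',
        5 => 'CURL_SSLVERSION_TLSv1_1',
        6 => 'CURL_SSLVERSION_TLSv1_2',
    );
    return isset($names[$value]) ? $names[$value] : null;
}

echo sslVersionName(4), "\n";
```

Forcing TLS 1.0 only helps when the target server still accepts TLS 1.0.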
here is working code, tested for twitter and facebook
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
//ini_set('display_errors',1);
//$crawled = [];
set_time_limit(0);// to infinity for example
ob_start();
function grabAll($url){
    $ch = curl_init();
    // 2. set the options, including the url
    curl_setopt($ch, CURLOPT_URL,$url);
    // curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // curl_setopt($ch, CURLOPT_HEADER, 0);
    //curl_setopt ($ch, CURLOPT_CAINFO, "ca-cert/cacert.pem");
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSLVERSION, 4);
    //curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    //curl_setopt($ch, CURLOPT_TIMEOUT, 400);
    //curl_setopt ($ch, CURLOPT_POST, 1);
    // 3. execute and fetch the resulting HTML output
    //curl_exec($ch);
    $output = curl_exec($ch);
    ob_flush();//Flush the data here
    if ($output === FALSE) {
        echo "cURL Error: " . curl_error($ch);
    }
    $info = curl_getinfo($ch);
    //echo 'Took ' . $info['total_time'] . ' seconds for url ' . $info['url'];
    // 4. free up the curl handle
    curl_close($ch);
    //print_r($crawled);
    //return $output ;
    echo $output;
}
grabAll('https://twitter.com/?lang=en');
UPDATE 1: use this code to save the file
function grab_image($url,$saveto){
    $ch = curl_init ($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSLVERSION, 4);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER,1);
    $raw=curl_exec($ch);
    curl_close ($ch);
    // Remove any previous copy, because mode 'x' refuses to overwrite
    if(file_exists($saveto)){
        unlink($saveto);
    }
    $fp = fopen($saveto,'x');
    fwrite($fp, $raw);
    fclose($fp);
}
// $saveto must be a file path, not just a directory
grab_image('i.imgur.com/85wsoLI.jpg','download/85wsoLI.jpg');
hope this solved your problem!!
here is demo on my server: http://54.167.121.86/curl/curl.php
Answer 3:
If by "nothing" you mean an empty response body, it doesn't sound like an HTTPS issue. If it were, curl_exec() would return bool(false) and curl_error() would indicate an SSL problem.
How can I debug and fix this?
Investigate the request sent by your browser when you get a valid response (use your browser's developer tools for this, for example the "Network" tab in Google Chrome, opened with Ctrl+Shift+I). Then compare it with the request sent by curl when you get an invalid response (use CURLOPT_VERBOSE for this), and one by one, add all the headers the browser sends.
For example, you'll notice that libcurl sends no User-Agent header, while your browser sends something like user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36 , so add that header.
You'll also notice that libcurl by default sends Accept: */* , while your browser sends Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 , so make curl send the same headers.
Keep doing this until the two requests are indistinguishable, and along the way you'll find the difference that gets curl blocked.
My bet is on the User-Agent header.
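The comparison can also be done mechanically. Here is a minimal sketch, assuming you have copied the browser's request headers and curl's request headers into two arrays of "Name: value" lines. The helper names parseHeaderLines() and headerDifferences() are hypothetical.

```php
<?php
// Hypothetical helper: turns "Name: value" lines into a map keyed by the
// lower-cased header name.
function parseHeaderLines(array $lines)
{
    $headers = array();
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $headers;
}

// Hypothetical helper: returns the headers the browser sent that curl either
// did not send at all or sent with a different value.
function headerDifferences(array $browserLines, array $curlLines)
{
    $browser = parseHeaderLines($browserLines);
    $curl = parseHeaderLines($curlLines);
    $diff = array();
    foreach ($browser as $name => $value) {
        if (!array_key_exists($name, $curl) || $curl[$name] !== $value) {
            $diff[$name] = $value;
        }
    }
    return $diff;
}

$browserRequest = array(
    'Host: example.com',
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
);
$curlRequest = array(
    'Host: example.com',
    'Accept: */*',
);
// Lists the headers still to be added to CURLOPT_HTTPHEADER.
print_r(headerDifferences($browserRequest, $curlRequest));
```

Add the reported headers to curl one at a time and retry after each, so you can tell which one the server actually cares about.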
Answer 4:
Sometimes it helps to skip validation of the certificate and host name and simply rely on the encryption that SSL provides.
$context = stream_context_create(
    array(
        'http' => array(
            'follow_location' => true
        ),
        'ssl' => array(
            'verify_peer' => false,
            'verify_peer_name' => false
        )
    )
);
$content = @file_get_contents($file, FALSE, $context);
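Note that the @ operator hides the warning that would say why the call failed, which makes debugging harder. Here is a minimal sketch that keeps the reason available through error_get_last(). The helper name fetchWithReason() is hypothetical, and the URL is taken from the command line.

```php
<?php
// Hypothetical helper: behaves like file_get_contents(), but also returns the
// warning message when the call fails.
function fetchWithReason($url, $context = null)
{
    if (function_exists('error_clear_last')) {
        error_clear_last(); // PHP 7+: forget any earlier, unrelated error
    }
    $content = @file_get_contents($url, false, $context);
    $error = null;
    if ($content === false) {
        $last = error_get_last();
        $error = $last !== null ? $last['message'] : 'unknown error';
    }
    return array('content' => $content, 'error' => $error);
}

// Usage from the command line: php fetch.php https://example.com/
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $context = stream_context_create(array(
        'ssl' => array('verify_peer' => false, 'verify_peer_name' => false),
    ));
    $result = fetchWithReason($argv[1], $context);
    if ($result['content'] === false) {
        echo 'Fetch failed: ' . $result['error'] . "\n";
    } else {
        echo strlen($result['content']) . " bytes received\n";
    }
}
```

An SSL handshake failure shows up in the message text, which separates it from a certificate problem or a blocked request.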
Answer 5:
Does the HTTPS site have a self-signed certificate? Can you provide the domain names of some of the sites that work and some that don't?
Have you tried using "allow_self_signed" => true in the stream context configuration?
So it becomes:
$arrContextOptions=array(
    "ssl"=>array(
        "verify_peer"=>false,
        "verify_peer_name"=>false,
        "allow_self_signed"=>true,
    ),
);
$response = file_get_contents($url, false, stream_context_create($arrContextOptions));
Answer 6:
As it looks like a problem with the SSL version, you could tell cURL to skip certificate verification using CURLOPT_SSL_VERIFYPEER.
Here is a script working with the url you posted
$url = 'https://XXX/YYY/view-all';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
print_r($response);
Source: https://stackoverflow.com/questions/42175050/unable-to-file-get-contents-or-curl-via-https