Question
I can't get file_get_contents() to return the page in this particular case, where the URL contains an 'Ö' character.
$url = "https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1&type=subgroup&startdate=20150101&enddate=20300501";
print file_get_contents($url);
How do I make file_get_contents() work as expected on this URL?
I have tried the following solutions without a working result:
1.
print rawurlencode(utf8_encode($url));
2.
print mb_convert_encoding($url, 'HTML-ENTITIES', "UTF-8");
3.
$url = urlencode($url);
print file_get_contents($url);
4.
$content = file_get_contents($url);
print mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
Found in these questions:
file_get_contents - special characters in URL
PHP get url with special characters without urlencode:ing them!
file_get_contents() Breaks Up UTF-8 Characters
UPDATE: As you can see, a page is actually returned in my example, but it is not the expected page, i.e. the one you get when you type the URL into the browser.
Answer 1:
URLs cannot contain "Ö"! Start from this basic premise. Any characters not within a narrowly defined subset of ASCII must be URL-encoded to be represented within a URL. The right way to do that is to urlencode or rawurlencode (depending on which format the server expects) the individual segment of the URL, not the URL as a whole.
E.g.:
$url = sprintf('https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=%s&type=subgroup&startdate=20150101&enddate=20300501',
rawurlencode('CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1'));
You will still need to use the correct encoding for the string! Ö in ISO-8859-1 would be URL encoded to %D6, while in UTF-8 it would be encoded to %C3%96. Which one is the correct one depends on what the server expects.
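To make the difference concrete, here is a minimal sketch that percent-encodes the same object name under both encodings. It assumes the mbstring extension is available, and it writes the Ö as explicit UTF-8 bytes so the result does not depend on how the source file is saved. The variable names are only illustrative.

```php
<?php
// "\xC3\x96" is the UTF-8 byte sequence for "Ö".
$objectUtf8 = "CM_949A11_1534_1603_DAG_DST_50_\xC3\x96VRIGT_1_1";

// Convert the same name to ISO-8859-1, where "Ö" is the single byte 0xD6.
$objectLatin1 = mb_convert_encoding($objectUtf8, 'ISO-8859-1', 'UTF-8');

// rawurlencode() works on bytes, so the result depends on the input encoding.
$encodedUtf8   = rawurlencode($objectUtf8);
$encodedLatin1 = rawurlencode($objectLatin1);

echo $encodedUtf8, "\n";   // CM_949A11_1534_1603_DAG_DST_50_%C3%96VRIGT_1_1
echo $encodedLatin1, "\n"; // CM_949A11_1534_1603_DAG_DST_50_%D6VRIGT_1_1
```

If one variant returns the wrong page, trying the other is a quick way to find out which encoding the server expects.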
Answer 2:
One needs to percent-encode the non-ASCII characters. This is one way I know of doing it.
$url2 = "https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=" . urlencode('CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1') . "&type=subgroup&startdate=20150101&enddate=20300501";
echo "encoded: " . $url2;
print file_get_contents($url2);
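As an alternative sketch, the whole query string can be built with http_build_query(), which encodes each value separately so the URL as a whole is never passed through urlencode(). The helper name build_schedule_url() is hypothetical, and the example assumes the server expects UTF-8; that is not confirmed for this site.

```php
<?php
// Hypothetical helper: builds the schedule URL for a given object name.
function build_schedule_url(string $object): string
{
    $base = 'https://se.timeedit.net/web/liu/db1/schema/s/s.html';
    $params = [
        'tab'       => 3,
        'object'    => $object,
        'type'      => 'subgroup',
        'startdate' => '20150101',
        'enddate'   => '20300501',
    ];
    // PHP_QUERY_RFC3986 makes http_build_query() encode values like rawurlencode().
    // The '&' separator is passed explicitly so the ini setting arg_separator.output is ignored.
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// "\xC3\x96" is "Ö" in UTF-8 (assumed to be what the server expects).
$url = build_schedule_url("CM_949A11_1534_1603_DAG_DST_50_\xC3\x96VRIGT_1_1");
echo $url, "\n";
```

The resulting $url can then be passed to file_get_contents() as in the answer above.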
Source: https://stackoverflow.com/questions/31720418/file-get-contents-special-characters-in-url-special-case