file_get_contents() fails with special characters in URL

Submitted by 邮差的信 on 2019-12-02 05:09:53

Question


I need to fetch some URLs that contain characters from the Swedish alphabet.

Take a string such as https://en.wikipedia.org/wiki/Åland_Islands: passing it straight into file_get_contents as a parameter works just fine. But if you run that URL through urlencode first, the call fails with the message:

failed to open stream: No such file or directory

despite the documentation for file_get_contents saying:

Note: If you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode().

So for example, if you run the following code:

error_reporting(E_ALL);
ini_set("display_errors", true);

$url = urlencode("https://en.wikipedia.org/wiki/Åland_Islands");

$response = file_get_contents($url);
if($response === false) {
    die('file get contents has failed');
}
echo $response;

You will get the error. If you just remove the "urlencode" from the code, it will run just fine.

The problem I am facing is that one parameter in my URL is taken from a submitted form. Since PHP always runs submitted values through urlencode, the Swedish characters in my constructed URL cause the error.

How do I get around this?


Answer 1:


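The problem is most likely that urlencode escapes the entire string, including the : and / characters of the protocol prefix and path separators. Without a recognizable https:// prefix, file_get_contents treats the argument as a local file name, which is why the error is "No such file or directory":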

https://en.wikipedia.org/wiki/Åland_Islands
https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85land_Islands
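
To make the difference concrete, here is a minimal sketch that checks whether the :// separator survives encoding. The variable names are only illustrative, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8, so "Å" is the two bytes C3 85.
$raw     = 'https://en.wikipedia.org/wiki/Åland_Islands';
$encoded = urlencode($raw);

echo $encoded, PHP_EOL;

// The raw URL still contains the scheme separator; the encoded one does not,
// so PHP has no way to recognise it as an https:// stream.
var_dump(strpos($raw, '://') !== false);     // bool(true)
var_dump(strpos($encoded, '://') !== false); // bool(false)
```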

I have faced this problem too, and the only fix I found was to limit the escaping to the parts that actually need it:

https://en.wikipedia.org/wiki/Åland_Islands
https://en.wikipedia.org/wiki/%C3%85land_Islands    

As you can imagine, this gets tricky depending on where the special characters appear in the URL. I usually encode the whole URL and then patch the result, but some people I have worked with prefer to encode only the dynamic segment of their URLs.
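
The second option, encoding only the dynamic segment, might look like the sketch below. The helper name buildWikiUrl and the fixed base URL are assumptions made for illustration. rawurlencode is used rather than urlencode because it encodes a space as %20 instead of +, which is generally the safer choice for a path segment.

```php
<?php
// Hypothetical helper: the static part of the URL is left untouched and
// only the user-supplied title is percent-encoded.
function buildWikiUrl(string $title): string
{
    $base = 'https://en.wikipedia.org/wiki/'; // assumed fixed prefix
    return $base . rawurlencode($title);
}

echo buildWikiUrl('Åland_Islands'), PHP_EOL;
// https://en.wikipedia.org/wiki/%C3%85land_Islands
```

Because the scheme and the slashes never pass through the encoder, nothing has to be patched afterwards.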

Here is my approach:

https://en.wikipedia.org/wiki/Åland_Islands
https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F%C3%85land_Islands
https://en.wikipedia.org/wiki/%C3%85land_Islands

Code:

$url = 'https://en.wikipedia.org/wiki/Åland_Islands';
// Encode everything, then restore the slashes and colons that urlencode escaped.
$encodedUrl = urlencode($url);
$fixedEncodedUrl = str_replace(['%2F', '%3A'], ['/', ':'], $encodedUrl);
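
A variation on the same idea, offered only as a sketch: instead of encoding everything and restoring selected characters, percent-encode just the bytes outside printable ASCII. The helper name encodeNonAscii is hypothetical. This leaves characters such as ?, = and & alone as well, which the str_replace patch above would not restore.

```php
<?php
// Hypothetical helper: percent-encode only spaces, control bytes and
// non-ASCII bytes, leaving every printable ASCII character as it is.
function encodeNonAscii(string $url): string
{
    // No /u modifier: the pattern matches single bytes, so each byte of a
    // multi-byte UTF-8 character is encoded on its own (Å becomes %C3%85).
    return preg_replace_callback(
        '/[^\x21-\x7E]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

echo encodeNonAscii('https://en.wikipedia.org/wiki/Åland_Islands?action=raw'), PHP_EOL;
// https://en.wikipedia.org/wiki/%C3%85land_Islands?action=raw
```

Existing % signs are printable ASCII, so a URL that is already percent-encoded passes through unchanged rather than being double-encoded.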

Hope it helps.




Answer 2:


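Use this:

$usableURL = mb_convert_encoding($url,'HTML-ENTITIES');

Note that this converts non-ASCII characters to HTML entities (Å becomes &Aring;) rather than percent-encoding them, and that the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.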


Source: https://stackoverflow.com/questions/31097744/file-get-contents-fails-with-special-characters-in-url
