CURL import character encoding problem

暖寄归人 2020-12-03 19:43

I'm using cURL to import some code. However, in French, all the characters come out funny. For example: Bonjour ...

I don't have access to change anything on the

5 answers
  • 2020-12-03 19:51

    As Jon Skeet pointed out, it's difficult to understand your situation. However, if you only have access to the final text, you can try using iconv to change the text encoding.

    For example:

    $text = iconv("Windows-1252","UTF-8",$text);
    

    I had a similar issue some time ago (with Italian and its special characters) and solved it this way.

    Try different combinations (UTF-8, ISO-8859-1, Windows-1252).
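
    One way to avoid converting text that is already fine is to check whether it is valid UTF-8 first. The sketch below assumes the mbstring extension is available; `to_utf8()` and its default source charset are hypothetical names chosen for illustration, not part of any library.

    ```php
    <?php
    // Hypothetical helper: convert $text to UTF-8 only when it is not already valid UTF-8.
    // $assumedSource is a guess (Windows-1252 here); cURL does not report it for you.
    function to_utf8(string $text, string $assumedSource = 'Windows-1252'): string
    {
        if (mb_check_encoding($text, 'UTF-8')) {
            return $text; // already valid UTF-8; converting again would garble it
        }
        $converted = iconv($assumedSource, 'UTF-8', $text);
        return $converted === false ? $text : $converted; // fall back to the input on failure
    }

    $latin = "Fran\xE7ais";   // "Français" encoded as Windows-1252
    $utf8  = to_utf8($latin); // now "Français" encoded as UTF-8
    echo bin2hex($utf8), "\n";
    ```

    The validity check is only a heuristic: some Windows-1252 byte sequences also happen to be valid UTF-8, so the result is still worth inspecting.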

  • 2020-12-03 19:56

    You could replace your

    $data = curl_exec($ch);
    

    with

    $data = utf8_decode(curl_exec($ch));
    

    I had this same issue and it worked well for me.
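
    Note that utf8_decode() converts UTF-8 to ISO-8859-1 and is deprecated as of PHP 8.2. A roughly equivalent sketch using mb_convert_encoding() follows, assuming the mbstring extension is installed; the `$data` value is a stand-in for what curl_exec() would return.

    ```php
    <?php
    // Stand-in for the string curl_exec($ch) would return; no real request is made here.
    $data = "Fran\xC3\xA7ais"; // "Français" as UTF-8 bytes

    // Convert from UTF-8 to ISO-8859-1, which is what utf8_decode() does.
    $latin1 = mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8');
    echo bin2hex($latin1), "\n";
    ```

    This only helps when the page displaying the text is itself served as ISO-8859-1; on a UTF-8 page it would make things worse.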

  • 2020-12-03 19:58

    I'm currently dealing with a similar problem: I'm trying to write a simple HTML <title> importer via cURL. Here is what I've done so far:

    1. Retrieve the HTML via cURL.
    2. Check the response headers for an encoding hint via curl_getinfo() and extract it with a regex.
    3. Parse the HTML to find the content-type meta tag and the <title> tag (yes, I know the consequences).
    4. Compare the charset from the header with the one from the meta tag and choose the meta one if they differ, because nobody cares about their httpd configuration and there are a lot of dirty workarounds that rely on the meta tag.
    5. iconv() the string.
    6. Wish every day that whenever someone does not follow the standards, $DEITY would punish him/her until the end of days, because it would save me the meta parsing.
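
    Steps 2 to 5 above could be sketched roughly as follows. All three function names are hypothetical, the regexes are deliberately simple, and the Content-Type string is passed in directly (in real code it would come from `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)`), so treat this as an illustration rather than a robust parser.

    ```php
    <?php
    // Step 2: pull the charset out of a Content-Type value such as
    // "text/html; charset=ISO-8859-1".
    function charset_from_content_type(?string $contentType): ?string
    {
        if ($contentType !== null
            && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
            return strtoupper($m[1]);
        }
        return null;
    }

    // Step 3: look for a charset declaration inside a <meta> tag.
    function charset_from_meta(string $html): ?string
    {
        if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
            return strtoupper($m[1]);
        }
        return null;
    }

    // Steps 4 and 5: prefer the meta charset, fall back to the header,
    // then convert the <title> text to UTF-8 with iconv().
    function extract_title_utf8(string $html, ?string $contentType): ?string
    {
        $charset = charset_from_meta($html)
            ?? charset_from_content_type($contentType)
            ?? 'UTF-8';
        if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
            return null; // no <title> found
        }
        $title = trim($m[1]);
        if ($charset === 'UTF-8') {
            return $title;
        }
        $converted = iconv($charset, 'UTF-8', $title);
        return $converted === false ? $title : $converted;
    }

    // The header claims UTF-8, but the meta tag (and the actual bytes) say ISO-8859-1.
    $html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
          . "<title>Caf\xE9</title></head></html>";
    $title = extract_title_utf8($html, 'text/html; charset=UTF-8');
    echo bin2hex($title), "\n";
    ```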
  • 2020-12-03 20:07

    PHP seems to use UTF-8 by default, so I found that converting in the opposite direction works:

    $text = iconv("UTF-8","Windows-1252",$text);

  • 2020-12-03 20:10

    I had a similar problem. I tried looping through all combinations of input and output charsets. Nothing helped! :(

    However, I was able to access the code that actually fetched the data, and that is where the culprit lay. The data was fetched via cURL. Adding

    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    

    fixed it.

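    Here is a handy snippet for trying out all possible combinations of a list of charsets:

    // Placeholder input; substitute the string you actually fetched.
    $text_2_convert = "Fran\xE7ais";

    $charsets = array(
        "UTF-8",
        "ASCII",
        "Windows-1252",
        "ISO-8859-15",
        "ISO-8859-1",
        "ISO-8859-6",
        "CP1256"
    );

    // Try every source/target pair. iconv() returns false (and raises a notice)
    // when the input is not valid in the assumed source charset.
    foreach ($charsets as $ch1) {
        foreach ($charsets as $ch2) {
            echo "<h1>Combination $ch1 to $ch2 produces: </h1>" . iconv($ch1, $ch2, $text_2_convert);
        }
    }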
    