How to get a web page title with cURL in PHP from websites with different charsets?

Submitted by 孤街浪徒 on 2019-12-11 06:28:48

Question


I want to store the title in UTF-8, but the pages come in many different charsets, such as GBK, ISO, Unicode, and so on.

Could you give me some help?

Thanks.


Answer 1:


Identify or detect the character encoding and convert the data to UTF-8 if necessary.

For HTML (i.e. text/html) there are three ways to specify the character encoding:

  1. An HTTP "charset" parameter in a "Content-Type" field.
  2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
  3. The charset attribute set on an element that designates an external resource.
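The first two lookups can be sketched in PHP roughly as follows. This is a minimal sketch, not a full HTML parser: the function name declared_charset is hypothetical, the 4096-byte window for the META search is an arbitrary assumption, and the regular expressions only cover the common spellings of the declaration.

```php
<?php
// Hypothetical helper: returns the declared charset in upper case,
// or null if neither the header nor a META tag declares one.
function declared_charset(?string $contentType, string $html): ?string
{
    // 1. HTTP header value, e.g. "text/html; charset=GBK".
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }

    // 2. META declaration, searched only near the start of the document
    //    (assumed window: 4096 bytes). The pattern also matches the
    //    HTML5 short form <meta charset="...">.
    $head = substr($html, 0, 4096);
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $head, $m)) {
        return strtoupper($m[1]);
    }

    return null;
}
```

With cURL, the header value passed as the first argument would typically come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE) after curl_exec(); the header is checked first here because HTTP-level information normally takes precedence over an in-document declaration.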

If none of these is present, you can fall back on content sniffing or on a default character encoding (e.g. ISO 8859-1).
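One very conservative form of content sniffing is to test whether the bytes are valid UTF-8 and otherwise use the default. The sketch below assumes the mbstring extension is loaded; the function name sniff_charset is hypothetical, and real-world sniffing (for instance telling GBK apart from other legacy encodings) needs considerably more than this.

```php
<?php
// Hypothetical helper: guesses UTF-8 when the bytes are valid UTF-8,
// otherwise returns the given default encoding.
function sniff_charset(string $bytes, string $default = 'ISO-8859-1'): string
{
    // A UTF-8 byte order mark at the start is a strong signal.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }

    // Non-ASCII text in a legacy 8-bit or double-byte encoding is rarely
    // valid UTF-8 by accident, so a successful check is a reasonable hint.
    // Pure ASCII input also passes, which is harmless.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }

    return $default;
}
```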

If the identified/detected character encoding is not UTF-8, you can then convert the data to UTF-8 with iconv or mb_convert_encoding.
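A minimal sketch of that conversion step, followed by the title extraction the question asks about, might look like this. The function name title_as_utf8 is hypothetical, the charset is assumed to have been determined already by whatever means, and the regular expression is a simplification rather than a real HTML parser.

```php
<?php
// Hypothetical helper: converts the page to UTF-8 if needed and returns
// the contents of the first <title> element, or null on failure.
function title_as_utf8(string $html, string $charset): ?string
{
    if (strcasecmp($charset, 'UTF-8') !== 0) {
        // iconv() returns false (and raises a notice or warning) for
        // unknown charsets or invalid input, hence the @ and the check.
        $converted = @iconv($charset, 'UTF-8', $html);
        if ($converted === false) {
            return null;
        }
        $html = $converted;
    }

    // Case-insensitive, dot matches newlines, non-greedy title body.
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return null;
    }

    // Decode entities such as &amp; so the stored title is plain text.
    return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}
```

Converting the whole document before matching, rather than only the extracted title, keeps the regular expression working on a single known encoding. mb_convert_encoding($html, 'UTF-8', $charset) could be swapped in for the iconv call if mbstring is preferred.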



Source: https://stackoverflow.com/questions/4426852/how-to-get-web-page-title-with-curl-in-php-from-web-sites-of-different-charset
