How can I detect the encoding/codepage of a text file

后端 未结 20 1673
梦如初夏
梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20条回答
  •  谎友^
    谎友^ (楼主)
    2020-11-21 23:06

    If you're looking to detect non-UTF encodings (i.e. no BOM), you're basically down to heuristics and statistical analysis of the text. You might want to take a look at the Mozilla paper on universal charset detection (same link, with better formatting via Wayback Machine).

提交回复
热议问题