Reading UTF-8 text files with ReadList

℡╲_俬逩灬. Submitted on 2019-12-01 05:29:40

If I leave out Word, this works:

$CharacterEncoding = "UTF-8";

ReadList["UTF8.txt"]

This is not a real solution, however, because without a type argument ReadList reads the data as expressions rather than as strings.

Please try this on a larger file and report its performance:

FromCharacterCode[BinaryReadList["UTF8.txt"], "UTF-8"]
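For anyone who wants to check the byte-level approach on a small file first, here is a minimal round-trip sketch. The file name utf8-demo.txt and the sample string are made up for illustration, and the StringSplit step is only a rough stand-in for what ReadList[..., Word] would return.

```mathematica
(* Hypothetical temporary file, used only for this illustration *)
file = FileNameJoin[{$TemporaryDirectory, "utf8-demo.txt"}];

(* Sample text with non-ASCII characters, written with named characters *)
sample = "na\[IDoubleDot]ve caf\[EAcute] \[Pi]";

(* Write the UTF-8 bytes of the sample verbatim *)
strm = OpenWrite[file, BinaryFormat -> True];
BinaryWrite[strm, ToCharacterCode[sample, "UTF-8"]];
Close[strm];

(* Read the raw bytes back and decode them as UTF-8 in one step *)
text = FromCharacterCode[BinaryReadList[file], "UTF-8"];

(* Split on whitespace to get a list of word strings *)
words = StringSplit[text];

(* The decoded text should match the original sample *)
text === sample
```

Because BinaryReadList never interprets the bytes, this path should not depend on the value of $CharacterEncoding.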

This seems to work:

FromCharacterCode[ToCharacterCode[ReadList["raw.php.txt", Word]], "UTF-8"]
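My understanding of why this works, as a sketch: it assumes that ReadList with Word hands back every byte of the file as a separate character with a code between 0 and 255, so that ToCharacterCode recovers the original bytes. The snippet below simulates that situation without touching a file; the string "café" is just an example.

```mathematica
(* UTF-8 bytes of "café"; the é becomes the two bytes 195, 169 *)
bytes = ToCharacterCode["caf\[EAcute]", "UTF-8"];

(* Simulate a reader that treats each byte as one character *)
garbled = FromCharacterCode[bytes];

(* ToCharacterCode gives the raw bytes back unchanged *)
ToCharacterCode[garbled] === bytes

(* Decoding those bytes as UTF-8 restores the intended string *)
FromCharacterCode[ToCharacterCode[garbled], "UTF-8"] === "caf\[EAcute]"
```

If ReadList already decodes the file correctly on your system, the extra round trip is unnecessary and may even damage the text, so it is worth checking on a small file first.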

The timings I get for the linked test file are:

FromCharacterCode[ToCharacterCode[ReadList["test.txt", Word]], "UTF-8"]; // Timing

(* ==> {0.000195, Null} *)

Import["test.txt", "Text"]; // Timing

(* ==> {0.01784, Null} *)