Reading unicode elements into numpy array

逝去的感伤 2020-12-06 17:36

Consider a text file called "new.txt" containing the following elements:

μm
∂r
∆λ

In Python 2.7, I can read the file by typing:

2 answers

悲哀的现实 (original poster)

2020-12-06 18:03
In memory, unicode strings are represented as UCS-2 or UCS-4, depending on how your Python interpreter was compiled. Your file is encoded in UTF-8, so you need to recode it before you can map it to the NumPy array. loadtxt() can't do the recoding for you -- after all NumPy is mainly targeted at numerical arrays.
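The recoding step described above can be sketched as follows. This is only an illustration, not code from the original post: it assumes Python 3 (the question is about Python 2.7), uses a temporary file as a stand-in for new.txt, and goes through an intermediate list of decoded lines.

```python
import io
import os
import tempfile

import numpy as np

# Stand-in for the question's "new.txt": three two-character lines, UTF-8 encoded.
path = os.path.join(tempfile.mkdtemp(), "new.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write("μm\n∂r\n∆λ\n")

# Decode UTF-8 while reading and strip the line endings.
with io.open(path, encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

# NumPy picks a fixed-width unicode dtype wide enough for the longest line.
arr = np.array(lines)
print(arr.dtype.kind, arr.shape)
```

The array ends up with a unicode ("U") dtype, with each character stored as 4 bytes regardless of how the file was encoded on disk.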

Assuming every line has the same number of characters, you could also use the more efficient variant:
```
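import codecs, numpy
s = codecs.open("new.txt", encoding="utf-8").read()
# Each record is 2 characters plus the newline, so 3 code points per item.
arr = numpy.frombuffer(s, dtype="U3")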
```
This will include the newline characters in the strings. To exclude them, use `arr = numpy.frombuffer(s.replace("\n", ""), dtype="U2")`.

Edit: If the lines of your file have different lengths and you would like to avoid the intermediate list, you can use `arr = numpy.fromiter(codecs.open("new.txt", encoding="utf-8"), dtype="U2")`. I'm not sure if this will internally create some temporary list, though.
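The `frombuffer` variant above depends on Python 2.7 exposing the in-memory bytes of a unicode object. In Python 3, `str` is not a bytes-like object, so the same idea needs an explicit encoding step. A minimal sketch, assuming Python 3 and a literal string standing in for the decoded file contents (neither is part of the original answer):

```python
import numpy as np

# Decoded file contents; stands in for codecs.open(...).read() in the answer.
s = "μm\n∂r\n∆λ\n"

# Drop the newlines so every record is exactly two code points.
flat = s.replace("\n", "")

# NumPy's "U" dtype stores 4 bytes per character, so encode as UTF-32
# (little-endian, no byte-order mark) to match the "<U2" layout.
buf = flat.encode("utf-32-le")
arr = np.frombuffer(buf, dtype="<U2")

print(arr)
```

The resulting array is a read-only view of the encoded bytes. Call `arr.copy()` if it needs to be modified.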