What is the default dtype for str-like input in NumPy?

Submitted by 隐身守侯 on 2019-12-22 09:14:07

Question


I just wanted to confirm whether the default data type for strings is Unicode when creating an ndarray. I could not find any reference that states this clearly. Maybe it is too obvious to need stating.

When dtype is specified:

>>> import numpy as np
>>> g = np.array([['a', 'b'],['c', 'd']], dtype='S')
>>> g
array([[b'a', b'b'],
       [b'c', b'd']], 
      dtype='|S1')

Without specifying the dtype:

>>> g = np.array([['a', 'b'],['c', 'd']])
>>> g
array([['a', 'b'],
       ['c', 'd']], 
      dtype='<U1')

Also, what does the literal b indicate when the dtype is specified? As per the documentation, it indicates bool, which doesn't seem to be the case here.

Can someone please clarify?


Answer 1:


b'...' means it is a byte string. The default dtype for an array of strings depends on the kind of string: Unicode strings (Python 3 str is Unicode) get the kind U, while Python 2 str and Python 3 bytes get the kind S. The NumPy documentation explains the dtype codes in this section:

Array-protocol type strings

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:

  • '?' boolean
  • 'b' (signed) byte
  • 'B' unsigned byte
  • 'i' (signed) integer
  • 'u' unsigned integer
  • 'f' floating-point
  • 'c' complex-floating point
  • 'm' timedelta
  • 'M' datetime
  • 'O' (Python) objects
  • 'S', 'a' zero-terminated bytes (not recommended)
  • 'U' Unicode string
  • 'V' raw data (void)
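As a minimal sketch of how these type strings are read (assuming Python 3 and a reasonably recent NumPy; the variable names are only illustrative), the kind character and item size can be inspected on a dtype object. It also shows that the type string 'b' is unrelated to the b prefix on a bytes literal:

```python
import numpy as np

# 'U1': kind 'U' (Unicode), width of 1 character.
u1 = np.dtype('U1')
# 'S1': kind 'S' (bytes), width of 1 byte.
s1 = np.dtype('S1')

print(u1.kind, u1.itemsize)  # NumPy stores each Unicode character in 4 bytes
print(s1.kind, s1.itemsize)  # one byte per item

# The type string 'b' is a signed byte (int8); boolean is '?'.
signed_byte = np.dtype('b')
boolean = np.dtype('?')
print(signed_byte, boolean)
```

So the b seen in the array output above is Python's notation for a bytes literal, not a dtype code.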

However, in your first case you actually forced NumPy to convert the strings to bytes because you specified dtype='S'.
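The behaviour described above can be checked with a short sketch, assuming Python 3 (where str is Unicode). The array names are hypothetical and chosen only for this illustration:

```python
import numpy as np

# str input: NumPy infers a Unicode dtype (kind 'U').
from_str = np.array([['a', 'b'], ['c', 'd']])

# bytes input: NumPy infers a byte-string dtype (kind 'S').
from_bytes = np.array([[b'a', b'b'], [b'c', b'd']])

# str input with dtype='S': the strings are converted to bytes.
forced = np.array([['a', 'b'], ['c', 'd']], dtype='S')

print(from_str.dtype.kind)    # U
print(from_bytes.dtype.kind)  # S
print(forced.dtype.kind)      # S

# The width is inferred from the longest element.
mixed_width = np.array(['a', 'abc'])
print(mixed_width.dtype == np.dtype('U3'))  # True
```

In other words, the dtype follows the type of the input elements unless one is requested explicitly.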



Source: https://stackoverflow.com/questions/46051977/what-is-the-default-dtype-for-str-like-input-in-numpy
