I have a pandas DataFrame with a mix of datatypes (dtypes) that I wish to convert to a numpy structured array (or record array, which is essentially the same thing in this case).
As far as I am aware, there is no native functionality for this. For example, the maximum length of all values within a series is not stored anywhere.
However, you can implement your logic more efficiently via a list comprehension and f-strings:
data_types = [(col, df[col].dtype if df[col].dtype != 'O' else
               f'U{df[col].astype(str).str.len().max()}') for col in df.columns]
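To make the idea concrete, here is a minimal self-contained sketch using a made-up sample frame (the question's actual DataFrame isn't reproduced here): object columns get a fixed-width unicode type sized from their longest value, and the resulting dtype list is used to build the structured array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame standing in for the question's DataFrame
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [1.5, 2.5, 3.5],
                   'C': ['ab', 'cdef', 'g']})

# For object columns, derive a 'U<n>' type from the longest string value
data_types = [(col, df[col].dtype if df[col].dtype != 'O' else
               f'U{df[col].astype(str).str.len().max()}')
              for col in df.columns]

# Build the structured array from row tuples
structured = np.array(list(df.itertuples(index=False)), dtype=data_types)
```

Here `structured['C']` ends up with dtype `<U4`, since 'cdef' is the longest value in that column.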
Combining suggestions from @jpp (list comp for conciseness) & @hpaulj (cannibalize to_records
for speed), I came up with the following, which is cleaner code and also about 5x faster than my original code (tested by expanding the sample dataframe above to 10,000 rows):
names = list(df.columns)
arrays = [df[col].to_numpy() for col in names]  # get_values() was removed in pandas 1.0
formats = [arr.dtype if arr.dtype != 'O'
           else arr.astype(str).dtype for arr in arrays]
rec_array = np.rec.fromarrays(arrays, dtype={'names': names, 'formats': formats})
The above outputs unicode rather than byte strings, which is probably better in general. In my case, though, I need byte strings because I'm reading the binary file in Fortran, and fixed-width byte strings seem to read in more easily. Hence, it may be better to replace the "formats" line above with this:
formats = [arr.dtype if arr.dtype != 'O'
           else arr.astype(str).dtype.str.replace('<U', 'S') for arr in arrays]
E.g. a dtype of <U4 becomes S4.
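Putting the whole recipe together, here is a runnable sketch on a hypothetical frame (the question's DataFrame isn't shown here), including the swap from little-endian unicode ('<U...') to fixed-width bytes ('S...'):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame standing in for the question's DataFrame
df = pd.DataFrame({'x': [1, 2], 'y': [0.5, 1.5], 'name': ['foo', 'barbaz']})

names = list(df.columns)
arrays = [df[col].to_numpy() for col in names]

# Fixed-width byte strings ('S...') instead of unicode ('<U...'),
# sized from each object column's longest value
formats = [arr.dtype if arr.dtype != 'O'
           else arr.astype(str).dtype.str.replace('<U', 'S')
           for arr in arrays]

rec_array = np.rec.fromarrays(arrays, dtype={'names': names, 'formats': formats})
```

Note that with an 'S' dtype the values come back as Python bytes (e.g. b'barbaz'), which is exactly the fixed-width layout a Fortran reader expects; write it out with rec_array.tofile(...).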