I am interested in knowing how to convert a pandas dataframe into a NumPy array.
dataframe:
import numpy as np
import pandas as pd
index = [1, 2, 3,
You can use the to_records method, but have to play around a bit with the dtypes if they are not what you want from the get go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):
In [102]: df
Out[102]:
label A B C
ID
1 NaN 0.2 NaN
2 NaN NaN 0.5
3 NaN 0.2 0.5
4 0.1 0.2 NaN
5 0.1 0.2 0.5
6 0.1 NaN 0.5
7 0.1 NaN NaN
In [103]: df.index.dtype
Out[103]: dtype('object')
In [104]: df.to_records()
Out[104]:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
(4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
(7, 0.1, nan, nan)],
dtype=[('index', '|O8'), ('A', '
Converting the recarray dtype does not work for me, but one can do this in Pandas already:
In [109]: df.index = df.index.astype('i8')
In [111]: df.to_records().view([('ID', '
Note that Pandas does not set the name of the index properly (to ID) in the exported record array (a bug?), so we profit from the type conversion to also correct for that.
At the moment Pandas has only 8-byte integers, i8, and floats, f8 (see this issue).