How do I load heterogeneous data (np.genfromtxt) as a 2D array?

Submitted by 耗尽温柔 on 2019-12-13 19:43:52

Question


I learned from the question "numpy.genfromtxt produces array of what looks like tuples, not a 2D array - why?" that numpy.genfromtxt returns a structured ndarray if the data is not homogeneous. How do I load heterogeneous data as a 2D array?

For instance, take a text file with the following contents (all items except the header are integers):

# c1    c2  c3  c4  c5
3   4   8   6   8
10  7   6   7   10
5   10  2   1   3
7   6   5   3   6
5   8   5   2   7
1   2   2   10  8
10  5   9   3   8
5   2   4   4   2

Loading the data with np.genfromtxt:

# load data from a text file
import numpy as np

table = np.genfromtxt('table.dat', dtype=int, delimiter='\t', names=True, filling_values=0)
print(table.shape)
print(table)

# output
(8,)
[(3, 4, 8, 6, 8) (10, 7, 6, 7, 10) (5, 10, 2, 1, 3) (7, 6, 5, 3, 6)
 (5, 8, 5, 2, 7) (1, 2, 2, 10, 8) (10, 5, 9, 3, 8) (5, 2, 4, 4, 2)]

# expected result
(8, 5)
[[ 7  2  4  9  2]
 [ 5  8  1  6  4]
 [ 6  3  1  4 10]
 [10 10  6  5  5]
 [10  4  7  7  1]
 [ 1  9  8  6  2]
 [ 3  2  3  4  4]
 [ 7  5  9 10  6]]

PS: I want to keep header = table.dtype.names for other purposes.


Answer 1:


In this case it would be easier to use pandas and then convert the pandas DataFrame to a NumPy array.

import pandas as pd
foo = pd.read_csv('table.dat', sep='\t')
type(foo)
<class 'pandas.core.frame.DataFrame'>
bar = foo.to_numpy()  # as_matrix() was removed in pandas 1.0; to_numpy() needs pandas 0.24+
array([[10,  7,  6,  7, 10],
       [ 5, 10,  2,  1,  3],
       [ 7,  6,  5,  3,  6],
       [ 5,  8,  5,  2,  7],
       [ 1,  2,  2, 10,  8],
       [10,  5,  9,  3,  8],
       [ 5,  2,  4,  4,  2]])
bar.shape
(7,5)
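
The output above has 7 rows rather than 8, which suggests the first data row was consumed as the column header. A minimal sketch of one way to avoid that while keeping the header names follows. It assumes a tab-separated file whose first line is the `#`-prefixed header from the question; the in-memory `text` string is a stand-in for `table.dat`, and only the first two data rows are included for brevity.

```python
import io

import pandas as pd

# Stand-in for 'table.dat' (assumed tab-separated; first two data rows only).
text = "# c1\tc2\tc3\tc4\tc5\n3\t4\t8\t6\t8\n10\t7\t6\t7\t10\n"

# Read the column names from the '#'-prefixed first line by hand.
header = text.splitlines()[0].lstrip("# ").split()

# comment='#' makes pandas skip the header line entirely,
# so every remaining line is parsed as data.
frame = pd.read_csv(io.StringIO(text), sep="\t", comment="#",
                    header=None, names=header)

data = frame.to_numpy()  # plain 2D integer array
print(header)
print(data.shape)
```

With a real file, `io.StringIO(text)` would be replaced by the file path and the header line read from the file itself.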



Answer 2:


I got this to work with:

import numpy as np

table = np.genfromtxt('table.dat',
                      dtype=None,
                      skip_header=1)

Here's why it works:

  • You should use consecutive whitespace as the delimiter (the default), not tabs (unless the snippet you posted has lost its formatting).
  • Passing dtype=None lets NumPy infer the dtype from the data, rather than using the default, float.
  • To get the desired output in your question, simply skip the header row rather than have the function build a structured dtype from it.

Check out the docs: http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.genfromtxt.html for more details.
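
As a self-contained illustration of the points above, here is a minimal sketch that loads the question's sample data as a plain 2D array and still recovers the column names. It assumes whitespace-separated values; the `text` string stands in for `table.dat`, and reading the header by hand is just one possible way to keep the names.

```python
import io

import numpy as np

# Stand-in for 'table.dat' (assumed whitespace-separated, as in the question).
text = """\
# c1    c2  c3  c4  c5
3   4   8   6   8
10  7   6   7   10
5   10  2   1   3
7   6   5   3   6
5   8   5   2   7
1   2   2   10  8
10  5   9   3   8
5   2   4   4   2
"""

# Read the column names from the first line by hand.
header = tuple(text.splitlines()[0].lstrip("#").split())

# Skip the header line so genfromtxt returns a plain 2D integer array.
table = np.genfromtxt(io.StringIO(text), dtype=int, skip_header=1)

print(header)       # ('c1', 'c2', 'c3', 'c4', 'c5')
print(table.shape)  # (8, 5)
```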

I agree a Pandas DataFrame may be more appropriate if you are essentially reading in a csv file.




Answer 3:


Your data looks homogeneous: all int except for the header. But by passing names=True you force genfromtxt to load it as a structured array. Look at the dtype.

Try skip_header=1, and omit names (or leave it at its default, None).

In other words you want to load integers, ignoring the header line.

The tab delimiter appears to be working ok.

I see from a comment that you have discovered the view method of converting a structured array. That gives you both header names and a 2d view.
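
The conversion mentioned above can be sketched as follows. This is an illustration under assumptions rather than the exact code from the comment: it uses `numpy.lib.recfunctions.structured_to_unstructured` (available since NumPy 1.16) alongside the `view` approach, and the `text` string stands in for the first three rows of `table.dat`. The `view` variant only works when every field has the same dtype.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in for the first rows of 'table.dat' (assumed whitespace-separated).
text = """\
# c1    c2  c3  c4  c5
3   4   8   6   8
10  7   6   7   10
5   10  2   1   3
"""

# names=True yields a structured array with one int field per column.
table = np.genfromtxt(io.StringIO(text), dtype=int, names=True)
header = table.dtype.names

# Option 1: copy the fields into a regular 2D array.
data = rfn.structured_to_unstructured(table)

# Option 2: reinterpret the same memory as a 2D array
# (valid only because all fields share one dtype).
data_view = table.view(int).reshape(len(table), -1)

print(header)
print(data.shape)
```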



Source: https://stackoverflow.com/questions/36485619/how-do-i-load-heterogeneous-data-np-genfromtxt-as-a-2d-array
