Reading data from text file with missing values

萌比男神i 2020-12-16 16:03

I want to read data from a file that has many missing values, as in this example:

1,2,3,4,5
6,,,7,8
,,9,10,11

I am using numpy.loadtxt.

3 Answers
  • 2020-12-16 16:41

    I'd probably use genfromtxt:

    >>> from numpy import genfromtxt
    >>> genfromtxt("missing1.dat", delimiter=",")
    array([[  1.,   2.,   3.,   4.,   5.],
           [  6.,  nan,  nan,   7.,   8.],
           [ nan,  nan,   9.,  10.,  11.]])
    

    and then handle the NaNs however you like (replace them with another value, use a mask instead, etc.). Some of this can be done inline, for example with filling_values:

    >>> genfromtxt("missing1.dat", delimiter=",", filling_values=99)
    array([[  1.,   2.,   3.,   4.,   5.],
           [  6.,  99.,  99.,   7.,   8.],
           [ 99.,  99.,   9.,  10.,  11.]])
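
As a rough sketch of the "change them to something, use a mask instead" options mentioned above: the snippet below loads the question's sample data from an in-memory string (an assumption made so the example runs without missing1.dat) and shows three common ways to deal with the NaNs afterwards. The variable names are illustrative only.

```python
import io
import numpy as np

# The sample data from the question, held in memory instead of a file.
raw = "1,2,3,4,5\n6,,,7,8\n,,9,10,11\n"

# Empty fields become NaN because the default dtype is float.
data = np.genfromtxt(io.StringIO(raw), delimiter=",")

# Option 1: replace NaNs with a chosen value after loading.
filled = np.where(np.isnan(data), 0.0, data)

# Option 2: mask the missing cells so that reductions skip them.
masked = np.ma.masked_invalid(data)
col_means = masked.mean(axis=0)

# Option 3: use NaN-aware reductions on the plain array.
col_sums = np.nansum(data, axis=0)

print(filled)
print(col_means)
print(col_sums)
```

Which option fits best depends on whether the missing cells should count as zeros, be ignored, or be kept visible as NaN.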
    
  • 2020-12-16 16:46

    Be careful: in my tests, genfromtxt called this way only picks up the numerical values. Cells containing text come back as nan, so a table that mixes strings and numbers needs a different approach.

    My example:

    upeak_names.txt:
    id  name    Distance    name2   Distance2   name3   Distance3
    upeak-3 NOC2L   -161    KLHL17  -1135   NOC2L   -162
    
    >>> table = genfromtxt('upeak_names.txt', delimiter="\t")
    >>> table[2,]
    array([   nan,    nan,  -161.,    nan, -1135.,    nan,  -162.])
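
If only the numeric columns are needed, one possible workaround is to skip the header and select those columns with usecols. The sketch below rebuilds the two lines shown above as an in-memory, tab-separated string (an assumption; the real file is not available here) and compares the default read with a numeric-only read.

```python
import io
import numpy as np

# Header plus the single data row from the example, tab-separated.
raw = (
    "id\tname\tDistance\tname2\tDistance2\tname3\tDistance3\n"
    "upeak-3\tNOC2L\t-161\tKLHL17\t-1135\tNOC2L\t-162\n"
)

# Default read: every cell is converted to float, so text turns into NaN.
default_read = np.genfromtxt(io.StringIO(raw), delimiter="\t")

# Numeric-only read: skip the header and keep columns 2, 4 and 6.
numeric = np.genfromtxt(
    io.StringIO(raw),
    delimiter="\t",
    skip_header=1,
    usecols=(2, 4, 6),
)

print(default_read)
print(numeric)
```

This drops the text columns entirely, so it only helps when the names are not needed alongside the distances.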
    
  • 2020-12-16 17:01

    This is because the function expects to return a numpy array with all cells of the same type.

    If you want a table with mixed strings and numbers, read it into a structured array instead. You probably also want to add skip_header=1 to skip the first line. In your case, that would be something like:

    import numpy as np

    np.genfromtxt('upeak_names.txt', delimiter="\t", dtype="S10,S10,f4,S10,f4,S10,f4",
    names=["id", "name", "Distance", "name2", "Distance2", "name3", "Distance3"], skip_header=1)
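
To illustrate how the resulting structured array is used, here is a minimal sketch. It assumes the two example lines as an in-memory, tab-separated string rather than the real upeak_names.txt, and it swaps S10 for U10 so that text fields come back as str instead of bytes. The second call, with names=True and dtype=None, is an alternative that lets genfromtxt take the field names from the header and infer each column's type.

```python
import io
import numpy as np

# Header plus the single data row from the example, tab-separated.
raw = (
    "id\tname\tDistance\tname2\tDistance2\tname3\tDistance3\n"
    "upeak-3\tNOC2L\t-161\tKLHL17\t-1135\tNOC2L\t-162\n"
)
names = ["id", "name", "Distance", "name2", "Distance2", "name3", "Distance3"]

# Explicit per-column dtype: 10-character strings and 32-bit floats.
table = np.genfromtxt(
    io.StringIO(raw),
    delimiter="\t",
    dtype="U10,U10,f4,U10,f4,U10,f4",
    names=names,
    skip_header=1,
)
# A single data row yields a 0-d array; make it indexable by row.
table = np.atleast_1d(table)

# Fields are accessed by name, rows by position.
print(table["name"])
print(table["Distance"])
print(table[0])

# Alternative: read names from the header and infer column types.
auto = np.atleast_1d(
    np.genfromtxt(
        io.StringIO(raw),
        delimiter="\t",
        names=True,
        dtype=None,
        encoding="utf-8",
    )
)
print(auto.dtype.names)
```

Note that strings longer than 10 characters would be truncated with U10; the width is just carried over from the answer above.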
    

    See also:

    • Documentation for genfromtxt: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.genfromtxt.html

    • Documentation for structured arrays in numpy:
      https://docs.scipy.org/doc/numpy-1.15.0/user/basics.rec.html
