Load text file as strings using numpy.loadtxt()

深忆病人 2020-12-13 19:55

I would like to load a big text file (around 1 GB, with 3*10^6 rows and 10 - 100 columns) as a 2D NumPy array containing strings. However, it seems like numpy.loadtxt() only takes numeric data by default. Is there a way to load the data as strings instead?

3 Answers
  • 2020-12-13 20:08

    There is also read_csv in Pandas, which is fast and supports non-comma column separators and automatic typing by column:

    import pandas as pd
    df = pd.read_csv('your_file', sep='\t')
    

    If you prefer a NumPy array, the DataFrame can be converted with:

    import numpy as np
    arr = np.array(df)
    

    This is by far the easiest and most mature text import approach I've come across.
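
    Since the question involves whitespace-separated string columns, a minimal sketch along the same lines (the filename, separator, and absence of a header row are assumptions here) would be:

    import pandas as pd

    # whitespace-separated columns, no header row, keep every column as strings
    df = pd.read_csv('your_file', sep=r'\s+', header=None, dtype=str)

    # 2D NumPy array of strings; equivalent to np.array(df) above
    arr = df.to_numpy()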

  • 2020-12-13 20:29

    Do you really need a NumPy array? If not, you could speed things up by loading the data as a nested list of strings.

    def load(fname):
        '''Load a space-separated text file into a nested list of strings.'''
        data = []
        with open(fname, 'r') as f:
            # iterate line by line instead of readlines() to avoid holding the raw text in memory
            for line in f:
                data.append(line.rstrip('\n').split(' '))
        return data
    

    For a text file with 4000x4000 words this is about 10 times faster than loadtxt.
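
    A quick usage sketch (the filename is a placeholder); if you do end up wanting an array, the nested list can still be wrapped in one afterwards:

    import numpy as np

    data = load('your_file')
    # optional: all rows need the same length to get a proper 2D string array
    arr = np.array(data)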

  • 2020-12-13 20:31

    Use genfromtxt instead. It's a much more general method than loadtxt:

    import numpy as np
    print(np.genfromtxt('col.txt', dtype='str'))
    

    Using the file col.txt:

    foo bar
    cat dog
    man wine
    

    This gives:

    [['foo' 'bar']
     ['cat' 'dog']
     ['man' 'wine']]
    

    If you expect every row to have the same number of columns, you can read the first row to determine that count and use the filling_values argument to fill in any missing values; a minimal sketch follows.
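
    As a rough illustration of filling_values (made-up numeric data read from an in-memory buffer; whether it behaves identically with dtype='str' is an assumption I have not verified):

    import numpy as np
    from io import StringIO

    # the second row has an empty second field; filling_values supplies the replacement
    buf = StringIO("1,2\n3,\n5,6")
    arr = np.genfromtxt(buf, delimiter=',', filling_values=-1)
    # arr -> [[ 1.  2.]
    #         [ 3. -1.]
    #         [ 5.  6.]]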
