I would like to load a big text file (around 1 GB, with 3*10^6 rows and 10 to 100 columns) as a 2D NumPy array containing strings. However, it seems like numpy.loadtxt() only takes floats by default.
There is also read_csv in Pandas, which is fast and supports non-comma column separators and automatic typing by column:
import pandas as pd
df = pd.read_csv('your_file', sep='\t')
It can be converted to a NumPy array if you prefer that type with:
import numpy as np
arr = np.array(df)
This is by far the easiest and most mature text import approach I've come across.
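For the string-array case in the question, a minimal sketch (assuming a whitespace-delimited file with no header row; 'your_file' is a placeholder name):

import pandas as pd

# Read every column as a string; sep=r'\s+' splits on runs of whitespace,
# header=None keeps the first data row from being treated as column names.
df = pd.read_csv('your_file', sep=r'\s+', header=None, dtype=str)
arr = df.to_numpy()   # 2D NumPy array of strings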
Do you really need a NumPy array? Otherwise, you could speed things up by loading the data as a nested list of strings.
def load(fname):
    '''Load the file using the standard open().'''
    data = []
    with open(fname, 'r') as f:
        for line in f:                            # stream the file line by line
            data.append(line.rstrip('\n').split(' '))
    return data
For a text file with 4000x4000 words this is about 10 times faster than loadtxt.
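If you do end up needing a NumPy array after all, the nested list converts directly; a minimal usage sketch, assuming every row has the same number of fields:

import numpy as np

data = load('your_file')   # nested list of strings from load() above
arr = np.array(data)       # 2D string array when all rows have equal length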
Use genfromtxt instead. It's a much more general method than loadtxt:
import numpy as np
print(np.genfromtxt('col.txt', dtype='str'))
Using the file col.txt:
foo bar
cat dog
man wine
This gives:
[['foo' 'bar']
['cat' 'dog']
['man' 'wine']]
If you expect every row to have the same number of columns, you can read the first row to determine that count and pass the filling_values argument to genfromtxt to fill in any missing entries.
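A minimal sketch of that idea (the empty-string placeholder and the reuse of col.txt are assumptions, not part of the answer above):

import numpy as np

# Determine the expected number of columns from the first row
with open('col.txt') as f:
    n_cols = len(f.readline().split())

# filling_values supplies a placeholder for entries genfromtxt detects as missing
arr = np.genfromtxt('col.txt', dtype='str', usecols=range(n_cols),
                    filling_values='')
print(arr)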