'b' character added when using numpy loadtxt

Question

I tried to create an array from a text file. I saw earlier that numpy has a loadtxt method, so I tried it, but it adds some junk characters before each row...

# my txt file

    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

# my python v3.4 program

import numpy as np
f = open('tile', 'r')
a = np.loadtxt(f, dtype=str, delimiter='\n')
print(a)

# my print output

["b'    .--``--.    '"
 "b'.--`        `--.'"
 "b'|              |'"
 "b'|              |'"
 "b'`--.        .--`'"
 "b'    `--..--`    '"]

What are these 'b' characters and double quotes? And where do they come from? I tried some solutions picked from the internet, like opening the file with codecs, changing the dtype to 'S20', 'S11', and a lot of other things which didn't work... What I expect is an array of Unicode strings that looks like this:

[['    .--``--.    ']
 ['.--`        `--.']
 ['|              |']
 ['|              |']
 ['`--.        .--`']
 ['    `--..--`    ']]

Info: I'm using Python 3.4 and numpy from the Debian stable repository.

Answer 1:

np.loadtxt and np.genfromtxt operate in byte mode, bytes being the default string type in Python 2. But Python 3 uses Unicode for str, and marks bytestrings with this b prefix.
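The prefix is simply Python 3's text form of a bytes object: calling str() on bytes returns its repr instead of decoding it. The plain-Python sketch below assumes (this is our reading, not numpy's actual code) that affected loadtxt versions read each line as bytes and then turn it into text with something equivalent to str():

```python
# A bytes value, like a line that loadtxt read in byte mode (sample data)
raw = b'    .--``--.'

# str() on bytes does NOT decode: it returns the repr, including the b'...'
as_repr = str(raw)

# Decoding (UTF-8 assumed here) is what actually turns bytes into text
as_text = raw.decode('utf-8')

print(as_repr)  # b'    .--``--.'
print(as_text)
```

That repr string is exactly what then gets stored in the str array, which is why each element shows up wrapped in b'...'.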

I tried some variations in a Python 3 IPython session:

In [508]: np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[508]: b'    .--``--.'
In [509]: np.loadtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[509]: "b'    .--``--.'"
...
In [511]: np.genfromtxt('stack33655641.txt',dtype=str,delimiter='\n')[0]
Out[511]: '.--``--.'
In [512]: np.genfromtxt('stack33655641.txt',dtype=None,delimiter='\n')[0]
Out[512]: b'.--``--.'
In [513]: np.genfromtxt('stack33655641.txt',dtype=bytes,delimiter='\n')[0]
Out[513]: b'.--``--.'

genfromtxt with dtype=str gives the cleanest display, except that it strips blanks. I may have to use a converter to turn that off. These functions are meant to read CSV data, where (white)spaces are separators, not part of the data.

loadtxt and genfromtxt are overkill for simple text like this. A plain file read does nicely:

In [527]: with open('stack33655641.txt') as f:a=f.read()
In [528]: print(a)
    .--``--.
.--`        `--.
|              |
|              |
`--.        .--`
    `--..--`

In [530]: a=a.splitlines()
In [531]: a
Out[531]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']

(my text editor is set to strip trailing blanks, hence the ragged lines).
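To get the shape the question asks for (one row per line, each row holding a single Unicode string), the plain file read can be wrapped in a small helper. This is a sketch; the function name read_tile and the UTF-8 default are assumptions, not part of the original answer:

```python
import numpy as np

def read_tile(path, encoding='utf-8'):
    """Read a text file into an (n_lines, 1) array of Unicode strings."""
    with open(path, encoding=encoding) as f:
        # splitlines() drops the line endings but keeps leading/trailing spaces
        lines = f.read().splitlines()
    # dtype=str gives a '<U...' array sized to the longest line
    return np.array(lines, dtype=str).reshape(-1, 1)
```

Usage would be a = read_tile('tile') followed by print(a). Since the text never passes through numpy's byte-mode parsers, no b'...' wrappers can appear.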

@DSM's suggestion, loading as bytes and then converting with astype(str):

In [556]: a=np.loadtxt('stack33655641.txt',dtype=bytes,delimiter='\n').astype(str)
In [557]: a
Out[557]: 
array(['    .--``--.', '.--`        `--.', '|              |',
       '|              |', '`--.        .--`', '    `--..--`'], 
      dtype='<U16')
In [558]: a.tolist()
Out[558]: 
['    .--``--.',
 '.--`        `--.',
 '|              |',
 '|              |',
 '`--.        .--`',
 '    `--..--`']
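The same bytes-then-convert idea can also be written with an explicit encoding. As far as I know, astype(str) on a bytes ('S') array only handles ASCII, while np.char.decode lets you name the codec. A sketch on hypothetical sample data, with UTF-8 assumed as the file's encoding:

```python
import numpy as np

# Hypothetical bytes array, like what loadtxt(..., dtype=bytes) returns
raw = np.array([b'    .--``--.', b'.--`        `--.'])

# Works for pure-ASCII content such as this ASCII art
as_str = raw.astype(str)

# Explicit decoding, useful when the file may contain non-ASCII characters
decoded = np.char.decode(raw, 'utf-8')

print(decoded.tolist())
```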

Answer 2:

You can use np.genfromtxt('your-file', dtype='U').
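Note that with its default whitespace delimiter, genfromtxt splits each line into fields, so this form suits column data better than the ASCII art itself (whose lines have differing numbers of space-separated pieces). A sketch on a small hypothetical grid, passed in through io.StringIO instead of a file path:

```python
import io
import numpy as np

# Hypothetical whitespace-separated text; a real file path works the same way
text = io.StringIO("x o x\no x o\n")

# dtype='U' asks for Unicode strings, so no b'...' wrappers appear
grid = np.genfromtxt(text, dtype='U')
print(grid)
```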

Answer 3:

Python 3 works with Unicode. I had the same issue when using loadtxt with dtype='S'. But using dtype='U' (Unicode string) in either numpy.loadtxt or numpy.genfromtxt gives output without the b:

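import numpy

# Structured dtype: two Unicode columns of up to 10 characters and one 32-bit integer column
a = numpy.loadtxt('filename', dtype={'names': ('col1', 'col2', 'col3'), 'formats': ('U10', 'U10', 'i4')}, delimiter=',')

print(a)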
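A self-contained version of the same call, using a hypothetical three-column CSV held in io.StringIO instead of 'filename'. Each column comes back as a named field, and the 'U10' fields hold plain str values:

```python
import io
import numpy as np

# Hypothetical CSV content: two text columns and one integer column
text = io.StringIO("alice,smith,30\nbob,jones,25\n")

a = np.loadtxt(
    text,
    dtype={'names': ('col1', 'col2', 'col3'), 'formats': ('U10', 'U10', 'i4')},
    delimiter=',',
)

print(a['col1'])  # the first column, as Unicode strings
```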

Answer 4:

This is probably not the most 'pythonic' or best solution, but it definitely gets the job done using numpy.loadtxt in Python 3. I am aware that it is a "dirty" solution, but it works for me.

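import numpy as np

def loadstr(filename):
    # Load every whitespace-separated field as a string; on the affected
    # numpy/Python 3 versions each field comes back as the text "b'...'"
    dat = np.loadtxt(filename, dtype=str)
    rows, cols = dat.shape  # assumes a 2D result (several rows and columns)
    for i in range(rows):
        for j in range(cols):
            mystring = dat[i, j]
            # Drop the leading "b'" and the trailing "'"
            dat[i, j] = mystring[2:-1]
    return dat

data = loadstr("somefile.txt")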

This will import a 2D array from a text file via numpy, strip off the "b'" and "'" from the beginning and end of each string, and return a stripped string array named "data".

Are there better ways? Probably.

Does this work? Yup. I use it enough that I've got this function in my own Python module.
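A somewhat more defensive variant of the same idea, offered only as a sketch: strip the wrapper only when a field actually looks like a bytes literal (so a numpy version that already returns clean strings is left alone), and let ast.literal_eval undo escapes such as \xc3\xa9 before decoding. The helper names strip_bytes_repr and clean_array are hypothetical, and UTF-8 is an assumption about the file:

```python
import ast
import numpy as np

def strip_bytes_repr(s):
    """Turn "b'...'" back into text; leave any other string unchanged."""
    # Only touch strings that really look like a bytes literal
    if len(s) >= 3 and s[0] == 'b' and s[1] in "'\"" and s[-1] == s[1]:
        # literal_eval rebuilds the bytes object, undoing escapes like \xc3
        return ast.literal_eval(s).decode('utf-8')  # UTF-8 is an assumption
    return s

def clean_array(arr):
    """Apply strip_bytes_repr element-wise, keeping the array's shape."""
    cleaned = [strip_bytes_repr(s) for s in arr.ravel().tolist()]
    return np.array(cleaned, dtype=str).reshape(arr.shape)
```

With this, data = clean_array(np.loadtxt("somefile.txt", dtype=str)) should give the same result as loadstr on the affected versions, and unchanged strings elsewhere.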

Answer 5:

I had the same issue, and for me the simplest way turned out to be the csv library. You get your desired output with:

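import csv

def loadFromCsv(filename):
    # newline='' is what the csv docs recommend for files passed to csv.reader
    with open(filename, 'r', newline='') as f:
        # delimiter='\n' keeps each whole line as a single field,
        # so every row comes back as a one-element list
        rows = list(csv.reader(f, delimiter='\n'))
    return rows

a = loadFromCsv('tile')
print(a)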

Source: https://stackoverflow.com/questions/33655641/b-character-added-when-using-numpy-loadtxt

Tags

python

numpy

python-3.4