Question:
I'm running the following Python script:
    #!/usr/bin/python
    import os,sys
    from scipy import stats
    import numpy as np

    f=open('data2.txt', 'r').readlines()
    N=len(f)-1
    for i in range(0,N):
        w=f[i].split()
        l1=w[1:8]
        l2=w[8:15]
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
        result=stats.ttest_ind(list1,list2)
        print result[1]
However, I get an error like this:
ValueError: could not convert string to float: id
I'm confused by this. When I try it on a single line in an interactive session, instead of using the for loop in the script:
    >>> from scipy import stats
    >>> import numpy as np
    >>> f=open('data2.txt','r').readlines()
    >>> w=f[1].split()
    >>> l1=w[1:8]
    >>> l2=w[8:15]
    >>> list1=[float(x) for x in l1]
    >>> list1
    [5.3209183842, 4.6422726719, 4.3788135547, 5.9299061614, 5.9331108706, 5.0287087832, 4.57...]
It works well.
Can anyone explain what is going on here? Thanks.
Answer 1:
Obviously some of your lines don't contain valid float data. Specifically, at least one line contains the text id, which can't be converted to a float.
When you try it at the interactive prompt, you are testing only a single line (f[1]), which happens to be valid. The best approach is to print the line on which the error occurs, so you can see which line is wrong, e.g.
    #!/usr/bin/python
    import os,sys
    from scipy import stats
    import numpy as np

    f=open('data2.txt', 'r').readlines()
    N=len(f)-1
    for i in range(0,N):
        w=f[i].split()
        l1=w[1:8]
        l2=w[8:15]
        try:
            list1=[float(x) for x in l1]
            list2=[float(x) for x in l2]
        except ValueError as e:
            # Report the offending line, then skip it so the t-test
            # never runs on missing or stale data.
            print("error: {0} on line {1}".format(e, i))
            continue
        result=stats.ttest_ind(list1,list2)
        print(result[1])
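If the offending line turns out to be a header row (an assumption, since the question does not show data2.txt), one option is to keep only the lines whose columns 1 to 14 all parse as floats. The sketch below is just an illustration: the helper names `is_float` and `numeric_rows` and the sample rows are made up, and the t-test is left out so the snippet runs without SciPy.

```python
def is_float(token):
    """Return True if float() accepts the token."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def numeric_rows(lines):
    """Yield (line_index, list1, list2) for lines whose columns 1..14 are numeric."""
    for index, line in enumerate(lines):
        fields = line.split()[1:15]
        if len(fields) == 14 and all(is_float(x) for x in fields):
            values = [float(x) for x in fields]
            yield index, values[:7], values[7:]


# Hypothetical sample: a header row containing "id", then one data row.
sample = [
    "gene id c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14",
    "g1 1 2 3 4 5 6 7 8 9 10 11 12 13 14",
]
rows = list(numeric_rows(sample))
for index, list1, list2 in rows:
    print(index, list1, list2)
```

Each yielded pair of lists could then be passed to stats.ttest_ind exactly as in the script above. Note that this silently drops bad lines, so it suits a known header better than data that may be corrupt.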
Answer 2:
My error was very simple: the text file containing the data had a stray space character (and therefore an invisible one) on the last line.
In the output of grep, I had "45 " (with a trailing space) instead of just "45".
The classic stupid thing that makes you waste hours. :-)
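Invisible characters like this are easier to spot if you print each line with repr(), which shows spaces and escape sequences explicitly. A minimal sketch, with a made-up helper name `suspicious_lines` and made-up sample data:

```python
def suspicious_lines(lines):
    """Return (line_number, repr_of_text) for lines with leading or trailing whitespace."""
    found = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")  # ignore the normal line ending
        if text != text.strip():
            found.append((number, repr(text)))
    return found


# Hypothetical file contents: the second line ends with a space.
sample = ["44\n", "45 \n"]
report = suspicious_lines(sample)
for number, shown in report:
    print(number, shown)
```

To check a real file, the list could be replaced with something like open('data2.txt').readlines().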
Answer 3:
This error is pretty verbose:
ValueError: could not convert string to float: id
Somewhere in your text file, a line has the word id in it, which can't really be converted to a number.
Your test code works because the word id isn't present in line 2.
If you want to catch that line, try this code. I cleaned your code up a tad:
    #!/usr/bin/python
    import os, sys
    from scipy import stats
    import numpy as np

    for index, line in enumerate(open('data2.txt', 'r').readlines()):
        w = line.split(' ')
        l1 = w[1:8]
        l2 = w[8:15]
        try:
            # list() forces the conversion here, so a bad value raises
            # inside the try block on Python 3 as well.
            list1 = list(map(float, l1))
            list2 = list(map(float, l2))
        except ValueError:
            print('Line {i} is corrupt!'.format(i=index))
            break
        result = stats.ttest_ind(list1, list2)
        print(result[1])
Answer 4:
Your data may not be what you expect -- it seems you're expecting, but not getting, floats.
A simple solution to figuring out where this occurs would be to add a try/except to the for-loop:
    for i in range(0,N):
        w=f[i].split()
        l1=w[1:8]
        l2=w[8:15]
        try:
            list1=[float(x) for x in l1]
            list2=[float(x) for x in l2]
        except ValueError as e:
            # report the error in some way that is helpful -- maybe print out i
            print("line {0}: {1}".format(i, e))
            continue
        result=stats.ttest_ind(list1,list2)
        print(result[1])
Answer 5:
Perhaps your numbers aren't actually numbers, but letters masquerading as numbers?
In my case, the font I was using meant that "l" and "1" looked very similar. I had a string like 'l1919' which I thought was '11919' and that messed things up.
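One way to catch look-alike characters without relying on the font is to let Python list the tokens that float() rejects, and then show which characters in them are not part of an ordinary decimal number. This is only a rough sketch: the helper names `bad_tokens` and `odd_characters` are made up, and the character check is a simple heuristic, not a full float parser.

```python
def bad_tokens(tokens):
    """Return (index, token) pairs that float() cannot convert."""
    bad = []
    for index, token in enumerate(tokens):
        try:
            float(token)
        except ValueError:
            bad.append((index, token))
    return bad


def odd_characters(token):
    """Heuristic: list characters that are not ASCII digits, signs, '.', 'e' or 'E'."""
    return [c for c in token if c not in "0123456789+-.eE"]


tokens = ["11919", "l1919", "4.5"]
for index, token in bad_tokens(tokens):
    print(index, repr(token), odd_characters(token))
```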
Answer 6:
I got the same error while working with a .csv file of 69190 rows scraped from Amazon. I was trying to implement an RNN.
When I looked carefully, a column that was supposed to contain integers also had non-numeric values in some rows. I replaced those with numeric values and everything worked fine afterwards.
So check your dataset for these errors first.
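A check like this can be automated with the standard csv module. The sketch below assumes the file has a header row; the function name `non_numeric_rows`, the column name `reviews`, and the sample data are all made up for illustration.

```python
import csv
import io


def non_numeric_rows(csv_file, column):
    """Return (row_number, value) for rows whose `column` is not an integer."""
    bad = []
    reader = csv.DictReader(csv_file)
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        value = row[column]
        try:
            int(value)
        except (TypeError, ValueError):  # TypeError covers a missing field (None)
            bad.append((row_number, value))
    return bad


# Hypothetical sample standing in for the real .csv file.
sample = io.StringIO("title,reviews\nKettle,120\nToaster,n/a\nLamp,35\n")
bad = non_numeric_rows(sample, "reviews")
print(bad)
```

For a real file, the io.StringIO object could be replaced with an open file handle, for example open(path, newline='').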