Question
I am using the following code,
if(df.month == 3 or df.month == 4 or df.month == 5):
    df.test = 'A'
elif(df.month == 6 or df.month == 7 or df.month == 8):
    df.test = 'B'
else:
    df.test = 'C'
But when I run this, I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Update:
print df.columns
Unnamed: 0 int64
year int64
month int64
day int64
dep_time float64
dep_delay float64
arr_time float64
arr_delay float64
carrier object
tailnum object
flight int64
origin object
dest object
air_time float64
distance int64
hour float64
minute float64
print df.dtypes
dtype: object
Can anybody help me in finding the error here?
Answer 1:
I think the best approach is to use loc and isin. You can't test a whole Series against a scalar inside an if or elif like that, because the result is a Series of booleans and its truth value is ambiguous:
print df
   year  month  day
0  2005      3   20
1  2005      4   20
2  2005      5   20
3  2005      6   20
4  2005      7   20
5  2005      8   20
6  2005      9   20
df['test'] = 'C'
df.loc[df['month'].isin([3,4,5]) , 'test'] = 'A'
df.loc[df['month'].isin([6,7,8]) , 'test'] = 'B'
print df
   year  month  day test
0  2005      3   20    A
1  2005      4   20    A
2  2005      5   20    A
3  2005      6   20    B
4  2005      7   20    B
5  2005      8   20    B
6  2005      9   20    C
Or you can assign the value C to the test column explicitly, by listing the remaining months:
df.loc[df['month'].isin([3,4,5]) , 'test'] = 'A'
df.loc[df['month'].isin([6,7,8]) , 'test'] = 'B'
df.loc[df['month'].isin([1,2,9,10,11,12]) , 'test'] = 'C'
print df
   year  month  day test
0  2005      3   20    A
1  2005      4   20    A
2  2005      5   20    A
3  2005      6   20    B
4  2005      7   20    B
5  2005      8   20    B
6  2005      9   20    C
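One difference between the two variants above may be worth noting: the first gives every row a default of C, while the second only labels rows whose month appears in one of the lists. A minimal sketch in Python 3, assuming a made-up frame whose last row has a deliberately invalid month of 0 (the names months, df_default and df_explicit are hypothetical, used only for illustration):

```python
import pandas as pd

# Hypothetical data: the last row has an invalid month (0) on purpose.
months = pd.DataFrame({'month': [3, 6, 9, 0]})

# Variant 1: set the default first, then overwrite the matching rows.
df_default = months.copy()
df_default['test'] = 'C'
df_default.loc[df_default['month'].isin([3, 4, 5]), 'test'] = 'A'
df_default.loc[df_default['month'].isin([6, 7, 8]), 'test'] = 'B'

# Variant 2: assign every group explicitly; rows that match no list stay NaN.
df_explicit = months.copy()
df_explicit.loc[df_explicit['month'].isin([3, 4, 5]), 'test'] = 'A'
df_explicit.loc[df_explicit['month'].isin([6, 7, 8]), 'test'] = 'B'
df_explicit.loc[df_explicit['month'].isin([1, 2, 9, 10, 11, 12]), 'test'] = 'C'

print(df_default['test'].tolist())           # ['A', 'B', 'C', 'C']
print(df_explicit['test'].isna().tolist())   # [False, False, False, True]
```

With clean data where every month is between 1 and 12, both variants give the same column, so the choice is mostly about whether unexpected values should be labeled C or left visibly missing.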
Answer 2:
Try:
def valuesetter(x):
    if x in [3,4,5]: return "A"
    elif x in [6,7,8]: return "B"
    else: return "C"
df["test"] = list(map(valuesetter,df.month))
Answer 3:
The exception message you're getting is pretty self-explanatory. df['month'] is a Series, and the truth value of a Series is ambiguous because comparing it yields a whole series of truth values rather than a single one. You can do what you're trying to do with pd.Series.map:
def assignmentFunction(value):
    if value in [3, 4, 5]:
        return 'A'
    elif value in [6, 7, 8]:
        return 'B'
    else:
        return 'C'
df['test'] = df['month'].map(assignmentFunction)
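Series.map also accepts a dict, which can replace the function when the mapping is a plain lookup. The sketch below is a variation on the answer above, not the answerer's own code; the names sample and month_to_label are made up for illustration. Values missing from the dict come back as NaN, so the default C is filled in afterwards.

```python
import pandas as pd

sample = pd.DataFrame({'month': range(1, 13)})

# Hypothetical lookup table: only the months that are not 'C' are listed.
month_to_label = {3: 'A', 4: 'A', 5: 'A', 6: 'B', 7: 'B', 8: 'B'}

# map() yields NaN for months absent from the dict; fillna supplies the default.
sample['test'] = sample['month'].map(month_to_label).fillna('C')

print(sample['test'].tolist())
```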
Answer 4:
You can use a comprehension to create your test column:
>>> import pandas as pd
>>> df = pd.DataFrame({'month' : pd.Series(range(1,13))})
>>> df['test'] = ['A' if m in [3,4,5] else
...               'B' if m in [6,7,8] else
...               'C' for m in df['month']]
>>> df
    month test
0       1    C
1       2    C
2       3    A
3       4    A
4       5    A
5       6    B
6       7    B
7       8    B
8       9    C
9      10    C
10     11    C
11     12    C
Or you can apply a function, which produces the same result:
>>> def value(month):
...     if month in [3,4,5]:
...         return 'A'
...     if month in [6,7,8]:
...         return 'B'
...     return 'C'
...
>>> df['test'] = df['month'].apply(value)
Answer 5:
This answer mainly tries to explain the error that you're seeing. As I'm not a pandas user, I'll let the other answers speak to better ways to write this code...
df.month returns an array. some_array == 6 will return another array (constructed such that new_array[i] == True iff some_array[i] == 6).
Because of situations like this, a NumPy array with more than one element does not have a truth value (unlike normal Python sequences). So, to test whether an array is truthy, you need to specify what you mean. For example, to require that all elements be truthy, you'd write: (df.month == 6).all()
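To make this concrete, here is a small sketch in Python 3, using a made-up three-row frame. It shows that the comparison yields one boolean per row, that using the result directly in an if raises the ValueError from the question, and that .any() or .all() reduces it to a single value.

```python
import pandas as pd

df = pd.DataFrame({'month': [3, 6, 9]})  # made-up sample data

mask = df.month == 6      # element-wise comparison, one bool per row
print(mask.tolist())      # [False, True, False]

try:
    if mask:              # Python asks the Series for a single truth value
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # the "truth value of a Series is ambiguous" message

print(mask.any())         # True: at least one row has month 6
print(mask.all())         # False: not every row has month 6
```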
Source: https://stackoverflow.com/questions/34759318/creating-new-column-using-output-of-if-else-statement-causes-error