Remove rows and ValueError: Arrays were different lengths

Submitted by 本小妞迷上赌 on 2019-12-10 16:17:17

Question


My dataframe has subcategories: under each category (cat, dog, bird), stats information is presented. I need to remove the rows whose stats value is count or freq, and keep only the rows with sd and mean values. Some values are NaN.

My code raises a ValueError.

df:

 var    stats    A     B     C
 cat     mean    2     3     4
 NaN     sd      2     1     3
 NaN     count   5     2     6
 NaN     freq    3     1     19
 dog     mean    8     1     2
 NaN     sd      2     1     3
 NaN     count   4     6     1
 NaN     freq    3     1     19   
 bird    mean    2     3     4
 NaN     sd      2     1     3
 NaN     count   5     2     6
 NaN     freq    NaN   NaN   NaN 

My code:

rows = ['count', 'freq']
df = [df.stats != rows]

Expected outcome:

 var    stats    A     B     C
 cat     mean    2     3     4
 NaN     sd      2     1     3
 dog     mean    8     1     2
 NaN     sd      2     1     3   
 bird    mean    2     3     4
 NaN     sd      2     1     3

Error:

File "pandas/_libs/lib.pyx", line 805, in pandas._libs.lib.vec_compare 
(pandas/_libs/lib.c:14288)
ValueError: Arrays were different lengths: 819 vs 9

I am not sure how to check the array length, but in my Excel spreadsheet all columns and rows have the same length. Is this error caused by NaN or empty cells in my data?

Thanks!


Answer 1:


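!= will not work here: it compares the column with the list element by element, so the two must have the same length. The NaN cells are not the cause. Use pd.Series.isin to build a boolean mask, then use that mask to filter your dataframe.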

m = ~df.stats.isin(['count', 'freq'])
print(m)
0      True
1      True
2     False
3     False
4      True
5      True
6     False
7     False
8      True
9      True
10    False
11    False
Name: stats, dtype: bool

print(df[m])
    var stats    A    B    C
0   cat  mean  2.0  3.0  4.0
1   NaN    sd  2.0  1.0  3.0
4   dog  mean  8.0  1.0  2.0
5   NaN    sd  2.0  1.0  3.0
8  bird  mean  2.0  3.0  4.0
9   NaN    sd  2.0  1.0  3.0
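
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and it assumes a reasonably recent pandas, where the length mismatch still surfaces as a ValueError (the exact message differs from the one in the traceback above).

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
df = pd.DataFrame({
    "var": ["cat", np.nan, np.nan, np.nan,
            "dog", np.nan, np.nan, np.nan,
            "bird", np.nan, np.nan, np.nan],
    "stats": ["mean", "sd", "count", "freq"] * 3,
    "A": [2, 2, 5, 3, 8, 2, 4, 3, 2, 2, 5, np.nan],
    "B": [3, 1, 2, 1, 1, 1, 6, 1, 3, 1, 2, np.nan],
    "C": [4, 3, 6, 19, 2, 3, 1, 19, 4, 3, 6, np.nan],
})
rows = ["count", "freq"]

# Comparing a 12-element column with a 2-element list is an elementwise
# comparison, so pandas rejects the mismatched lengths.
try:
    df["stats"] != rows
    raised = False
except ValueError:
    raised = True

# isin tests membership per element, so the list can be any length.
mask = ~df["stats"].isin(rows)
result = df[mask]
print(result)
```

The bracket form df["stats"] is used instead of df.stats only to avoid any clash with DataFrame attributes; both select the same column here.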



Answer 2:


You can use the SQL-like query() method:

In [163]: df.query("stats not in ['count','freq']")
Out[163]:
    var stats    A    B    C
0   cat  mean  2.0  3.0  4.0
1   NaN    sd  2.0  1.0  3.0
4   dog  mean  8.0  1.0  2.0
5   NaN    sd  2.0  1.0  3.0
8  bird  mean  2.0  3.0  4.0
9   NaN    sd  2.0  1.0  3.0

Or, using your rows variable:

In [164]: df.query("stats not in @rows")
Out[164]:
    var stats    A    B    C
0   cat  mean  2.0  3.0  4.0
1   NaN    sd  2.0  1.0  3.0
4   dog  mean  8.0  1.0  2.0
5   NaN    sd  2.0  1.0  3.0
8  bird  mean  2.0  3.0  4.0
9   NaN    sd  2.0  1.0  3.0
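
A quick way to convince yourself that query() and the isin mask select the same rows is to compare the two results directly. This is only a sketch on a small made-up frame (the column A values are illustrative, not the asker's data):

```python
import pandas as pd

# Small hypothetical frame with the same 'stats' layout as the question.
df = pd.DataFrame({
    "stats": ["mean", "sd", "count", "freq"] * 2,
    "A": [2, 2, 5, 3, 8, 2, 4, 3],
})
rows = ["count", "freq"]

# '@rows' tells query() to look up the local Python variable named rows.
by_query = df.query("stats not in @rows")
by_mask = df[~df["stats"].isin(rows)]

# Both approaches should keep the same rows with the same index labels.
print(by_query.equals(by_mask))
```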



Answer 3:


For fun!

rows = ['count', 'freq']

df.merge(pd.DataFrame(dict(stats=np.setdiff1d(df.stats, rows))))

    var stats    A    B    C
0   cat  mean  2.0  3.0  4.0
1   dog  mean  8.0  1.0  2.0
2  bird  mean  2.0  3.0  4.0
3   NaN    sd  2.0  1.0  3.0
4   NaN    sd  2.0  1.0  3.0
5   NaN    sd  2.0  1.0  3.0

Another interesting way, using set_index and drop:

df.set_index('stats').drop(rows).reset_index()

  stats   var    A    B    C
0  mean   cat  2.0  3.0  4.0
1    sd   NaN  2.0  1.0  3.0
2  mean   dog  8.0  1.0  2.0
3    sd   NaN  2.0  1.0  3.0
4  mean  bird  2.0  3.0  4.0
5    sd   NaN  2.0  1.0  3.0
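
Note that the set_index/drop route moves stats to the first column and renumbers the rows. If the original column order matters, one possible fix (a sketch on a shortened, hypothetical version of the frame) is to reselect the columns afterwards:

```python
import numpy as np
import pandas as pd

# Shortened hypothetical frame: one category only.
df = pd.DataFrame({
    "var": ["cat", np.nan, np.nan, np.nan],
    "stats": ["mean", "sd", "count", "freq"],
    "A": [2, 2, 5, 3],
})
rows = ["count", "freq"]

# set_index/drop/reset_index moves 'stats' to the front and renumbers rows.
dropped = df.set_index("stats").drop(rows).reset_index()

# Selecting with the original column Index restores the original order.
restored = dropped[df.columns]
print(restored)
```

drop raises a KeyError if any label in rows is missing from the index; passing errors='ignore' to drop suppresses that.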



Answer 4:


LOL :)

df[[x not in rows for x in df.stats]]
Out[520]: 
    var stats    A    B    C
0   cat  mean  2.0  3.0  4.0
1   NaN    sd  2.0  1.0  3.0
4   dog  mean  8.0  1.0  2.0
5   NaN    sd  2.0  1.0  3.0
8  bird  mean  2.0  3.0  4.0
9   NaN    sd  2.0  1.0  3.0
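
The list comprehension builds a plain Python list of booleans, one per row, which is why its length always matches the frame. As a rough check that it agrees with the isin mask even when stats itself contains NaN, here is a sketch on a made-up frame (the NaN in stats is an assumption added for illustration; the question's data has NaN only in other columns):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in 'stats' itself.
df = pd.DataFrame({"stats": ["mean", "count", np.nan, "sd", "freq"],
                   "A": [2.0, 5.0, 1.0, 2.0, np.nan]})
rows = ["count", "freq"]

# One boolean per row: True means the row is kept.
keep_list = [x not in rows for x in df["stats"]]
keep_isin = ~df["stats"].isin(rows)

# NaN is not a member of rows, so both masks keep that row.
print(keep_list == keep_isin.tolist())
print(df[keep_list])
```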


Source: https://stackoverflow.com/questions/46655712/remove-rows-and-valueerror-arrays-were-different-lengths
