I'm new to Python and came across this code snippet:
df = df[~df['InvoiceNo'].str.contains('C')]
Would be much obliged if I could know what the ~ means here.
The ~ operator is bitwise NOT. Applied to a boolean mask, it inverts it: every False becomes True and every True becomes False.
Sample:
import pandas as pd

df = pd.DataFrame({'InvoiceNo': ['aaC', 'ff', 'lC'],
                   'a': [1, 2, 5]})
print(df)
  InvoiceNo  a
0       aaC  1
1        ff  2
2        lC  5
# check whether the column contains C
print(df['InvoiceNo'].str.contains('C'))
0     True
1    False
2     True
Name: InvoiceNo, dtype: bool
# invert the mask
print(~df['InvoiceNo'].str.contains('C'))
0    False
1     True
2    False
Name: InvoiceNo, dtype: bool
Filter by boolean indexing:
df = df[~df['InvoiceNo'].str.contains('C')]
print(df)
  InvoiceNo  a
1        ff  2
So the output is all rows of the DataFrame that do not contain C in the InvoiceNo column.
It's used to invert a boolean Series; see the pandas documentation on boolean indexing.
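As a minimal sketch of that inversion (the `mask` Series here is made up for illustration, not taken from the question), this shows `~` flipping each element, and why Python's own `not` is not a substitute:

```python
import pandas as pd

# Hypothetical boolean mask, standing in for the result of str.contains
mask = pd.Series([True, False, True])

# ~ works element by element and returns a new boolean Series
inverted = ~mask
print(inverted.tolist())

# `not` asks for one truth value for the whole Series, which pandas refuses
try:
    not mask
    not_worked = True
except ValueError:
    not_worked = False
print(not_worked)
```

So inside `df[...]` you need `~`, since the indexer expects one boolean per row.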
If you want to apply it across all columns of a DataFrame, you can combine it with
df.any(axis=1)
df = df[~(df>0).any(axis=1)]
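Assuming all columns are numeric, (df > 0).any(axis=1) returns a boolean Series that is True for every row where at least one column holds a value greater than 0. The ~ inverts it, so only rows with no value greater than 0 are kept.
Likewise, axis=0 reduces along the index instead, giving one boolean per column.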
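A small sketch of the difference between the two axes, using a made-up all-numeric DataFrame (the column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({'a': [1, -2, 0],
                   'b': [-1, -3, 4],
                   'c': [-5, -6, -7]})

# axis=1: one boolean per row, True if any column in that row is > 0
row_has_positive = (df > 0).any(axis=1)

# axis=0: one boolean per column, True if any row in that column is > 0
col_has_positive = (df > 0).any(axis=0)

# ~ inverts the row mask, keeping only rows with no positive value
kept = df[~row_has_positive]

print(row_has_positive.tolist())
print(col_has_positive.to_dict())
print(kept)
```

Only the row mask lines up with the DataFrame's index, which is why it is the one used for filtering rows.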