Question
My data is organized in a dataframe:
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
Which looks like this (only much bigger):
    Col1  Col2  Col3  Col4
R1     4    10   100   AAA
R2     5    20    50   BBB
R3     6    30   -30   AAA
R4     7    40   -50   CCC
My algorithm loops through the rows of this table and performs a set of operations on each one.
For the sake of cleanliness (and laziness), I would like to work on a single row at each iteration without typing df.loc['row index', 'column name'] to get each cell value.
I have tried to follow the recommended style, using for example:
row_of_interest = df.loc['R2', :]
However, I still get the warning when I do:
row_of_interest['Col2'] = row_of_interest['Col2'] + 1000
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
It also does not work as I intended: the assignment modifies a copy, and the original dataframe is unchanged:
print(df)
    Col1  Col2  Col3  Col4
R1     4    10   100   AAA
R2     5    20    50   BBB
R3     6    30   -30   AAA
R4     7    40   -50   CCC
Any advice on the proper way to do it? Or should I just stick to working with the dataframe directly?
Edit 1:
Using the replies provided, the warning is removed, but the original dataframe is still not modified: the "row of interest" Series is a copy, not part of the original dataframe. For example:
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
row_of_interest = df.loc['R2']
row_of_interest.is_copy = False
new_cell_value = row_of_interest['Col2'] + 1000
row_of_interest['Col2'] = new_cell_value
print(row_of_interest)
Col1       5
Col2    1020
Col3      50
Col4     BBB
Name: R2, dtype: object
print(df)
    Col1  Col2  Col3  Col4
R1     4    10   100   AAA
R2     5    20    50   BBB
R3     6    30   -30   AAA
R4     7    40   -50   CCC
Edit 2:
This is an example of the functionality I would like to replicate. In Python, a list of lists looks like:
a = [[1,2,3],[4,5,6]]
Now I can create a "label" for the first inner list:
b = a[0]
And if I change an entry in b:
b[0] = 7
Both a and b change:
print(a, b)
[[7, 2, 3], [4, 5, 6]] [7, 2, 3]
Can this behavior be replicated in pandas, so that a Series labels one row of a dataframe and changes to the Series show up in the dataframe?
Answer 1:
This should work:
row_of_interest = df.loc['R2', :]
row_of_interest.is_copy = False
row_of_interest['Col2'] = row_of_interest['Col2'] + 1000
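Setting .is_copy = False is the trick: it silences the warning.
Edit 2:
The Series is still a copy, so to update the original dataframe, assign the modified row back: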
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
row_of_interest = df.loc['R2']
row_of_interest.is_copy = False
new_cell_value = row_of_interest['Col2'] + 1000
row_of_interest['Col2'] = new_cell_value
print(row_of_interest)
df.loc['R2'] = row_of_interest
print(df)
df now contains the updated value:
    Col1  Col2  Col3  Col4
R1     4    10   100   AAA
R2     5  1020    50   BBB
R3     6    30   -30   AAA
R4     7    40   -50   CCC
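For the row-by-row loop described in the question, the same copy-then-write-back idea could be sketched roughly as follows. This is only a sketch under assumptions: the test on Col4 and the +1000 update are placeholders for the real per-row operations, and an explicit .copy() is used here instead of .is_copy to make it obvious that the row is a copy.

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

for label in df.index:
    # Take an explicit, independent copy of the current row.
    row = df.loc[label].copy()

    # Placeholder operation (assumption): bump Col2 for the 'AAA' rows.
    if row['Col4'] == 'AAA':
        row['Col2'] = row['Col2'] + 1000

    # Write the (possibly modified) row back into the original dataframe.
    df.loc[label] = row

print(df)
```

Inside the loop body, row['Col2'] style access is enough; the dataframe is only touched once per iteration, when the row is written back.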
Answer 2:
You can remove the warning by creating a series with the slice you want to work on:
from pandas import Series
row_of_interest = Series(data=df.loc['R2', :])
row_of_interest.loc['Col2'] += 1000
print(row_of_interest)
Results in:
Col1       5
Col2    1020
Col3      50
Col4     BBB
Name: R2, dtype: object
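Note that a Series built this way is independent of df, which matches what Edit 1 of the question reports. A small check, reusing the dataframe from the question, might look like this:

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# The row of a mixed-dtype dataframe is materialized as a new object Series.
row_of_interest = pd.Series(data=df.loc['R2', :])
row_of_interest.loc['Col2'] += 1000

print(row_of_interest['Col2'])   # 1020: the Series was updated
print(df.loc['R2', 'Col2'])      # 20: the dataframe was not
```

So this removes the warning, but the change still has to be written back to df explicitly if the dataframe is supposed to reflect it.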
Answer 3:
The most straightforward way to do this is to modify the dataframe directly:
df.loc['R2', 'Col2'] += 1000
df
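If the update has to happen inside a loop over rows, a rough sketch of the same direct approach is shown below. The condition on Col3 is a made-up placeholder for the real per-row logic. df.at is the scalar accessor and is a bit shorter to type than df.loc for single cells; when the per-row logic can be expressed as a condition, a boolean mask with df.loc avoids the loop entirely.

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
df_vec = df.copy()  # separate copy for the loop-free variant below

# Loop variant: read and write single cells directly on the dataframe.
for label in df.index:
    if df.at[label, 'Col3'] < 0:        # placeholder condition (assumption)
        df.at[label, 'Col2'] += 1000

# Loop-free variant: select the rows with a boolean mask and update in one step.
df_vec.loc[df_vec['Col3'] < 0, 'Col2'] += 1000

print(df['Col2'].tolist())      # [10, 20, 1030, 1040]
print(df_vec['Col2'].tolist())  # [10, 20, 1030, 1040]
```

Both variants assign through a single indexing call on df itself, so there is no chained indexing and no intermediate copy to worry about.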
Source: https://stackoverflow.com/questions/40138090/work-with-a-row-in-a-pandas-dataframe-without-incurring-chain-indexing-not-copi