Work with a row in a pandas DataFrame without incurring chained indexing (not copying, just indexing)

Submitted by 人走茶凉 on 2019-12-24 08:33:23

Question


My data is organized in a dataframe:

import pandas as pd
import numpy as np

data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}

df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])

Which looks like this (only much bigger):

    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC

My algorithm loops through the rows of this table and performs a set of operations on each one.

For cleanliness' (and laziness') sake, I would like to work on a single row at each iteration without typing df.loc['row index', 'column name'] to get each cell value.

I have tried to follow the recommended style, for example:

row_of_interest = df.loc['R2', :]

However, I still get the warning when I do:

row_of_interest['Col2'] = row_of_interest['Col2'] + 1000

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

And it does not work as I intended: it is making a copy.

print(df)

    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC

Any advice on the proper way to do this? Or should I just stick to working with the DataFrame directly?
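A minimal sketch of the "work with the DataFrame directly" option, assuming the per-row operations only need to read and write individual cells: loop over the row labels and use DataFrame.at, the scalar accessor, for each read and write. The condition on Col4 and the +1000 update are hypothetical, chosen only for illustration.

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

for label in df.index:
    # .at reads a single cell by row label and column name
    col2 = df.at[label, 'Col2']
    # Hypothetical per-row rule, for illustration only
    if df.at[label, 'Col4'] == 'AAA':
        # .at writes straight into df: no intermediate row, no warning
        df.at[label, 'Col2'] = col2 + 1000

print(df)
```

Iterating over df.index is safe here because only existing cells are overwritten; the index itself never changes during the loop.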

Edit 1:

Using the replies provided, the warning goes away, but the original DataFrame is not modified: the "row of interest" Series is a copy, not part of the original DataFrame. For example:

import pandas as pd
import numpy as np

data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}

df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])

row_of_interest         = df.loc['R2']
row_of_interest.is_copy = False
new_cell_value          = row_of_interest['Col2'] + 1000
row_of_interest['Col2'] = new_cell_value

print(row_of_interest)

Col1       5
Col2    1020
Col3      50
Col4     BBB
Name: R2, dtype: object

print(df)

    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC
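The DataFrame stays unchanged because its columns have different dtypes (three integer columns and one string column). pandas keeps columns of different dtypes in separate internal blocks, so to return row R2 it has to assemble a new object-dtype Series rather than a view into the DataFrame's memory. A small check of that, using np.shares_memory:

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

row_of_interest = df.loc['R2']

# Integer and string columns live in separate blocks, so the row is built
# as a new object-dtype array instead of viewing df's memory
print(row_of_interest.dtype)  # object
print(np.shares_memory(row_of_interest.to_numpy(), df['Col2'].to_numpy()))  # False
```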

Edit 2:

This is an example of the behavior I would like to replicate. In Python, a list of lists looks like:

a = [[1,2,3],[4,5,6]]

Now I can create a "label" for the first inner list:

b = a[0]

And if I change an entry in b:

b[0] = 7

Both a and b change.

print(a, b)

[[7, 2, 3], [4, 5, 6]] [7, 2, 3]

Can this behavior be replicated with a pandas DataFrame, so that one of its rows is "labeled" as a pandas Series and changes to that Series show up in the DataFrame?
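A row selected with .loc is not a reference in the way b = a[0] is for a list of lists. One way to get similar aliasing behavior is a small wrapper that remembers the row label and forwards every read and write to the DataFrame through DataFrame.at. RowProxy below is a hypothetical helper written for this example, not part of pandas:

```python
import pandas as pd


class RowProxy:
    """Hypothetical helper (not part of pandas): a "label" for one row.

    Every read and write goes straight to the parent DataFrame via .at,
    so changes show up in the DataFrame, like b = a[0] for a list of lists.
    """

    def __init__(self, df, label):
        self._df = df
        self._label = label

    def __getitem__(self, col):
        return self._df.at[self._label, col]

    def __setitem__(self, col, value):
        self._df.at[self._label, col] = value


data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

b = RowProxy(df, 'R2')
b['Col2'] = b['Col2'] + 1000

print(df.loc['R2', 'Col2'])  # 1020
```

The trade-off is that each access goes through .at, so this suits per-cell logic inside a loop; for bulk updates, vectorized .loc assignments are usually faster.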


Answer 1:


This should work:

row_of_interest = df.loc['R2', :]
row_of_interest.is_copy = False
row_of_interest['Col2'] = row_of_interest['Col2'] + 1000

Setting .is_copy = False is the trick: it clears the flag pandas uses to raise the SettingWithCopyWarning.

Edit 2 (write the modified row back into the DataFrame):

import pandas as pd
import numpy as np

data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}

df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])

row_of_interest         = df.loc['R2']
row_of_interest.is_copy = False
new_cell_value          = row_of_interest['Col2'] + 1000
row_of_interest['Col2'] = new_cell_value

print(row_of_interest)

df.loc['R2'] = row_of_interest 

print(df)

df:

    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5  1020    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC
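The is_copy attribute was deprecated in pandas 0.23 and removed in pandas 1.0, so on current versions assigning it no longer affects the warning. A minimal sketch of the same read, modify, write-back pattern on newer pandas, assuming an explicit .copy() of each row is acceptable:

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Work on an explicit copy of the row instead of relying on is_copy
row_of_interest = df.loc['R2'].copy()
row_of_interest['Col2'] = row_of_interest['Col2'] + 1000

# Write the modified row back so the change reaches the DataFrame
df.loc['R2'] = row_of_interest

print(df)
```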



Answer 2:


You can remove the warning by creating a new Series from the slice you want to work on:

from pandas import Series
row_of_interest = Series(data=df.loc['R2', :])
row_of_interest.loc['Col2'] += 1000
print(row_of_interest)

Results in:

Col1       5
Col2    1020
Col3      50
Col4     BBB
Name: R2, dtype: object



Answer 3:


The most straightforward way to do this:

df.loc['R2', 'Col2'] += 1000
df
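When the same update applies to many rows, a boolean mask in the row position of .loc updates them all in one assignment, with no Python loop and no intermediate row object. The condition on Col4 is a hypothetical example:

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Hypothetical condition: select rows by a boolean mask and one column by name
mask = df['Col4'] == 'AAA'
df.loc[mask, 'Col2'] += 1000

print(df)
```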



Source: https://stackoverflow.com/questions/40138090/work-with-a-row-in-a-pandas-dataframe-without-incurring-chain-indexing-not-copi
