Set value to an entire column of a pandas dataframe

甜味超标 2020-12-13 05:54

I'm trying to set the entire column of a dataframe to a specific value.

In  [1]: df
Out [1]: 
     issueid   industry
0        001        xxx
1        002           


        
8 answers
  • 2020-12-13 06:32

    You can use the assign method, which returns a new DataFrame:

    df = df.assign(industry='yyy')
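
    Because assign returns a new object instead of modifying the existing one, the result has to be bound to a name. A minimal sketch, assuming a two-row frame shaped like the one in the question (the name updated is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's two-row frame.
df = pd.DataFrame({"issueid": ["001", "002"], "industry": ["xxx", ""]})

# assign() builds a new DataFrame; the original df is left untouched.
updated = df.assign(industry="yyy")

print(updated)
print(df)
```

    Rebinding the result to df, as in the line above, simply discards the old frame.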
    
  • 2020-12-13 06:35

    I ran into a similar issue even with df.loc[:,'industry'] = 'yyy', but once I refreshed the notebook it ran fine.

    You may want to try refreshing the notebook and re-running the cells that contain df.loc[:,'industry'] = 'yyy'.

  • 2020-12-13 06:36

    Python can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of df = df_all.loc[df_all['issueid']==specific_id,:]. In this case, pandas cannot guarantee that df is an independent object rather than a view onto the rows stored in df_all, which is what the SettingWithCopyWarning is about.

    To avoid these issues altogether, I often have to remind myself to use the copy module, which explicitly forces objects to be copied in memory so that changes made to the new object are not applied to the source object. I had the same problem as you and avoided it with the deepcopy function.

    In your case, this should get rid of the warning message:

    from copy import deepcopy
    df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
    df['industry'] = 'yyy'
    

    EDIT: Also see David M.'s excellent comment below!

    df = df_all.loc[df_all['issueid']==specific_id,:].copy()
    df['industry'] = 'yyy'
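
    To see the effect of .copy(), here is a minimal sketch using made-up stand-ins for df_all and specific_id (the question does not show their real contents). After the assignment, only the subset changes:

```python
import pandas as pd

# Hypothetical stand-ins for the asker's df_all and specific_id.
df_all = pd.DataFrame(
    {"issueid": ["001", "002", "001"], "industry": ["xxx", "", "zzz"]}
)
specific_id = "001"

# .copy() makes the filtered subset an independent DataFrame.
df = df_all.loc[df_all["issueid"] == specific_id, :].copy()
df["industry"] = "yyy"

print(df)      # the two matching rows, with the new value
print(df_all)  # the source frame, unchanged
```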
    
  • 2020-12-13 06:37

    Assuming your DataFrame looks like the data below, consider whether the value you are setting is a string or a number. The two are treated differently, so you need to be specific about which one you want.

    import pandas as pd
    
    data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]
    
    df = pd.DataFrame(data,columns=['issueid', 'industry'])
    
    print("Old DataFrame")
    print(df)
    
    df.loc[:,'industry'] = 'yyy'
    
    print("New DataFrame")
    print(df)
    

    If you want numbers instead of letters, you can pass a list with one value per row (a scalar such as 1 is also broadcast to every row):

    list_of_ones = [1,1,1,1,1]
    df.loc[:,'industry'] = list_of_ones
    print(df)
    

    Or, if you are using NumPy (note that np.ones returns floats):

    import numpy as np
    n = len(df)
    df.loc[:,'industry'] = np.ones(n)
    print(df)
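
    The list and the array above hold the same value in every position. A minimal sketch, assuming that is the goal, of assigning the number directly and letting pandas broadcast it (the three-row frame is made up for illustration):

```python
import pandas as pd

# Hypothetical three-row frame shaped like the example data above.
df = pd.DataFrame({"issueid": ["001", "002", "003"], "industry": ["xxx"] * 3})

# A scalar is broadcast to every row, so no list or array is needed.
df["industry"] = 1

print(df)
```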
    
  • 2020-12-13 06:38

    You can do:

    df['industry'] = 'yyy'
    
  • 2020-12-13 06:39

    Seems to me that:

    df1 = df[df['col1']==some_value] does not give you a DataFrame that pandas treats as independent of df: pandas cannot guarantee whether df1 is a view or a copy, so changes to df1 may or may not be reflected in the parent df. This leads to the warning. Whereas df1 = df[df['col1']==some_value].copy() WILL create a new DataFrame, and changes to df1 will not be reflected in df. The copy() method is recommended if you don't want to make changes to your original df.
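
    Conversely, if you do want the change to land in the original df, one option is to skip the intermediate df1 and write through a single .loc call on df itself. A minimal sketch, with made-up data and a made-up some_value:

```python
import pandas as pd

# Hypothetical data; col1 and some_value follow the names used above.
df = pd.DataFrame({"col1": ["a", "b", "a"], "industry": ["xxx", "xxx", "xxx"]})
some_value = "a"

# One .loc call with a row mask and a column label writes into df itself.
df.loc[df["col1"] == some_value, "industry"] = "yyy"

print(df)
```

    Only the rows matching the mask are updated; the others keep their previous value.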
