pandas

Python pandas df.copy() is not deep

依然范特西╮ submitted on 2021-02-11 15:46:36
Question: I have (in my opinion) a strange problem with Python pandas. If I do: cc1 = cc.copy(deep=True) for the dataframe cc and then ask for a certain row and column: print(cc1.loc['myindex']['data'] is cc.loc['myindex']['data']) I get True. What's wrong here?
Answer 1: copy(deep=True) copies the DataFrame's data and index, but it does not recursively copy Python objects stored in the cells, and the devs consider putting mutable objects inside a DataFrame an antipattern. There is nothing wrong with your code; in case you want to see the difference, here is an example of deep and shallow
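The behaviour described in the answer can be reproduced with a small sketch. The frame below is hypothetical (the original cc is not shown), and it assumes the 'data' column holds mutable Python objects such as lists. One possible workaround, not necessarily what the original answer proposed, is to apply copy.deepcopy to each cell of the object column.

```python
import copy

import pandas as pd

# Hypothetical frame: the 'data' column stores Python lists (object dtype).
cc = pd.DataFrame({"data": [[1, 2], [3, 4]]}, index=["myindex", "other"])

# deep=True copies the underlying array, but the array only holds
# references, so both frames still point at the same list objects.
cc1 = cc.copy(deep=True)
same_object = cc1.loc["myindex", "data"] is cc.loc["myindex", "data"]

# Possible workaround: deep-copy every cell of the object column.
cc2 = cc.copy()
cc2["data"] = cc["data"].map(copy.deepcopy)
independent = cc2.loc["myindex", "data"] is not cc.loc["myindex", "data"]

# Mutating the list in cc2 leaves the list in cc untouched.
cc2.loc["myindex", "data"].append(99)

print(same_object, independent, cc.loc["myindex", "data"])
```

For columns of plain numbers or strings this extra step should not be needed; it only matters when cells hold mutable objects.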

python: cumulative concatenate in pandas dataframe

只谈情不闲聊 submitted on 2021-02-11 15:45:07
Question: How do I do a cumulative concatenation in a pandas dataframe? I found a number of solutions in R, but can't find one in Python. Here is the problem: suppose we have a dataframe with columns date and name:
    import pandas as pd
    d = {'date': [1,1,2,2,3,3,3,4,4,4], 'name':['A','B','A','C','A','B','B','A','B','C']}
    df = pd.DataFrame(data=d)
I want to get CUM_CONCAT, which is a cumulative concatenation grouped by date:
       date name CUM_CONCAT
    0     1    A        [A]
    1     1    B      [A,B]
    2     2    A        [A]
    3     2    C      [A,C]
    4     3    A        [A]
    5     3
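One way to sketch the requested column is to build running lists per group with itertools.accumulate. The expected output above is cut off, so this assumes repeated names within a date are kept (for example [A, B, B] for date 3); that is an assumption, not something the question confirms.

```python
from itertools import accumulate

import pandas as pd

d = {'date': [1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
     'name': ['A', 'B', 'A', 'C', 'A', 'B', 'B', 'A', 'B', 'C']}
df = pd.DataFrame(data=d)

parts = []
for _, names in df.groupby("date")["name"]:
    # accumulate yields [n0], [n0, n1], ... as new lists at each step.
    running = list(accumulate(([n] for n in names), lambda acc, x: acc + x))
    # Keep the original row labels so the assignment below aligns by index.
    parts.append(pd.Series(running, index=names.index))

df["CUM_CONCAT"] = pd.concat(parts)
print(df)
```

Because each partial Series carries the original index, the result lines up with the rows even if the dates are not sorted or contiguous.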

How to identify what's NOT in the inner join while merging 3 data frames

隐身守侯 submitted on 2021-02-11 15:40:25
Question: I have 3 data frames: energy, GDP and ScimEn. All of them have a column 'Country', and I merged all 3 using an inner join:
    a = pd.merge(energy,GDP,left_on='Country',right_on='Country',how='inner')
    b = pd.merge(a,ScimEn,left_on='Country',right_on='Country',how='inner')
Now I want to find the number of countries that were left out of this merge. I tried the following formula, but it gives me an error: "ValueError: Cannot use name of an existing column for
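A minimal sketch of one way to count the dropped countries, using small made-up frames in place of the real energy, GDP and ScimEn data. It compares the set of countries appearing anywhere with those that survive the inner join, and cross-checks the count against an outer join; it assumes each country appears at most once per frame.

```python
import pandas as pd

# Hypothetical stand-ins for the three frames in the question.
energy = pd.DataFrame({"Country": ["A", "B", "C"], "Energy": [1, 2, 3]})
GDP = pd.DataFrame({"Country": ["B", "C", "D"], "GDP": [10, 20, 30]})
ScimEn = pd.DataFrame({"Country": ["C", "D", "E"], "Rank": [1, 2, 3]})

# Inner join keeps only countries present in all three frames.
inner = energy.merge(GDP, on="Country").merge(ScimEn, on="Country")

# Every country that appears in at least one frame.
all_countries = (
    set(energy["Country"]) | set(GDP["Country"]) | set(ScimEn["Country"])
)
left_out = all_countries - set(inner["Country"])

# Cross-check: rows in the outer join minus rows in the inner join.
outer = energy.merge(GDP, on="Country", how="outer").merge(
    ScimEn, on="Country", how="outer"
)
n_left_out = len(outer) - len(inner)

print(sorted(left_out), n_left_out)
```

The set difference also tells you which countries were dropped, not just how many.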

What is the difference between bins when using groupby apply vs resample apply?

孤街浪徒 submitted on 2021-02-11 15:37:54
Question: This is a somewhat broad topic, but I will try to pare it down to some specific questions. I have noticed a difference between resample and groupby that I am curious to learn about. Here is some hourly time series data:
In[]:
    import pandas as pd
    dr = pd.date_range('01-01-2020 8:00', periods=10, freq='H')
    df = pd.DataFrame({'A':range(10), 'B':range(10,20), 'C':range(20,30)}, index=dr)
    df
Out[]:
                         A   B   C
    2020-01-01 08:00:00  0  10  20
    2020-01-01 09:00:00  1  11  21
    2020-01-01 10:00:00  2  12  22
    2020-01-01
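The question is cut off before it states which difference was observed, so the following is only a sketch of how the two APIs bin this data. It assumes a 3-hour bin width (not taken from the question) and uses the lowercase 'h' frequency alias. With the default settings, both resample and groupby with pd.Grouper anchor the bins at midnight of the first day, so the first bin starts at 06:00 rather than at the first timestamp.

```python
import pandas as pd

dr = pd.date_range("2020-01-01 08:00", periods=10, freq="h")
df = pd.DataFrame(
    {"A": range(10), "B": range(10, 20), "C": range(20, 30)}, index=dr
)

# Time-based bins via resample: edges fall at 00:00, 03:00, 06:00, ...
by_resample = df.resample("3h").sum()

# The same bins expressed through groupby with a Grouper.
by_grouper = df.groupby(pd.Grouper(freq="3h")).sum()

print(by_resample)
print(by_grouper)
```

Here the 08:00 row lands alone in the 06:00 bin, which is why the first group is smaller than the others.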

How to select elements from subsequent numpy arrays stored in pandas series

南笙酒味 submitted on 2021-02-11 15:31:42
Question: I've got a Series of numpy arrays: import pandas as pd import numpy as np pd.Series({10: np.array([[0.72260683, 0.27739317, 0. ], [0.7187053 , 0.2812947 , 0. ], [0.71435467, 0.28564533, 1. ], [0.3268072 , 0.6731928 , 0. ], [0.31941951, 0.68058049, 1. ], [0.31260015, 0.68739985, 0. ]]), 20: np.array([[0.7022099 , 0.2977901 , 0. ], [0.6983866 , 0.3016134 , 0. ], [0.69411673, 0.30588327, 1. ], [0.33857735, 0.66142265, 0. ], [0.33244109, 0.66755891, 1. ], [0.32675582, 0.67324418, 0. ]]), 38: np

Create a dict of lists using Python from a CSV

柔情痞子 submitted on 2021-02-11 15:31:37
Question: I have a csv file with data as below:
    XPATH,ColumName,CSV_File_Name,ParentKey
    /integration-outbound:IntegrationEntity/integrationEntityDetails/supplier/forms/form[]/id,id,integrationEntityDetailsForms.csv,
    /integration-outbound:IntegrationEntity/integrationEntityHeader/attachments/attachment[]/id,aid,integrationEntityDetailsForms.csv,
    /integration-outbound:IntegrationEntity/integrationEntityDetails/supplier/forms/form[]/records/record[]/Internalid,Internalid,integrationEntityDetailsForms.csv,
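The question text ends before it says how the dict should be keyed, so this sketch assumes the simplest reading: one list of values per column, keyed by the header name. It reads the sample rows from an in-memory string with csv.DictReader; for a real file, open() would replace io.StringIO.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question; ParentKey is empty in each row.
raw = """XPATH,ColumName,CSV_File_Name,ParentKey
/integration-outbound:IntegrationEntity/integrationEntityDetails/supplier/forms/form[]/id,id,integrationEntityDetailsForms.csv,
/integration-outbound:IntegrationEntity/integrationEntityHeader/attachments/attachment[]/id,aid,integrationEntityDetailsForms.csv,
/integration-outbound:IntegrationEntity/integrationEntityDetails/supplier/forms/form[]/records/record[]/Internalid,Internalid,integrationEntityDetailsForms.csv,
"""

# Assumed target shape: {column name: [value in row 1, value in row 2, ...]}.
columns = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    for key, value in row.items():
        columns[key].append(value)

print(columns["ColumName"])
```

If the intended key is instead a column value such as CSV_File_Name, the inner loop could append row["XPATH"] to columns[row["CSV_File_Name"]]; that grouping is equally hypothetical.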