pandas

pandas: add several new columns based on values from other columns at the same time?

*爱你&永不变心* posted on 2021-02-05 07:45:06
Question: How can I add several new columns at the same time, based on values from other columns? I only found examples to add a row one at a time. I am able to add 3 new columns, but this does not seem efficient since it has to go through all the rows 3 times. Is there a way to traverse the DataFrame once?
    import pandas as pd
    from decimal import Decimal
    d = [
        {'A': 2, 'B': Decimal('628.00')},
        {'A': 1, 'B': Decimal('383.00')},
        {'A': 3, 'B': Decimal('651.00')},
        {'A': 2, 'B': Decimal('575.00')},
        {'A': 4, 'B':
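
A minimal sketch of one way to build several columns in a single pass over the rows, using only the four complete rows visible in the excerpt. The derived columns (`total`, `per_unit`, `a_squared`) are hypothetical, since the excerpt is cut off before it says which columns are wanted.

```python
import pandas as pd
from decimal import Decimal

# Only the rows that are fully visible in the excerpt.
d = [
    {'A': 2, 'B': Decimal('628.00')},
    {'A': 1, 'B': Decimal('383.00')},
    {'A': 3, 'B': Decimal('651.00')},
    {'A': 2, 'B': Decimal('575.00')},
]
df = pd.DataFrame(d)

# One pass over the rows: each tuple holds every derived value for that row.
# The three formulas are placeholders (assumptions), not the asker's own.
derived = [
    (b * a, b / a, a * a)
    for a, b in zip(df['A'].tolist(), df['B'].tolist())
]

# Turn the tuples into columns and attach them in a single concat.
new_cols = pd.DataFrame(
    derived,
    columns=['total', 'per_unit', 'a_squared'],
    index=df.index,
)
df = pd.concat([df, new_cols], axis=1)
print(df)
```

When the formulas are plain arithmetic on numeric columns, separate vectorized assignments are usually fast enough; the single-pass form mainly helps when each row needs Python-level logic, as with `Decimal` values.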

How to delete text before a specific character - Python (Pandas)

人盡茶涼 posted on 2021-02-05 07:43:14
Question: I have a column in a larger dataset that looks like:
    Name
    ----
    Mr. John Doe
    Jack Daw
    Prof. Charles Winchester
    Jane Shaw
    ... etc.
(Names anonymized.) Basically, it's a list of names that have prefixes mixed in. All prefixes end with a dot. So far, the prefixes have been limited to Mr., Mrs., Ms., Dr. and Prof. The output I would like is:
    Name
    ----
    John Doe
    Jack Daw
    Charles Winchester
    Jane Shaw
    ... etc.
Ideally, I would like a solution that relies on the position of the dot instead of having to
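
A minimal sketch of a dot-based approach, assuming the prefix is always the first whitespace-delimited token and always ends with a dot. The frame name `df` is an assumption; the sample names are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mr. John Doe', 'Jack Daw', 'Prof. Charles Winchester', 'Jane Shaw'],
})

# Remove a leading token that ends with a dot, plus the whitespace after it.
# Names without such a token are left unchanged.
df['Name'] = df['Name'].str.replace(r'^\S+\.\s+', '', regex=True)
print(df)
```

Anchoring the pattern to the first token means a dot later in the name, such as a middle initial, is not treated as a prefix.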

Can I combine groupby data?

感情迁移 posted on 2021-02-05 07:42:44
Question: I have two columns, home and away. So one row will be England vs Brazil and the next row will be Brazil vs England. How can I count the occurrences of Brazil facing England and of England facing Brazil as a single count? Based on previous solutions, I have tried:
    results.groupby(["home_team", "away_team"]).size()
    results.groupby(["away_team", "home_team"]).size()
However, this does not give me the outcome that I am looking for. Undesired output:
    home_team  away_team
    England    Brazil       1
    away_team  home_team
    Brazil
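
A minimal sketch of one way to count a fixture regardless of which side is home: sort the two team names within each row so that both orderings map to the same pair, then group on the sorted pair. The sample rows and the column names `team_a` and `team_b` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

results = pd.DataFrame({
    'home_team': ['England', 'Brazil', 'France'],
    'away_team': ['Brazil', 'England', 'Spain'],
})

# Sort the two names alphabetically within each row, so that
# (England, Brazil) and (Brazil, England) both become (Brazil, England).
sorted_pairs = np.sort(results[['home_team', 'away_team']].to_numpy(), axis=1)
pairs = pd.DataFrame(sorted_pairs, columns=['team_a', 'team_b'], index=results.index)

# Count each unordered pairing once.
counts = pairs.groupby(['team_a', 'team_b']).size()
print(counts)
```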

Python pandas splitting text and numbers in dataframe

夙愿已清 posted on 2021-02-05 07:42:24
Question: I have a dataframe df1 whose first column is named Acc Number, and the data looks like:
    Acc Number
    ASC100.1
    MJT122
    ASC120.4
    XTY111
I need to make a new dataframe df2 with two columns, the first holding the text part and the second holding the numbers, so the desired output is:
    Text  Number
    ASC   100.1
    MJT   122
    ASC   120.4
    XTY   111
How would I go about doing this? Thanks!
Answer 1: You could do something like this:
    import pandas as pd
    data = ['ASC100.1', 'MJT122', 'ASC120.4', 'XTY111']
    df = pd
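
The answer above is cut off, so the following is a separate minimal sketch rather than a reconstruction of it. It assumes every value is a run of letters followed by an integer or decimal number, and uses `Series.str.extract` with named groups so the groups become the column names.

```python
import pandas as pd

df1 = pd.DataFrame({'Acc Number': ['ASC100.1', 'MJT122', 'ASC120.4', 'XTY111']})

# Named groups become the columns of the extracted frame.
pattern = r'^(?P<Text>[A-Za-z]+)(?P<Number>\d+(?:\.\d+)?)$'
df2 = df1['Acc Number'].str.extract(pattern)

# The extracted pieces are strings; convert the numeric part explicitly.
df2['Number'] = pd.to_numeric(df2['Number'])
print(df2)
```

Values that do not match the pattern come back as missing in both columns, which makes malformed account numbers easy to spot.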

pandas concat/merge/join multiple dataframes with only one column by this column

大城市里の小女人 posted on 2021-02-05 07:33:59
Question: I have (more than) two dataframes:
    In [22]: df = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})
    In [23]: df1 = pd.DataFrame({'database' : ['db1', 'db2', 'db3']})
    In [24]: df2 = pd.DataFrame({'database' : ['db2', 'db3', 'db4']})
    In [25]: df1
    Out[25]:
      database
    0      db1
    1      db2
    2      db3
    In [26]: df2
    Out[26]:
      database
    0      db2
    1      db3
    2      db4
What I want as output is a dataframe in this format:
    Out[45]:
      database database
    0      db1
    1      db2      db2
    2      db3      db3
    3               db4
I managed to get it in this format like this:
    df1.index =
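
A minimal sketch of one way to line the frames up on the shared column, assuming the values are unique within each frame. It is not necessarily the approach the truncated excerpt was heading toward. The names `frames`, `aligned` and `out` are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({'database': ['db1', 'db2', 'db3']})
df2 = pd.DataFrame({'database': ['db2', 'db3', 'db4']})
frames = [df1, df2]  # works the same way with more than two frames

# Use the column's values as the index, but keep the column itself.
aligned = [f.set_index('database', drop=False) for f in frames]

# Side-by-side concat aligns rows on the index (outer join by default),
# leaving NaN where a frame has no matching value.
out = pd.concat(aligned, axis=1).reset_index(drop=True)
print(out)
```

The result has duplicate column labels, as in the desired output, so the columns are easiest to address by position with `iloc`.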

Get column names for the N Max/Min values per row in Pandas

岁酱吖の posted on 2021-02-05 07:29:28
Question: I am trying to get, for each individual row, the names of the columns holding the N largest and N smallest values. Given something like this:
    a    b    c     d    e
    1.2  2    0.1   0.8  0.01
    2.1  1.1  3.2   4.6  3.4
    0.2  1.9  8.8   0.3  1.3
    3.3  7.8  0.12  3.2  1.4
I can get the max with idxmax(axis=1), and likewise the min with idxmin(axis=1), but this only works for the single largest and single smallest value and does not generalize to N values. I want to get, if called with N=2:
    a    b    c    d    e    Max1  Max2  Min1  Min2
    1.2  2.0  0.1  0.8  0.1  b     a     c     e
    2.1  1.1  3.2  4.6  3.4
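
A minimal sketch of one way to generalize `idxmax`/`idxmin` to N values, using `numpy.argsort` along each row and mapping the resulting positions back to column names. It uses the input table from the question as written, and `n`, `out` and the other variable names are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.2, 2.0, 0.1, 0.8, 0.01],
     [2.1, 1.1, 3.2, 4.6, 3.4],
     [0.2, 1.9, 8.8, 0.3, 1.3],
     [3.3, 7.8, 0.12, 3.2, 1.4]],
    columns=list('abcde'),
)
n = 2

cols = df.columns.to_numpy()
# Column positions sorted by value within each row, smallest first.
order = np.argsort(df.to_numpy(), axis=1)

mins = cols[order[:, :n]]           # names of the n smallest per row
maxs = cols[order[:, ::-1][:, :n]]  # names of the n largest per row

out = df.copy()
for i in range(n):
    out[f'Max{i + 1}'] = maxs[:, i]
for i in range(n):
    out[f'Min{i + 1}'] = mins[:, i]
print(out)
```

Ties are broken by whatever order `argsort` returns, so rows with equal values may need an explicit tie-breaking rule.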

Pandas DataFrame split into weeks

时间秒杀一切 posted on 2021-02-05 07:24:04
Question: I have problems reshaping a dataframe into weeks, such that I can easily look at one particular week, but also aggregate the same weekdays together, i.e. Monday + Monday, Tuesday + Tuesday, etc. I have looked in the documentation for an approach, but I have not been able to find a solution that works for me. My data has a resolution of 1 min and a duration of 4 months, and the series has missing data at some locations. Currently I have come up with something like:
    def week_reshaping(df): #
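
A minimal sketch of one possible layout, not a completion of the truncated `week_reshaping` function. It assumes a `DatetimeIndex` at 1-minute resolution and a single hypothetical column `value`, and uses three weeks of synthetic data with a gap to stand in for the missing samples.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: three weeks at 1-minute resolution, starting on a Monday.
idx = pd.date_range('2021-01-04', periods=3 * 7 * 24 * 60, freq='min')
df = pd.DataFrame({'value': np.arange(len(idx), dtype=float)}, index=idx)
df = df.drop(df.index[100:200])  # simulate a gap in the series

# Label every sample with its ISO week, weekday (Monday=0) and minute within the week.
df['week'] = df.index.isocalendar()['week'].to_numpy(dtype='int64')
df['weekday'] = df.index.dayofweek
df['minute_of_week'] = (
    df.index.dayofweek * 1440 + df.index.hour * 60 + df.index.minute
)

# Look at one particular week.
one_week = df[df['week'] == 1]

# Aggregate the same weekdays together (all Mondays, all Tuesdays, ...).
by_weekday = df.groupby('weekday')['value'].mean()

# One column per week, one row per minute of the week; gaps show up as NaN.
wide = df.pivot(index='minute_of_week', columns='week', values='value')
print(wide.shape)
```

With data spanning a year boundary, the ISO year would need to be part of the week label as well, since week numbers repeat.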

Create Dataframe from a nested dictionary

余生长醉 posted on 2021-02-05 07:22:25
Question: I am trying to create a dataframe from a list of values that contains nested dictionaries. This is my data:
    d = [{'user': 200, 'p_val': {'a': 10, 'b': 200}, 'f_val': {'a': 20, 'b': 300}, 'life': 8},
         {'user': 202, 'p_val': {'a': 100, 'b': 200}, 'f_val': {'a': 200, 'b': 300}, 'life': 8}]
I am trying to turn it into a dataframe as follows:
    user  new_col  f_val  p_val  life
    200   a        20     10     8
    200   b        300    200    8
    202   a        200    100    8
    202   b        300    200    8
I looked at other answers, but none of them matched my requirement. The
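
A minimal sketch of one way to get that shape: emit one flat row per inner key, then build the frame from the flat rows. It assumes `p_val` and `f_val` always share the same keys within a record.

```python
import pandas as pd

d = [
    {'user': 200, 'p_val': {'a': 10, 'b': 200}, 'f_val': {'a': 20, 'b': 300}, 'life': 8},
    {'user': 202, 'p_val': {'a': 100, 'b': 200}, 'f_val': {'a': 200, 'b': 300}, 'life': 8},
]

# One output row per (record, inner key) combination.
rows = [
    {
        'user': rec['user'],
        'new_col': key,
        'f_val': rec['f_val'][key],
        'p_val': rec['p_val'][key],
        'life': rec['life'],
    }
    for rec in d
    for key in rec['p_val']
]
df = pd.DataFrame(rows)
print(df)
```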

How to use .loc to set values from another column in pandas

╄→гoц情女王★ posted on 2021-02-05 07:15:06
Question: For example, I have a dataframe:
        cond  value1  value2
    0   True       1       1
    1  False       3       5
    2   True      34       2
    3   True      23      23
    4  False       4       2
I want to replace value1 with value2*2 when cond=True, so the result should be:
        cond  value1  value2
    0   True       2       1
    1  False       3       5
    2   True       4       2
    3   True      46      23
    4  False       4       2
I can achieve this with the following code:
    def convert(x):
        if x.cond:
            x.value1 = x.value2*2
        return x
    data = data.apply(lambda x: convert(x), axis=1)
I think this is too slow when the data is big. I tried using .loc, but I don't know how to set
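
A minimal sketch of the `.loc` form, assuming `cond` is a boolean column. The same boolean mask selects the rows on both sides of the assignment, so each selected row gets the value computed from its own `value2`.

```python
import pandas as pd

data = pd.DataFrame({
    'cond': [True, False, True, True, False],
    'value1': [1, 3, 34, 23, 4],
    'value2': [1, 5, 2, 23, 2],
})

# Rows where cond is True get value2 * 2; all other rows keep value1.
mask = data['cond']
data.loc[mask, 'value1'] = data.loc[mask, 'value2'] * 2
print(data)
```

This replaces the row-wise `apply` with one vectorized assignment, which is usually much faster on large frames.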