pandas

pandas: Dataframe.replace() with regex

和自甴很熟 submitted on 2021-02-16 19:12:21
Question: I have a table which looks like this:

    df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))

          A       B
    0  1.00     1.0
    1    -1  -45.00
    2   NaN       -

I would like to replace '-' with '0.00' using DataFrame.replace(), but it struggles because of the negative values '-1' and '-45.00'. How can I ignore the negative values and replace only '-' with '0.00'?

My code:

    df_raw = df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True).astype(np.float64)

Error:

    ValueError: invalid
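
One way to approach the question above, offered as a minimal sketch rather than the accepted answer: with regex=True an unanchored '-' matches the minus sign inside '-1' and '-45.00' as well, so anchoring the pattern with ^ and $ restricts the match to cells that consist of a single dash (or a single asterisk). The name df_clean is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df_raw = pd.DataFrame(dict(A=pd.Series(['1.00', '-1']),
                           B=pd.Series(['1.0', '-45.00', '-'])))

# ^ and $ anchor the pattern to the whole cell, so only a lone '-' or a
# lone '*' is replaced; the minus sign in '-1' or '-45.00' is left alone.
df_clean = df_raw.replace(r'^(?:-|\*)$', '0.00', regex=True).astype(np.float64)

print(df_clean)
```

The missing value in column A stays NaN after the conversion, because the pattern only matches strings.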

Pandas Vectorized lookup of Dictionary

余生颓废 submitted on 2021-02-16 18:05:42
Question: This seems like it should be a common use case, but I'm not finding any good guidance on it. I have a solution that works, but I would rather have a vectorized lookup than use the pandas apply() function. Here is an example of what I am doing:

    import pandas as pd
    example_dict = {
        "category1": {"field1": 0.0, "filed2": 5.0},
        "category2": {"field1": 5.0, "field2": 8.0}}
    d = {"ids": range(10),
         "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)]}
    df = pd.DataFrame
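
The question is cut off before it shows the apply() call, so the following is only a sketch of one common pattern: turn the nested dictionary into a lookup table and join it on the category column. It assumes the key spelled "filed2" in the question is meant to be "field2"; the names lookup, result, and field1_only are introduced here for illustration.

```python
import pandas as pd

# Assumption: "filed2" in the question is a typo for "field2".
example_dict = {
    "category1": {"field1": 0.0, "field2": 5.0},
    "category2": {"field1": 5.0, "field2": 8.0},
}
df = pd.DataFrame({
    "ids": range(10),
    "category": ["category1" if x % 2 == 0 else "category2" for x in range(10)],
})

# One row per category, one column per field.
lookup = pd.DataFrame.from_dict(example_dict, orient="index")

# Join on the category column: every field is looked up for every row
# in one step, without a row-wise apply().
result = df.join(lookup, on="category")

# If only a single field is needed, Series.map does the same lookup.
field1_only = df["category"].map(lookup["field1"])

print(result.head())
```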

Problems with isin pandas

梦想与她 submitted on 2021-02-16 17:56:42
Question: Sorry, I just asked this question: "Pythonic Way to have multiple Or's when conditioning in a dataframe", but marked it as answered prematurely because it passed my overly simplistic test case but isn't working more generally. (If it is possible to merge and reopen the question, that would be great...) Here is the full issue:

    sum(data['Name'].isin(eligible_players))
    > 0
    sum(data['Name'] == "Antonio Brown")
    > 68
    "Antonio Brown" in eligible_players
    > True

Basically if I understand correctly, I am
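
The excerpt does not show what eligible_players is, so the cause cannot be determined from it. As a hedged illustration of one situation that produces exactly this pattern: if eligible_players were a pandas Series keyed by player name (an assumption, not something the question states), the `in` operator would test the index labels while isin() would compare against the values. The data below is invented for the sketch.

```python
import pandas as pd

# Invented sample data for illustration.
data = pd.DataFrame({"Name": ["Antonio Brown", "Julio Jones", "Antonio Brown"]})

# Hypothetical: eligible players stored as a Series indexed by name.
eligible_players = pd.Series([1, 1], index=["Antonio Brown", "Julio Jones"])

# `in` on a Series checks the index labels, not the values.
in_check = "Antonio Brown" in eligible_players

# isin() with a Series compares against its values (here the integers 1, 1).
n_values = data["Name"].isin(eligible_players).sum()

# Comparing against the index labels gives the intended matches.
n_index = data["Name"].isin(eligible_players.index).sum()

print(in_check, n_values, n_index)
```

Checking type(eligible_players) is a quick way to confirm or rule out this explanation.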

Pandas groupby for multiple values in a column

拈花ヽ惹草 submitted on 2021-02-16 17:07:43
Question: I have a data frame similar to the following:

+-----------------+-------+
| class           | year  |
+-----------------+-------+
| ['A', 'B']      | 2001  |
| ['A']           | 2002  |
| ['B']           | 2001  |
| ['A', 'B', 'C'] | 2003  |
| ['B', 'C']      | 2001  |
| ['C']           | 2003  |
+-----------------+-------+

I want to create a data frame from this so that the resulting table shows the count of each category in class per year.

+-----+----+----+----+
|year | A  | B  | C  |
+-----+----+----+----+
|2001 | 1  | 3  | 1  |
|2002 | 1  | 0  | 0  |
|2003 |
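
A minimal sketch of one way to get such a table, assuming the class column holds real Python lists (if it holds strings such as "['A', 'B']", they would need to be parsed first): explode the lists into one row per category, then count per year and category. The name counts is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "class": [["A", "B"], ["A"], ["B"], ["A", "B", "C"], ["B", "C"], ["C"]],
    "year": [2001, 2002, 2001, 2003, 2001, 2003],
})

counts = (
    df.explode("class")            # one row per (category, year) pair
      .groupby(["year", "class"])  # group by year and category
      .size()                      # count rows in each group
      .unstack(fill_value=0)       # categories become columns, gaps become 0
)

print(counts)
```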

Pandas: Apply function to each pair of columns

故事扮演 submitted on 2021-02-16 16:14:02
Question: I have a function f(x, y) that takes two pandas Series and returns a floating point number. I would like to apply f to each pair of columns in a DataFrame D and construct another DataFrame E of the returned values, so that f(D[i], D[j]) is the value in the i-th row and j-th column. The straightforward solution is to run a nested loop over all pairs of columns:

    E = pd.DataFrame([[f(D[i], D[j]) for j in D] for i in D], columns=D.columns, index=D.columns)

But is there a more elegant solution that perhaps
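
As a sketch of a more compact form of the nested loop above: calling DataFrame.apply inside DataFrame.apply visits every pair of columns. This still calls f once per pair, so it is shorter rather than faster. The function f and the frame D below are placeholders invented for the example; f is deliberately asymmetric so the row and column orientation can be checked.

```python
import pandas as pd

# Placeholder data and function, invented for illustration.
D = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 10]})

def f(x: pd.Series, y: pd.Series) -> float:
    # Asymmetric on purpose: f(x, y) == -f(y, x).
    return float((x - y).sum())

# The outer apply walks the columns D[j]; the inner apply walks the columns
# D[i] and returns a Series indexed by i. The result has E.loc[i, j] == f(D[i], D[j]).
E = D.apply(lambda col_j: D.apply(lambda col_i: f(col_i, col_j)))

print(E)
```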
