pandas

Drop duplicate if the value in another column is null - Pandas

☆樱花仙子☆ submitted on 2021-02-09 09:21:01
Question: What I have:

    df
    Name  | Vehicle
    Dave  | Car
    Mark  | Bike
    Steve | Car
    Dave  |
    Steve |

I want to drop duplicates from the Name column, but only if the corresponding value in the Vehicle column is null. I know I can use df.drop_duplicates(subset=['Name']) with keep set to either 'first' or 'last', but what I am looking for is a way to drop duplicates from the Name column where the corresponding value of the Vehicle column is null. So basically: keep the Name if the Vehicle column is NOT null and drop the rest. If a
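The excerpt is cut off, so the exact rule the asker wants is not fully known. As a minimal sketch, assuming the goal is "drop a null-Vehicle row only when the same Name also has a non-null Vehicle somewhere", the filter below may be a reasonable starting point. The variable names `has_vehicle` and `result` are illustrative, not from the question.

```python
import pandas as pd

# Sample data reconstructed from the question; None marks a missing Vehicle.
df = pd.DataFrame({
    "Name": ["Dave", "Mark", "Steve", "Dave", "Steve"],
    "Vehicle": ["Car", "Bike", "Car", None, None],
})

# For every row, check whether its Name has at least one non-null Vehicle.
# transform("count") counts non-null values per group and aligns to the rows.
has_vehicle = df.groupby("Name")["Vehicle"].transform("count") > 0

# Keep rows that have a Vehicle, plus null rows whose Name never has one
# (assumption: such names should not vanish from the result entirely).
result = df[df["Vehicle"].notna() | ~has_vehicle]
print(result)
```

Under this assumption, a Name that only ever appears with a null Vehicle is kept as-is; if those rows should also be deduplicated, a follow-up `drop_duplicates(subset=['Name'])` on that part would be needed.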

Index sort order of a multi-index dataframe does not respect categorical index order

断了今生、忘了曾经 submitted on 2021-02-09 08:34:57
Question: A small dataframe with a two-level MultiIndex and one column. The second level of the index (level 1) sorts in alphabetical order, putting 'Four' before 'Three'.

    import pandas as pd
    df = pd.DataFrame({'A': [1, 1, 2, 2],
                       'B': ['One', 'Two', 'Three', 'Four'],
                       'X': [1, 2, 3, 4]},
                      index=range(4)).set_index(['A', 'B']).sort_index()
    df

             X
    A B
    1 One    1
      Two    2
    2 Four   4
      Three  3

Clearly the second level of the index (B) is in alphabetical order, so this can be replaced with a categorical index to force the correct
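The excerpt stops before showing how the categorical index was built, so the following is only a sketch of one approach, not a reconstruction of the asker's code: convert column B to an ordered categorical before calling set_index, so the MultiIndex level carries the intended category order when sorted. The list `order` is an assumption about the desired ordering.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': ['One', 'Two', 'Three', 'Four'],
                   'X': [1, 2, 3, 4]})

# Assumed desired order for level B (not alphabetical).
order = ['One', 'Two', 'Three', 'Four']

# Make B an ordered categorical *before* it becomes an index level.
df['B'] = pd.Categorical(df['B'], categories=order, ordered=True)

# Sorting the MultiIndex now follows the category order within each A.
sorted_df = df.set_index(['A', 'B']).sort_index()
print(sorted_df)
```

Converting the column first avoids having to swap the levels of an already-built MultiIndex, which is easier to get wrong.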

NumPy: how to left join arrays with duplicates

冷暖自知 submitted on 2021-02-09 07:36:57
Question: To use Cython, I need to convert df1.merge(df2, how='left') (using Pandas) to plain NumPy, but I found that numpy.lib.recfunctions.join_by(key, r1, r2, jointype='leftouter') doesn't support any duplicates along key. Is there any way to solve this?

Answer 1: Here's a stab at a pure NumPy left join that can handle duplicate keys:

    import numpy as np

    def join_by_left(key, r1, r2, mask=True):
        # figure out the dtype of the result array
        descr1 = r1.dtype.descr
        descr2 = [d for d in r2.dtype.descr if d[0]
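The answer's code is truncated above, so it cannot be completed here. Separately, and as a different technique from whatever join_by_left does, the sketch below computes only the row-index pairs of a left join on 1-D key arrays using a sort plus searchsorted. The function name `left_join_indices` and the use of -1 for "no match" are assumptions made for illustration.

```python
import numpy as np

def left_join_indices(left_keys, right_keys):
    """Return (left_idx, right_idx) row pairs for a left join on 1-D keys.

    Duplicate keys on either side are allowed; right_idx is -1 where a
    left row has no match (hypothetical convention for this sketch).
    """
    # Stable sort keeps equal right keys in their original order.
    order = np.argsort(right_keys, kind="stable")
    sorted_right = right_keys[order]

    # [lo, hi) is the run of matching right rows for each left key.
    lo = np.searchsorted(sorted_right, left_keys, side="left")
    hi = np.searchsorted(sorted_right, left_keys, side="right")
    counts = hi - lo

    # Unmatched left rows still produce exactly one output row.
    out_counts = np.maximum(counts, 1)
    left_idx = np.repeat(np.arange(len(left_keys)), out_counts)

    # Offset of each output row inside its own left row's group.
    starts = np.cumsum(out_counts) - out_counts
    within = np.arange(out_counts.sum()) - np.repeat(starts, out_counts)
    pos = np.repeat(lo, out_counts) + within

    matched = np.repeat(counts > 0, out_counts)
    right_idx = np.full(len(left_idx), -1, dtype=np.intp)
    right_idx[matched] = order[pos[matched]]
    return left_idx, right_idx

left = np.array([1, 2, 2, 3])
right = np.array([2, 2, 4, 1])
li, ri = left_join_indices(left, right)
print(li, ri)
```

The index pairs can then be used to gather columns from each side, for example `r1[li]` and `r2[ri]` with the rows where `ri == -1` masked or filled afterwards.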

Aggregate, Transpose, and pull in value in Pandas Dataframe

随声附和 submitted on 2021-02-09 07:24:06
Question: Input DF:

    ID  Time  Value
    0   1     5
    0   2     7
    0   3     8
    1   1     1
    1   2     4
    1   3     6

Output DF:

       1  2  3
    0  5  7  8
    1  1  4  6

Goal: I currently have something similar to the input DF and am looking to transform it into the output DF. Row 1 of the output DF holds the unique Time data points. Column 1 of the output DF holds the unique IDs. The remaining cells hold the Value for the given ID/Time pair. The closest I've gotten is by doing something like this:

    group_by = input_df.groupby('ID').agg({'Value'
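The asker's groupby attempt is cut off, so it is left as-is. As a sketch of an alternative, assuming each (ID, Time) pair occurs at most once as in the sample, DataFrame.pivot produces the described layout directly. The name `output_df` is illustrative.

```python
import pandas as pd

# Input data taken from the question.
input_df = pd.DataFrame({
    "ID": [0, 0, 0, 1, 1, 1],
    "Time": [1, 2, 3, 1, 2, 3],
    "Value": [5, 7, 8, 1, 4, 6],
})

# Rows become unique IDs, columns become unique Times, cells hold Value.
output_df = input_df.pivot(index="ID", columns="Time", values="Value")
print(output_df)
```

If an (ID, Time) pair can repeat, pivot raises an error; in that case pivot_table with an explicit aggregation function would be the usual fallback.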

Pivoting a Pandas Dataframe, no numeric types, index is not unique

久未见 submitted on 2021-02-09 07:20:21
Question: I am trying to convert some string data into columns, but I have had a difficult time using past answers because I do not have a unique index or MultiIndex that I could use. Sample format:

    index  location   field      value
    1      location1  firstName  A
    2      location1  lastName   B
    3      location1  dob        C
    4      location1  email      D
    5      location1  title      E
    6      location1  address1   F
    7      location1  address2   G
    8      location1  address3   H
    9      location1  firstName  I
    10     location1  lastName   J
    11     location1  dob        K
    12     location1  email      L
    13     location1
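The sample is truncated, so the intended output is not shown. One common way to handle a non-unique pivot key, sketched here under the assumption that the n-th occurrence of each field within a location belongs to the n-th record, is to build an occurrence counter with groupby(...).cumcount() and unstack on it. The helper column `record` is hypothetical, and the data below is a reduced subset of the sample rows (the title and address rows are omitted).

```python
import pandas as pd

# Reduced subset of the question's sample (rows 1-4 and 9-12).
df = pd.DataFrame({
    "location": ["location1"] * 8,
    "field": ["firstName", "lastName", "dob", "email"] * 2,
    "value": list("ABCD") + list("IJKL"),
})

# Number each repeat of a field within a location: 0 for the first
# occurrence, 1 for the second, and so on (assumed to identify the record).
df["record"] = df.groupby(["location", "field"]).cumcount()

# With (location, record, field) now unique, unstack moves fields to columns.
# No aggregation is involved, so string values are carried over unchanged.
wide = df.set_index(["location", "record", "field"])["value"].unstack("field")
print(wide)
```

Fields missing from a given record simply come out as NaN in that row.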
