pandas

pandas filtering using isin function

二次信任 submitted on 2021-02-11 04:32:59
Question: I have two dataframes, as shown below.

df1:

    ID  Name
    1   Sachin
    2   Kholi
    3   Dravid

df2:

    ID  Run
    1   20
    2   60
    2   10
    1   5

From the above, I want to filter df1 using the unique IDs in df2.

Expected output:

    ID  Name
    3   Dravid

I tried the code below:

    def diff(first, second):
        second = set(second)
        units_in_unit_table = [item for item in first if item not in second]
        return units_in_unit_table

    id_df2 = diff(df2, df1)
    df3 = df1[df1['ID'].isin(id_df2)]

Answer 1: It seems your solution could be simplified by passing the unique values
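
The answer is cut off here, so the following is only a minimal sketch of one way to get the expected output, not necessarily what the answer went on to propose. It assumes the goal is to keep the rows of df1 whose ID does not occur in df2, which is what the expected output shows. The frames are rebuilt from the tables in the question.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Sachin", "Kholi", "Dravid"]})
df2 = pd.DataFrame({"ID": [1, 2, 2, 1], "Run": [20, 60, 10, 5]})

# Unique IDs present in df2.
ids_in_df2 = df2["ID"].unique()

# Assumption: the wanted rows are those whose ID is NOT in df2,
# so the isin mask is negated with ~.
df3 = df1[~df1["ID"].isin(ids_in_df2)]

print(df3)
```

Note that iterating over a DataFrame, as the diff helper in the question does, yields column labels rather than ID values, which is why passing the ID column itself is needed.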

Json to CSV issues

拜拜、爱过 submitted on 2021-02-11 02:49:13
Question: I am using pandas to normalize some JSON data. I get stuck when more than one section is either an object or an array. If I use record_path on Car, it breaks on the second. Any pointers on how to create one line in the CSV per Car and per Location?

    [
      {
        "Name": "John Doe",
        "Car": ["Car1", "Car2"],
        "Location": "Texas"
      },
      {
        "Name": "Jane Roe",
        "Car": "Car1",
        "Location": ["Illinois", "Kansas"]
      }
    ]

Here is the output:

    Name,Car,Location
    John Doe,"[
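
One possible approach, sketched here under the assumption that the desired result is one CSV row per Name, Car and Location combination, is to load the records into a DataFrame and call DataFrame.explode on each list-valued column in turn. This swaps record_path for explode; it is an illustration, not the answer given on the original page.

```python
import pandas as pd

# Records copied from the question; "Car" and "Location" may each be
# either a single string or a list of strings.
data = [
    {"Name": "John Doe", "Car": ["Car1", "Car2"], "Location": "Texas"},
    {"Name": "Jane Roe", "Car": "Car1", "Location": ["Illinois", "Kansas"]},
]

df = pd.DataFrame(data)

# explode() turns each list element into its own row and leaves scalar
# values untouched. The index is reset after each step so the next
# explode works on a unique index.
for col in ["Car", "Location"]:
    df = df.explode(col).reset_index(drop=True)

csv_text = df.to_csv(index=False)
print(csv_text)
```

With this input the frame ends up with four rows: two for John Doe (one per car) and two for Jane Roe (one per location).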

pandas: calculate time difference between df columns [duplicate]

北城以北 submitted on 2021-02-11 02:31:50
Question: This question already has answers here: Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes (3 answers). Closed 6 months ago.

I have two df columns with string values:

    df['starttime']                  df['endtime']
    0 2015-10-06 18:35:33            0 2015-10-06 18:35:58
    1 2015-10-08 17:51:21.999000     1 2015-10-08 17:52:10
    2 2015-10-08 20:51:55.999000     2 2015-10-08 20:52:21
    3 2015-10-05 15:16:49.999000     3 2015-10-05 15:17:00
    4 2015-10-05 15:16:53.999000     4 2015-10-05 15:17:22
    5 2015-10-05 15:17
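
As a rough sketch of the usual approach, assuming the strings parse cleanly with pd.to_datetime: convert both columns to datetimes and subtract them, which gives a timedelta column. Only the first five complete rows from the question are used; the column names diff and seconds are made up for this example.

```python
import pandas as pd

# First five complete rows from the question, kept as strings.
df = pd.DataFrame({
    "starttime": [
        "2015-10-06 18:35:33",
        "2015-10-08 17:51:21.999000",
        "2015-10-08 20:51:55.999000",
        "2015-10-05 15:16:49.999000",
        "2015-10-05 15:16:53.999000",
    ],
    "endtime": [
        "2015-10-06 18:35:58",
        "2015-10-08 17:52:10",
        "2015-10-08 20:52:21",
        "2015-10-05 15:17:00",
        "2015-10-05 15:17:22",
    ],
})

# Parse each string individually so values with and without fractional
# seconds can share a column.
start = df["starttime"].apply(pd.to_datetime)
end = df["endtime"].apply(pd.to_datetime)

# Subtracting two datetime columns yields a timedelta column.
df["diff"] = end - start

# Hypothetical extra column: the same difference as float seconds.
df["seconds"] = df["diff"].dt.total_seconds()

print(df[["diff", "seconds"]])
```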

Why is pandas.read_fwf not skipping the blank line as instructed?

二次信任 submitted on 2021-02-11 01:55:15
Question: I'm reading a fixed-width format file (full source file) full of missing data, so pandas.read_fwf comes in handy. There is an empty line after the header, so I'm passing skip_blank_lines=True, but this appears to have no effect, as the first entry is still full of NaN/NaT:

    import io
    import pandas
    s="""USAF WBAN STATION NAME CTRY ST CALL LAT LON ELEV(M) BEGIN END

    007018 99999 WXPOD 7018 +00.000 +000.000 +7018.0 20110309 20130730
    007026 99999 WXPOD 7026 AF +00.000 +000.000 +7026.0 20120713 20170822
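
The page does not include an answer, so the sketch below is only one possible workaround, and it does not explain why skip_blank_lines behaved that way for the asker. It reads the text as usual and then drops any row in which every field is missing, which gives the same result whether or not the installed pandas version skips the blank line itself. The sample is a trimmed, hypothetical four-column layout loosely based on the question's data, since the original column spacing is lost in this listing.

```python
import io

import pandas as pd

# Hypothetical, trimmed fixed-width sample: a header, an empty line,
# then two records. Only four of the question's columns are kept.
s = (
    "USAF   WBAN  BEGIN    END\n"
    "\n"
    "007018 99999 20110309 20130730\n"
    "007026 99999 20120713 20170822\n"
)

# dtype=str keeps the leading zeros in the station identifiers.
raw = pd.read_fwf(io.StringIO(s), dtype=str)

# Drop any row where every field is missing (the blank line, if it was
# read as a record), then renumber the remaining rows.
df = raw.dropna(how="all").reset_index(drop=True)

print(df)
```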

Pandas merge without duplicating columns

≡放荡痞女 submitted on 2021-02-10 23:36:26
Question: I need to merge two dataframes without creating duplicate columns. The first dataframe (dfa) has missing values. The second dataframe (dfb) has unique values. This would be the same as a VLOOKUP in Excel.

dfa looks like this:

    postcode  lat  lon  ...plus 32 more columns
    M20       2.3  0.2
    LS1       NaN  NaN
    LS1       NaN  NaN
    LS2       NaN  NaN
    M21       2.4  0.3

dfb only contains unique postcodes and values where lat and lon were NaN in dfa. It looks like this:

    postcode  lat  lon
    LS1       1.4  0.1
    LS2       1.5  0.2

The output I would like is:
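
The desired output is cut off in this listing, so the sketch below assumes the aim is to fill the NaN lat and lon values in dfa from dfb by postcode, leaving every other column untouched. Instead of merge, it uses Series.map as a lookup together with fillna, which avoids the suffixed duplicate columns a plain merge would create. Only the three columns shown in the question are rebuilt here.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question (the 32 extra dfa columns are omitted).
dfa = pd.DataFrame({
    "postcode": ["M20", "LS1", "LS1", "LS2", "M21"],
    "lat": [2.3, np.nan, np.nan, np.nan, 2.4],
    "lon": [0.2, np.nan, np.nan, np.nan, 0.3],
})
dfb = pd.DataFrame({
    "postcode": ["LS1", "LS2"],
    "lat": [1.4, 1.5],
    "lon": [0.1, 0.2],
})

# Index dfb by postcode so each column can act as a lookup table.
lookup = dfb.set_index("postcode")

# For each coordinate column, look up a value for every postcode in dfa
# and use it only where dfa is missing a value.
for col in ["lat", "lon"]:
    dfa[col] = dfa[col].fillna(dfa["postcode"].map(lookup[col]))

print(dfa)
```

Because dfb holds one row per postcode, the lookup is unambiguous; with repeated postcodes in dfb the map step would raise an error and a different strategy would be needed.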