dataframe | 易学教程

How to insert missing dates and forward fill columns after grouping by another column in pandas dataframe

Question: I have data available on a monthly basis (for different securities) which I want to convert to a daily basis by adding the missing dates and forward filling the monthly data for all the days of the month (i.e. data on 12/3/2015 = data on 12/1/2015, and so on for all securities). My data looks like this:

    x = pd.DataFrame({'ticker': ['a','a','a','b','b'], 'dt': ['12/1/2015','1/1/2016','2/1/2016','1/1/2016','2/1/2016'], 'score': [2.8,3.8,3.8,1.9,1.7]})

I tried creating a multi-index using dates and
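
The excerpt is cut off before any answer, so the following is only a minimal sketch of one common way to approach the problem described: group by `ticker`, resample each group to daily frequency, and forward fill. The result name `daily` is made up for illustration, and the date strings are assumed to be in month/day/year order.

```python
import pandas as pd

x = pd.DataFrame({
    'ticker': ['a', 'a', 'a', 'b', 'b'],
    'dt': ['12/1/2015', '1/1/2016', '2/1/2016', '1/1/2016', '2/1/2016'],
    'score': [2.8, 3.8, 3.8, 1.9, 1.7],
})

# Assumption: the strings are month/day/year.
x['dt'] = pd.to_datetime(x['dt'], format='%m/%d/%Y')

# Resample each ticker separately to daily frequency and carry the
# last monthly score forward onto the newly created days.
daily = (
    x.set_index('dt')
     .groupby('ticker')['score']
     .resample('D')
     .ffill()
     .reset_index()
)

print(daily.head())
```

Note that each ticker is only filled up to its last observed date (2/1/2016 here), so the final month is not extended to its last day. If that is needed, one option is to append an end-of-month row per ticker before resampling.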

DataFrame: if value in a cell, copy value to cells below it

Question: I'm working on a stock analysis program and need to find 'SPLIT' amounts from the 'UNP_action' column, and then copy the corresponding 'UNP_action_amount' to the rows above it only. I'm able to do this in a complicated way via loops, but I'm wondering if there's a more efficient way to do this within Pandas.

Current:

    Date        UNP_Adj_Close  UNP_action  UNP_action_amount
    2008-05-23  31.83157
    2008-05-27  33.032365
    2008-05-28  32.965423
    2008-05-29  33.61812       SPLIT       0.5
    2008-05-30  34.438176

Desired: Date UNP_Adj_Close
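
The desired output is truncated in this excerpt, so the sketch below rests on assumptions: blank cells in `UNP_action` are modelled as empty strings, the result goes into a hypothetical new column `split_factor`, and the SPLIT row itself keeps its amount. The idea is to keep only the amounts on SPLIT rows and back-fill them upward, which avoids an explicit loop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-05-23', '2008-05-27', '2008-05-28',
                            '2008-05-29', '2008-05-30']),
    'UNP_Adj_Close': [31.83157, 33.032365, 32.965423, 33.61812, 34.438176],
    'UNP_action': ['', '', '', 'SPLIT', ''],            # blanks assumed to be ''
    'UNP_action_amount': [np.nan, np.nan, np.nan, 0.5, np.nan],
})

# Keep the amount only on SPLIT rows; everything else becomes NaN.
split_amount = df['UNP_action_amount'].where(df['UNP_action'] == 'SPLIT')

# Back-fill copies each SPLIT amount to the rows above it.
# Rows after the last SPLIT stay NaN.
df['split_factor'] = split_amount.bfill()

print(df)
```

If the SPLIT row itself should not carry the value, shifting the masked series up by one row before back-filling would be one way to exclude it.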

Expand nested list of dictionaries in a pandas dataframe column

Question: I have this dataframe called "leads", which I got by saving the output of an SFDC SOQL query into a dataframe. I have been trying to expand the column "Leads__r.record":

      Company      Month  Amount  Leads__r.done                      Leads__r.record  Leads__r.totalSize
    0      A1  September  500000           True  [{u'Id': u'Q500', u'Company': u'...                 1.0
    1      B1   December   16200           True  [{u'Id': u'Q600', u'Company': u'...                 1.0
    2      C1   December   35000           True  [{u'Id': u'Q700', u'Company': u'...                 1.0
    3      D1   December   16200           True  [{u'Id': u'Q800', u'Company': u'...                 1.0
    4      E1
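
The nested records are truncated in the excerpt, so the sample data below is invented for illustration (only the `Id` and `Company` keys appear in the source, and the inner `Company` values are made up). Under those assumptions, one possible approach is to give each nested dict its own row with `explode` and then flatten the dicts with `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical stand-in for the truncated "leads" frame.
leads = pd.DataFrame({
    'Company': ['A1', 'B1'],
    'Month': ['September', 'December'],
    'Amount': [500000, 16200],
    'Leads__r.record': [
        [{'Id': 'Q500', 'Company': 'A1 lead'}],   # inner values are made up
        [{'Id': 'Q600', 'Company': 'B1 lead'}],
    ],
})

# One row per nested dict.
exploded = leads.explode('Leads__r.record').reset_index(drop=True)

# Turn the dicts into columns, prefixed to avoid clashing with 'Company'.
nested = pd.json_normalize(exploded['Leads__r.record'].tolist())
nested = nested.add_prefix('Leads__r.record.')

expanded = pd.concat(
    [exploded.drop(columns='Leads__r.record'), nested], axis=1
)
print(expanded)
```

Rows whose list is empty or missing would need separate handling before `json_normalize`, since they do not yield a dict.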

How to count observations with certain value in a group conditionally?

Question: I am working with the following data frame:

    Year  Month     Day  X    Y    Color
    2018  January   1    4.5  6    Red
    2018  January   4    3.2  8.1  Red
    2018  January   11   1.1  2.3  Blue
    2018  February  7    5.4  2.2  Blue
    2018  February  15   1.5  4.4  Red
    2019  January   3    8.6  2.3  Red
    2019  January   22   1.1  2.5  Blue
    2019  January   23   5.5  7.8  Red
    2019  February  5    6.9  1.1  Red
    2019  February  10   1.8  1.3  Red

I am looking to create a new column that indicates the number of observations where X is greater than Y and the color is 'Red' for a given month.
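
A minimal sketch of one way to do this, assuming "a given month" means a (Year, Month) pair and that the new column may be called `Count` (a name chosen here for illustration): build a boolean mask for the condition, then sum it within each group and broadcast the result back to every row with `transform`.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2018] * 5 + [2019] * 5,
    'Month': ['January', 'January', 'January', 'February', 'February',
              'January', 'January', 'January', 'February', 'February'],
    'Day': [1, 4, 11, 7, 15, 3, 22, 23, 5, 10],
    'X': [4.5, 3.2, 1.1, 5.4, 1.5, 8.6, 1.1, 5.5, 6.9, 1.8],
    'Y': [6, 8.1, 2.3, 2.2, 4.4, 2.3, 2.5, 7.8, 1.1, 1.3],
    'Color': ['Red', 'Red', 'Blue', 'Blue', 'Red',
              'Red', 'Blue', 'Red', 'Red', 'Red'],
})

# True where X > Y and the colour is Red.
hit = (df['X'] > df['Y']) & (df['Color'] == 'Red')

# Sum the hits per (Year, Month) and repeat the total on each row of the group.
df['Count'] = hit.astype(int).groupby([df['Year'], df['Month']]).transform('sum')

print(df)
```

With the sample data this gives 0 for both 2018 months, 1 for January 2019, and 2 for February 2019.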

R: Is there a way to sort messy data where it pivots from long to wide, and as it moves across variables, into one logical key:value column?

Question: I have extremely messy data. A portion of it looks like the following example.

    x1_01=c("bearing_coordinates", "bearing_coordinates", "bearing_coordinates", "roadkill")
    x1_02=c(146,122,68,1)
    x2_01=c("tree_density","animals_on_road","animals_on_road", "tree_density")
    x2_02=c(13,2,5,11)
    x3_01=c("animals_on_road", "tree_density", "roadkill", "bearing_coordinates")
    x3_02=c(3,10,1,1000)
    x4_01=c("roadkill","roadkill", "tree_density", "animals_on_road")
    x4_02=c(1,1,12,6)
    testframe = data.frame(x1_01

List of dict of dict in Pandas

Question: I have a list of dicts of dicts in the following form:

    [{0:{'city':'newyork', 'name':'John', 'age':'30'}}, {0:{'city':'newyork', 'name':'John', 'age':'30'}},]

I want to create a pandas DataFrame in the following form:

    city     name  age
    newyork  John  30
    newyork  John  30

I have tried a lot without success. Can you help me?

Answer 1: Use a list comprehension with concat and DataFrame.from_dict:

    L = [{0:{'city':'newyork', 'name':'John', 'age':'30'}}, {0:{'city':'newyork', 'name':'John', 'age':'30'}}]
    df = pd
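
The answer's code is cut off in this excerpt, so the following is not a reconstruction of it. It is only a sketch of how the three ingredients it names (a list comprehension, `pd.concat`, and `DataFrame.from_dict`) might be combined, assuming every element of the list is a dict whose values are the row dicts.

```python
import pandas as pd

L = [{0: {'city': 'newyork', 'name': 'John', 'age': '30'}},
     {0: {'city': 'newyork', 'name': 'John', 'age': '30'}}]

# orient='index' treats each outer key (0) as a row label and each
# inner dict as that row's column values.
frames = [pd.DataFrame.from_dict(d, orient='index') for d in L]

# Stack the one-row frames and renumber the rows 0..n-1.
df = pd.concat(frames, ignore_index=True)

print(df)
```

Note that `age` stays a string column here because the input values are strings. Converting it with `astype(int)` would be a separate step.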

Concatenating data frame rows based on column condition

Question: For subsequent discussion, I will refer to the example data frame below: Now, what I wish to achieve is to group all the packet times that are similar, i.e. all the 7s, 12s, etc. Furthermore, the PacketTime field should contain the difference between the max and min (max(PacketTime) - min(PacketTime)), and the FrameLen, IPLen and TCPLen fields should be lists of all the values that correspond to the grouped time. For example, for the 7s group, FrameLen would contain c(304, 276, 276). My solution

Difference between consecutive dates in pandas groupby [duplicate]

Question: This question already has an answer here: Pandas find duration between dates where a condition is met? (1 answer) Closed 2 years ago.

I have a data frame as follows:

    df_raw_dates = pd.DataFrame({"id": [102, 102, 102, 103, 103, 103, 104], "val": [9,2,4,7,6,3,2], "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8), pd.Timestamp(2005, 2, 3)]})

        id  val  dates
    0  102    9  2002-01-01
    1  102    2
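
The desired output is not visible in this excerpt, so the sketch below only illustrates the technique named in the title: the difference between consecutive dates within each `id`. The names `df_sorted` and `diff` are chosen here for illustration.

```python
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3),
              pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9),
              pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8),
              pd.Timestamp(2005, 2, 3)],
})

# Sort so that "consecutive" means consecutive in time within each id.
df_sorted = df_raw_dates.sort_values(["id", "dates"])

# diff() inside each group; the first row of every id has no
# predecessor and therefore gets NaT.
df_sorted["diff"] = df_sorted.groupby("id")["dates"].diff()

print(df_sorted)
```

If a plain number of days is preferred over a timedelta, `df_sorted["diff"].dt.days` would give that.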

How to test string contains elements in list and assign the target element to another column via Pandas

Question: I have a one-column list of company names. Some of those names contain country names (e.g., "China" in "China A1", "Finland" in "C1 in Finland"). I want to extract the country each company belongs to, based on the company name and a pre-defined list of country names. The original dataframe df looks like this:

       Company name     Country
    0  China A1
    1  Australia-A2
    2  Belgium_C1
    3  C1 in Finland
    4  D1 of Greece
    5  E2 for Pakistan

For now, I can only come up with an inefficient method. Here
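
The asker's own method is cut off, so the following is just one possible vectorised sketch. The `countries` list is an assumption (the excerpt does not show the pre-defined list), and the approach builds a single regular expression from it and lets `Series.str.extract` pull out the first matching country.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Company name': ['China A1', 'Australia-A2', 'Belgium_C1',
                     'C1 in Finland', 'D1 of Greece', 'E2 for Pakistan'],
})

# Assumed pre-defined list of country names.
countries = ['China', 'Australia', 'Belgium', 'Finland', 'Greece', 'Pakistan']

# One alternation pattern; re.escape guards against special characters.
pattern = '(' + '|'.join(re.escape(c) for c in countries) + ')'

# expand=False returns a Series; names with no match become NaN.
df['Country'] = df['Company name'].str.extract(pattern, expand=False)

print(df)
```

When one country name is a prefix of another, the longer name needs to come first in the list, because the alternation takes the first alternative that matches.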