pandas

Difference between consecutive dates in pandas groupby [duplicate]

时光总嘲笑我的痴心妄想 submitted on 2021-02-10 13:22:09
Question: This question already has an answer here: Pandas find duration between dates where a condition is met? (1 answer). Closed 2 years ago.

I have a dataframe as follows:

df_raw_dates = pd.DataFrame({"id": [102, 102, 102, 103, 103, 103, 104],
                             "val": [9, 2, 4, 7, 6, 3, 2],
                             "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2003, 4, 4),
                                       pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8),
                                       pd.Timestamp(2005, 2, 3)]})

    id  val       dates
0  102    9  2002-01-01
1  102    2 …
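A minimal sketch of one common approach (not necessarily the linked duplicate's answer): sort by id and date, then call `diff()` on the `dates` column inside a `groupby("id")`, so the difference restarts for every id. The `delta` and `delta_days` column names are illustrative additions.

```python
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2003, 4, 4),
              pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8),
              pd.Timestamp(2005, 2, 3)],
})

# Sort so each row is compared with the chronologically previous date of the same id.
df_raw_dates = df_raw_dates.sort_values(["id", "dates"])

# diff() inside groupby restarts per id; the first row of every group becomes NaT.
df_raw_dates["delta"] = df_raw_dates.groupby("id")["dates"].diff()

# Whole days as numbers (NaT turns into NaN).
df_raw_dates["delta_days"] = df_raw_dates["delta"].dt.days
print(df_raw_dates)
```

For id 102 this gives NaN, 61 and 397 days; a single-row group such as id 104 only gets NaN.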

How to test whether a string contains elements of a list and assign the matched element to another column via pandas

天涯浪子 submitted on 2021-02-10 13:16:11
Question: I have a single-column list of company names. Some of those names contain a country name (e.g., "China" in "China A1", "Finland" in "C1 in Finland"). I want to extract each company's country from its name, using a predefined list of country names. The original dataframe df looks like this:

   Company name      Country
0  China A1
1  Australia-A2
2  Belgium_C1
3  C1 in Finland
4  D1 of Greece
5  E2 for Pakistan

For now, I can only come up with an inefficient method. Here…
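One vectorized option is sketched below. It assumes the country list is exactly the six names visible in the example; the real predefined list is not shown in the question. The names are joined into a single regex alternation and `Series.str.extract` pulls out the first match. Names are passed through `re.escape` in case any contain regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({"Company name": ["China A1", "Australia-A2", "Belgium_C1",
                                    "C1 in Finland", "D1 of Greece", "E2 for Pakistan"]})

# Assumed country list; replace with the real predefined list.
countries = ["China", "Australia", "Belgium", "Finland", "Greece", "Pakistan"]

# One capture group of alternatives, e.g. "(China|Australia|...)".
pattern = "(" + "|".join(map(re.escape, countries)) + ")"

# expand=False returns a Series; names with no listed country get NaN.
df["Country"] = df["Company name"].str.extract(pattern, expand=False)
print(df)
```

No `\b` word boundaries are used because "Belgium_C1" joins the country to an underscore, which regex treats as a word character. The trade-off is that a short name can match inside a longer word.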

X-Axis tick labels by year with X-Axis gridlines by fiscal quarter

丶灬走出姿态 submitted on 2021-02-10 12:57:43
Question: I am trying to set the x-axis tick labels to the year but have the gridlines mark each fiscal quarter. The data is quite simple, just a groupby date count (see below). Each date has a count, and I am plotting it as a line plot.

rc[(rc['form'] == 'Bakken') & (rc['tgt'] == 'oil')].groupby(['date']).date.count()

date        count
2010-01-08  65
2010-01-15  68
2010-01-22  73
2010-01-29  76
2010-02-05  79
2010-02-12  76
2010-02-19  79
2010-02-26  83
2010-03-05  81
2010-03-12  83
2010-03-19  80
2010-03-26  87
2010-04-02  84
2010…
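A minimal matplotlib sketch follows. It puts yearly major ticks (labelled "%Y") and quarterly minor ticks on the x-axis, then enables gridlines only for the minor ticks. It assumes fiscal quarters start in January, April, July and October; for a different fiscal year, change `bymonth`. The weekly counts are hypothetical stand-ins for the `groupby(['date']).date.count()` result.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly counts (Fridays, starting 2010-01-08) standing in for the real data.
dates = pd.date_range("2010-01-08", periods=160, freq="W-FRI")
counts = pd.Series(range(len(dates)), index=dates)

fig, ax = plt.subplots()
ax.plot(counts.index.to_pydatetime(), counts.values)

# Major ticks: one per year, labelled with the year only.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Minor ticks at quarter starts (assumed fiscal quarters: Jan/Apr/Jul/Oct).
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))

# Gridlines only at the quarterly minor ticks.
ax.grid(True, which="minor", axis="x")
fig.savefig("counts.png")
```

Plotting through pandas' own `.plot()` can install pandas-specific date tick handling that ignores these locators. Passing `x_compat=True` is the usual way around that.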

Find duplicate rows among different groups with pandas

南笙酒味 submitted on 2021-02-10 12:55:45
Question:
Problem
Consider the following dataframe:

data_so = {
    'ID': [100, 100, 100, 200, 200, 300, 300, 300],
    'letter': ['A', 'B', 'A', 'C', 'D', 'E', 'D', 'A'],
}
df_so = pandas.DataFrame(data_so, columns=['ID', 'letter'])

I want to obtain a new column in which all duplicates in different groups are True. All other duplicates in the same group should be False.

What I've tried
I've tried using

df_so['dup'] = df_so.duplicated(subset=['letter'], keep=False)

but the result is not what I want: The first…
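The question is cut off, so the exact rule is not fully recoverable. The sketch below assumes a row should be True whenever its letter occurs under more than one distinct ID. It counts distinct IDs per letter with `groupby(...).transform("nunique")` and compares the count to 1.

```python
import pandas as pd

data_so = {
    "ID": [100, 100, 100, 200, 200, 300, 300, 300],
    "letter": ["A", "B", "A", "C", "D", "E", "D", "A"],
}
df_so = pd.DataFrame(data_so, columns=["ID", "letter"])

# Number of distinct IDs each letter appears under, broadcast back to every row.
ids_per_letter = df_so.groupby("letter")["ID"].transform("nunique")

# True only if the letter is shared across groups; repeats within one ID alone stay False.
df_so["dup"] = ids_per_letter > 1
print(df_so)
```

Under this interpretation, "A" (IDs 100 and 300) and "D" (IDs 200 and 300) are flagged, while "B", "C" and "E" are not. If repeats inside the same ID should also be False for shared letters, one option is to combine this mask with `~df_so.duplicated(["ID", "letter"])`.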

Python regex to pick all elements that don't match a pattern

我的梦境 submitted on 2021-02-10 12:49:10
Question: Yesterday I asked a similar question, "Keep elements with pattern in pandas series without converting them to list", and now I am faced with the opposite problem. I have a pandas dataframe:

import pandas as pd
df = pd.DataFrame(["Air type:1, Space kind:2, water, wood",
                   "berries, something at the start:4, Space blu:3, somethingelse"],
                  columns=['A'])

and I want to pick all elements that don't contain a ":". What I tried is the following regex, which seems to be working:

df['new'] = df.A.str…
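The asker's regex is truncated, so the sketch below is a non-regex alternative rather than a reconstruction of it. It assumes elements are separated by commas. Each string is split on ",", items containing ":" are dropped, and the rest are rejoined. `without_colon` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame(["Air type:1, Space kind:2, water, wood",
                   "berries, something at the start:4, Space blu:3, somethingelse"],
                  columns=["A"])

def without_colon(text):
    # Hypothetical helper: keep only comma-separated items that contain no ":".
    return ", ".join(item.strip() for item in text.split(",") if ":" not in item)

df["new"] = df["A"].apply(without_colon)
print(df["new"].tolist())
```

Splitting avoids the lookahead and backtracking subtleties of a regex that must reject whole items. The cost is that it relies on commas never appearing inside an element.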

Converting pandas dataframe to pandas series

天涯浪子 submitted on 2021-02-10 12:37:24
Question: I need some help with a data types issue. I'm trying to convert a pandas dataframe, which looks like the following:

timestamp   number
2018-01-01  1
2018-02-01  0
2018-03-01  5
2018-04-01  0
2018-05-01  6

into a pandas series that looks exactly like the dataframe, but without the column names timestamp and number:

2018-01-01  1
2018-02-01  0
2018-03-01  5
2018-04-01  0
2018-05-01  6

It shouldn't be difficult, but I'm having a little trouble figuring out how to do it, as I'm a beginner in pandas. It…
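A minimal sketch follows, assuming `timestamp` and `number` are ordinary columns rather than an existing index. Moving `timestamp` into the index and selecting `number` yields a Series. Clearing the index name and the Series name removes both labels from the printed output.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01",
                                 "2018-04-01", "2018-05-01"]),
    "number": [1, 0, 5, 0, 6],
})

# timestamp becomes the index; selecting the remaining column gives a Series.
s = df.set_index("timestamp")["number"]

# Drop the leftover labels so neither "timestamp" nor "number" is shown.
s = s.rename_axis(None)
s.name = None
print(s)
```

If `timestamp` is already the index, `df["number"]` or `df.squeeze("columns")` gives the Series directly.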

Printing Pandas Columns With Unicode Characters

大兔子大兔子 submitted on 2021-02-10 12:29:48
Question: I have a pandas dataframe with a column that contains a Unicode (non-ASCII) name.

import pandas as pd
no_unicode = pd.Series(['Steve', 'Jason', 'Jake'])
yes_unicode = pd.Series(['tea', 'caf\xe9', 'beer'])
var_names = dict(no_unicode=no_unicode, yes_unicode=yes_unicode)
df = pd.DataFrame(var_names)
print(df)

I can print the dataframe in IPython fine, but I get an error when I try to print it in Sublime Text (using Python 3):

UnicodeEncodeError: 'ascii' codec can't encode character…
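The error suggests that the process Sublime Text launches has an ASCII-encoded stdout. The dataframe itself is fine. A minimal sketch, assuming Python 3.7+ (where `TextIOWrapper.reconfigure` exists), switches stdout to UTF-8 before printing. The `hasattr` guard skips the call when stdout has been replaced by an object without that method.

```python
import sys
import pandas as pd

# Python 3.7+: re-encode stdout as UTF-8 if the console reported ASCII.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

no_unicode = pd.Series(["Steve", "Jason", "Jake"])
yes_unicode = pd.Series(["tea", "caf\xe9", "beer"])
var_names = dict(no_unicode=no_unicode, yes_unicode=yes_unicode)
df = pd.DataFrame(var_names)
print(df)
```

A commonly suggested alternative that needs no code changes is to set the environment variable `PYTHONIOENCODING=utf-8` for the build, for example through the `env` key of a Sublime Text build system.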

How do I use the first row of my spreadsheet as my DataFrame column names instead of 0, 1, 2, etc.?

二次信任 submitted on 2021-02-10 12:12:41
Question: I want my dataframe to use the names in the first row as its column names instead of the default numbering from 0. How do I do this? I tried using the pandas and openpyxl modules to turn my Excel spreadsheet into a dataframe:

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = load_workbook(filename='Budget1.xlsx')
print(wb.sheetnames)
sheet_ranges = wb['May 2019']
print(sheet_ranges['A3'].value)
ws = wb['May 2019']
df = pd.DataFrame(ws.values…
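`ws.values` in openpyxl yields one tuple per row, so a common pattern is to consume the first tuple as the header and pass the remaining rows as data. The sketch wraps that pattern in a hypothetical helper, `frame_from_rows`, and tests it on plain tuples so it runs without a workbook file. The sample budget rows are made up for illustration.

```python
import pandas as pd

def frame_from_rows(rows):
    """Hypothetical helper: build a DataFrame whose column names come from the first row."""
    rows = iter(rows)
    header = next(rows)  # first row becomes the column names
    return pd.DataFrame(list(rows), columns=list(header))

# Made-up rows standing in for ws.values from the 'May 2019' sheet.
sample = [("Item", "Cost"), ("Rent", 1000), ("Food", 300)]
df = frame_from_rows(sample)
print(df)

# With openpyxl this would be: df = frame_from_rows(ws.values)
```

Skipping openpyxl entirely is also possible with `pd.read_excel("Budget1.xlsx", sheet_name="May 2019", header=0)`. `header` is the zero-based row holding the column names, so a header on spreadsheet row 3 (the question reads cell A3) would be `header=2`.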