pandas | 易学教程

Difference between consecutive dates in pandas groupby [duplicate]

Question: This question already has an answer here: Pandas find duration between dates where a condition is met? (1 answer). Closed 2 years ago. I have a data-frame as follows:

    df_raw_dates = pd.DataFrame({"id": [102, 102, 102, 103, 103, 103, 104], "val": [9, 2, 4, 7, 6, 3, 2], "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8), pd.Timestamp(2005, 2, 3)]})

        id   val  dates
    0   102  9    2002-01-01
    1   102  2
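
The excerpt is cut off before the desired output, so the following is only a minimal sketch of one common reading of the title: the gap between each date and the previous date within the same `id`. The column name `gap` is an assumption made for illustration.

```python
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3),
              pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9),
              pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 8),
              pd.Timestamp(2005, 2, 3)],
})

# Sort so that "previous row" really means "previous date" inside each id.
df_sorted = df_raw_dates.sort_values(["id", "dates"])

# diff() restarts for every group, so the first row of each id is NaT.
# "gap" is a hypothetical column name, not taken from the question.
df_sorted["gap"] = df_sorted.groupby("id")["dates"].diff()

print(df_sorted)
```

A group with a single row, such as `id` 104 here, gets only `NaT`, because there is no earlier date to subtract.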

How to test string contains elements in list and assign the target element to another column via Pandas

Question: I have a one-column list of company names. Some of those names contain a country name (e.g., "China" in "China A1", "Finland" in "C1 in Finland"). I want to extract the country each company belongs to, based on the company name and a pre-defined list of country names. The original dataframe df looks like this:

        Company name      Country
    0   China A1
    1   Australia-A2
    2   Belgium_C1
    3   C1 in Finland
    4   D1 of Greece
    5   E2 for Pakistan

For now, I can only come up with an inefficient method. Here
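
The asker's own method is truncated, so this is a hedged sketch of one vectorized alternative: build a single alternation pattern from the country list and let `Series.str.extract` pull out the first match. The contents of `countries` are an assumption; the question only says such a list exists.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Company name": ["China A1", "Australia-A2", "Belgium_C1",
                     "C1 in Finland", "D1 of Greece", "E2 for Pakistan"],
})

# Assumed contents of the pre-defined country list.
countries = ["China", "Australia", "Belgium", "Finland", "Greece", "Pakistan"]

# One capture group containing every country name, escaped so that any
# regex metacharacters in a name are matched literally.
pattern = "(" + "|".join(re.escape(c) for c in countries) + ")"

# expand=False returns a Series; rows without any match become NaN.
df["Country"] = df["Company name"].str.extract(pattern, expand=False)

print(df)
```

No word boundaries are used on purpose, because a name such as "Belgium_C1" joins the country to the rest with an underscore, which `\b` would not treat as a boundary.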

X-Axis Ticks labels by year with X-Axis gridlines by fiscal quarter

Question: I am trying to set the x-axis tick labels as the year but have the gridlines at the fiscal quarter. The data is quite simple, just a groupby date count, see below. Each date has a count and I am plotting it as a line plot.

    rc[(rc['form']=='Bakken')&(rc['tgt']=='oil')].groupby(['date']).date.count()

    date        count
    2010-01-08  65
    2010-01-15  68
    2010-01-22  73
    2010-01-29  76
    2010-02-05  79
    2010-02-12  76
    2010-02-19  79
    2010-02-26  83
    2010-03-05  81
    2010-03-12  83
    2010-03-19  80
    2010-03-26  87
    2010-04-02  84
    2010
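
One way to approach this with matplotlib is sketched below: years as major ticks (labelled) and quarter starts as minor ticks (unlabelled), with the grid drawn on both. The sketch assumes the fiscal quarters start in January, April, July and October; adjust `bymonth` for a different fiscal calendar. Only the thirteen complete rows from the excerpt are used as data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The complete weekly rows shown in the question (Fridays from 2010-01-08).
dates = pd.date_range("2010-01-08", periods=13, freq="W-FRI")
counts = pd.Series([65, 68, 73, 76, 79, 76, 79, 83, 81, 83, 80, 87, 84],
                   index=dates)

fig, ax = plt.subplots()
ax.plot(counts.index, counts.values)

# Major ticks: one per year, labelled with the four-digit year.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Minor ticks: assumed quarter starts, left unlabelled by default.
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=[1, 4, 7, 10]))

# Draw vertical gridlines at both tick levels, because a minor tick that
# coincides with a major tick (1 January) is dropped by default.
ax.grid(True, which="both", axis="x")
```

With only three months of data the year tick may fall outside the visible range; over several years of data the labels appear once per year with a gridline every quarter.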

Find duplicate rows among different groups with pandas

Question: Problem. Consider the following dataframe:

    data_so = { 'ID': [100, 100, 100, 200, 200, 300, 300, 300], 'letter': ['A','B','A','C','D','E','D','A'], }
    df_so = pandas.DataFrame(data_so, columns = ['ID', 'letter'])

I want to obtain a new column where all duplicates in different groups are True. All other duplicates in the same group should be False. What I've tried: I've tried using df_so['dup'] = df_so.duplicated(subset=['letter'], keep=False) but the result is not what I want: The first
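
A possible approach, sketched under the assumption that a letter counts as a cross-group duplicate when it occurs under more than one distinct `ID`: count the distinct IDs per letter and compare with 1. The expected output is cut off in the excerpt, so whether both "A" rows inside group 100 should be True is an interpretation.

```python
import pandas

data_so = {
    'ID': [100, 100, 100, 200, 200, 300, 300, 300],
    'letter': ['A', 'B', 'A', 'C', 'D', 'E', 'D', 'A'],
}
df_so = pandas.DataFrame(data_so, columns=['ID', 'letter'])

# For each letter, count how many distinct IDs it appears under and
# broadcast that count back to every row with transform().
ids_per_letter = df_so.groupby('letter')['ID'].transform('nunique')

# True only when the letter is shared by at least two different groups.
df_so['dup'] = ids_per_letter > 1

print(df_so)
```

Here "A" (groups 100 and 300) and "D" (groups 200 and 300) come out True, while "B", "C" and "E" are False. A letter repeated only inside a single group would also be False, which is the case `duplicated(keep=False)` gets wrong.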

Python regex to pick all elements that don't match pattern

Question: I asked a similar question yesterday, "Keep elements with pattern in pandas series without converting them to list", and now I am faced with the opposite problem. I have a pandas dataframe:

    import pandas as pd
    df = pd.DataFrame(["Air type:1, Space kind:2, water, wood", "berries, something at the start:4, Space blu:3, somethingelse"], columns = ['A'])

and I want to pick all elements that don't have a ":" in them. What I tried is the following regex, which seems to be working: df['new'] = df.A.str
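
The asker's regex is truncated, so it cannot be reproduced here. As a hedged alternative that swaps the regex for a plain split-and-filter, the sketch below splits each string on ", ", drops the pieces containing ":", and joins the rest back into one string. The helper name `drop_colon_items` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    ["Air type:1, Space kind:2, water, wood",
     "berries, something at the start:4, Space blu:3, somethingelse"],
    columns=["A"],
)

def drop_colon_items(text: str) -> str:
    """Keep only the comma-separated items that contain no ':' (hypothetical helper)."""
    items = text.split(", ")
    return ", ".join(item for item in items if ":" not in item)

df["new"] = df["A"].apply(drop_colon_items)

print(df["new"].tolist())
```

This avoids regex edge cases at the start and end of the string, at the cost of a Python-level loop per row.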

Converting pandas dataframe to pandas series

Question: I need some help with a data types issue. I'm trying to convert a pandas dataframe, which looks like the following:

    timestamp   number
    2018-01-01  1
    2018-02-01  0
    2018-03-01  5
    2018-04-01  0
    2018-05-01  6

into a pandas series, which looks exactly like the dataframe, without the column names timestamp and number:

    2018-01-01  1
    2018-02-01  0
    2018-03-01  5
    2018-04-01  0
    2018-05-01  6

It shouldn't be difficult, but I'm having a little trouble figuring out the way to do it, as I'm a beginner in pandas. It
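
A minimal sketch of one way to do this, assuming `timestamp` is an ordinary column rather than the index already: move `timestamp` into the index, select the `number` column (which yields a Series), then clear both names so neither header is printed.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01",
                                 "2018-04-01", "2018-05-01"]),
    "number": [1, 0, 5, 0, 6],
})

# Selecting a single column from a DataFrame returns a Series.
s = df.set_index("timestamp")["number"]

# Remove the leftover labels so the printed Series shows only dates and values.
s.index.name = None
s.name = None

print(s)
```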

Printing Pandas Columns With Unicode Characters

Question: I have a pandas dataframe with a single column that contains a unicode encoded name.

    import pandas as pd
    no_unicode = pd.Series(['Steve', 'Jason', 'Jake'])
    yes_unicode = pd.Series(['tea', 'caf\xe9', 'beer'])
    var_names = dict(no_unicode = no_unicode, yes_unicode = yes_unicode)
    df = pd.DataFrame(var_names)
    print(df)

I can print the dataframe in IPython fine, but I get an error when I try to print the dataframe in Sublime Text (using py3). UnicodeEncodeError: 'ascii' codec can't encode character
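
The error message suggests that the output stream of the Sublime Text build is using the ASCII codec. One possible workaround, sketched here on the assumption that Python 3.7 or later is in use, is to switch `sys.stdout` to UTF-8 before printing. Setting the `PYTHONIOENCODING` environment variable to `utf-8` for the process is another option. Whether either is the right fix depends on the build configuration, which the excerpt does not show.

```python
import sys
import pandas as pd

# TextIOWrapper.reconfigure() exists from Python 3.7 onward; the guard keeps
# the sketch harmless when stdout has been replaced by another object.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

no_unicode = pd.Series(['Steve', 'Jason', 'Jake'])
yes_unicode = pd.Series(['tea', 'caf\xe9', 'beer'])
var_names = dict(no_unicode=no_unicode, yes_unicode=yes_unicode)
df = pd.DataFrame(var_names)

print(df)
```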

How do I use my first row in my spreadsheet for my Dataframe column names instead of 0 1 2…etc?

Question: I want my dataframe to use the names in the first row as its column names instead of numbering from 0. How do I do this? I tried using the pandas and openpyxl modules to turn my Excel spreadsheet into a dataframe.

    import pandas as pd
    from openpyxl import load_workbook
    from openpyxl.utils.dataframe import dataframe_to_rows
    wb = load_workbook(filename='Budget1.xlsx')
    print(wb.sheetnames)
    sheet_ranges=wb['May 2019']
    print(sheet_ranges['A3'].value)
    ws=wb['May 2019']
    df=pd.DataFrame(ws.values
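
A hedged sketch of the usual pattern: materialize the rows, use the first one as the header, and build the frame from the rest. Because `Budget1.xlsx` is not available, the `rows` list below is a hypothetical stand-in for `list(ws.values)`, and its contents are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for list(ws.values): openpyxl yields one tuple per
# worksheet row, with the header cells in the first tuple.
rows = [
    ("Item", "Amount"),
    ("Rent", 1000),
    ("Food", 250),
]

# First tuple becomes the column names; the remaining tuples are the data.
header, *data = rows
df = pd.DataFrame(data, columns=header)

print(df)
```

With the real workbook, `rows = list(ws.values)` would replace the literal list. Reading the sheet directly with `pd.read_excel('Budget1.xlsx', sheet_name='May 2019')` is another option, since it treats the first row as the header by default.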