pandas | 易学教程

Python: Calculate average for each hour in CSV?

Question: I want to calculate the average for each hour using a CSV file. Below is my data set:

Timestamp         Temperature
9/1/2016 0:00:08  53.8
9/1/2016 0:00:38  53.8
9/1/2016 0:01:08  53.8
9/1/2016 0:01:38  53.8
9/1/2016 0:02:08  53.8
9/1/2016 0:02:38  54.1
9/1/2016 0:03:08  54.1
9/1/2016 0:03:38  54.1
9/1/2016 0:04:38  54
9/1/2016 0:05:38  54
9/1/2016 0:06:08  54
9/1/2016 0:06:38  54
9/1/2016 0:07:08  54
9/1/2016 0:07:38  54
9/1/2016 0:08:08  54.1
9/1/2016 0:08:38  54.1
9/1/2016 0:09:38  54.1
9/1/2016 0:10:32  54
9/1
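Below is a minimal sketch of the usual pandas approach. It assumes the file is comma-separated with the header Timestamp,Temperature, because the separator is not visible in the question. The timestamps are parsed first. Then resample('h') groups the readings into hourly bins and .mean() averages each bin. The first three data rows come from the question. The two 1:xx rows are hypothetical values, added only so that a second hour appears.

```python
import io
import pandas as pd

# Sample in the question's layout; replace io.StringIO(...) with the real path.
# The 1:xx rows are HYPOTHETICAL, added only to show a second hourly bin.
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,53.8
9/1/2016 0:02:38,54.1
9/1/2016 1:00:08,54.0
9/1/2016 1:30:38,54.2
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse M/D/YYYY H:MM:SS timestamps with an explicit format
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M:%S")

# Put the timestamps on the index, bucket rows into hourly bins, average each bin
hourly = df.set_index("Timestamp")["Temperature"].resample("h").mean()
print(hourly)
```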

pandas dataframe.apply — converting hex string to int number

Question: I am very new to both Python and pandas. I would like to know how to convert DataFrame elements from hex string input to integer numbers. I have followed the solution in "convert pandas dataframe column from hex string to int"; however, it is still not working. The following is my code:

df = pd.read_csv(filename, delim_whitespace=True, header=None, usecols=range(7, 23, 2))
for i in range(num_frame):
    skipheader = lineNum[header_padding + i*2]
    data = df.iloc[skipheader:skipheader
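Below is a sketch that assumes the hex fields should be read as text. The call int(s, 16) converts one hex string, and Series.map applies it to every cell of a column. The question's code is cut off, so the cause of "still not working" is only a guess. One plausible cause is that read_csv parses hex fields made only of digits, such as "10", as decimal integers, and int(10, 16) then raises TypeError. Passing dtype=str to read_csv keeps every field a string. The sample frame below is hypothetical.

```python
import pandas as pd

# HYPOTHETICAL hex fields, standing in for the columns read from the file.
# When reading the real file, pass dtype=str to read_csv so that
# digit-only hex values like "10" are not parsed as decimal numbers.
df = pd.DataFrame([["1A", "ff", "0x10"], ["00", "7F", "0x2"]])

# int(s, 16) parses one hex string (an optional "0x" prefix is accepted);
# apply() visits each column and map() converts each cell in it.
ints = df.apply(lambda col: col.map(lambda s: int(s, 16)))
print(ints)
```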

Pandas Multiindex Groupby aggregate column with value from another column

Question: I have a pandas DataFrame with a MultiIndex, and I want to aggregate the rows with duplicate keys as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame({'S': [0, 5, 0, 5, 0, 3, 5, 0], 'Q': [6, 4, 10, 6, 2, 5, 17, 4],
                   'A': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B1', 'B2']})
df.set_index(['A', 'B'])

        Q  S
A  B
A1 B1   6  0
   B1   4  5
   B2  10  0
   B2   6  5
A2 B1   2  0
   B1   5  3
   B1  17  5
   B2   4  0

and I would like to group this DataFrame to aggregate the Q values (sum) and keep the S value
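Below is a sketch that groups on both index levels and uses named aggregation (pandas 0.25 or later). The question is cut off before it says which S value to keep, so 'first' is only a placeholder. Replace it with 'last', 'max', or another aggregation as needed. Note also that set_index returns a new DataFrame, so its result has to be assigned.

```python
import pandas as pd

df = pd.DataFrame({'S': [0, 5, 0, 5, 0, 3, 5, 0],
                   'Q': [6, 4, 10, 6, 2, 5, 17, 4],
                   'A': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B1', 'B2']})
# set_index returns a new DataFrame, so the result must be assigned
df = df.set_index(['A', 'B'])

# Group on both index levels: sum Q, and keep one S per group.
# PLACEHOLDER: 'first' stands in for whatever rule the question intends for S
out = df.groupby(level=['A', 'B']).agg(Q=('Q', 'sum'), S=('S', 'first'))
print(out)
```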

How to convert monthly return to yearly return after checking if all 12 months are present in pandas?

Question: I have monthly returns and I want to convert them into yearly returns for each company (using cusip6; I am using CRSP data). I also want to keep only those years that have all 12 months. I am currently using the following code, but I would like to know whether pandas has built-in functions that can do this:

def monthly_to_ann_ret(data):
    """ function to check if all 12 months are present and calculate yearly returns from monthly returns """
    data['year'] = data['date'].dt.year
    data.sort(
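Below is a sketch that uses built-in groupby aggregations. It assumes hypothetical columns cusip6, date and ret, with one row per firm-month and simple (not log) returns. The annual return is compounded as the product of (1 + r), minus 1. A firm-year is kept only when it has 12 non-missing monthly returns. The data is synthetic: 20 months of a flat 1% return, so only 2019 is complete.

```python
import pandas as pd

# SYNTHETIC data: one firm, 20 monthly returns of 1% (Jan 2019 to Aug 2020)
months = pd.period_range("2019-01", periods=20, freq="M").to_timestamp()
df = pd.DataFrame({"cusip6": "ABC123", "date": months, "ret": 0.01})

df["year"] = df["date"].dt.year
g = df.groupby(["cusip6", "year"])["ret"]

annual = pd.DataFrame({
    # count() counts non-missing monthly returns per firm-year
    "n_months": g.count(),
    # compound the monthly returns: prod(1 + r) - 1
    "annual_ret": g.apply(lambda r: (1 + r).prod() - 1),
})

# Keep only firm-years that have all 12 months
annual = annual[annual["n_months"] == 12]
print(annual)
```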

Trying to insert pandas dataframe to temporary table

Question: I'm looking to create a temp table and insert some data into it. I have used pyodbc extensively to pull data, but I am not familiar with writing data to SQL from a Python environment. I am doing this at work, so I don't have the ability to create tables, but I can create temp and global temp tables. My intent is to insert a relatively small DataFrame (150 rows x 4 cols) into a temp table and reference it throughout my session; my program structure makes it so that a global variable in the session
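Below is a sketch of the general DB-API pattern: create the temp table, then insert all DataFrame rows with one executemany call on the same connection. To keep it runnable without a server, Python's built-in sqlite3 module stands in for pyodbc. Both are DB-API 2.0 drivers that use ? placeholders, but the SQL differs. On SQL Server the table would be created as #tmp (or ##tmp for a global temp table) rather than with CREATE TEMP TABLE. Column names and values are hypothetical. A temp table lives only as long as the session that created it, so the connection must stay open for as long as the program needs the table.

```python
import sqlite3
import pandas as pd

# HYPOTHETICAL stand-in for the question's ~150 x 4 DataFrame
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"],
                   "qty": [10, 20, 30], "price": [1.5, 2.5, 3.5]})

# sqlite3 stands in for a pyodbc connection; both follow DB-API 2.0
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SQLite syntax; on SQL Server this would be: CREATE TABLE #tmp (...)
cur.execute("CREATE TEMP TABLE tmp (id INTEGER, name TEXT, qty INTEGER, price REAL)")

# itertuples(name=None) yields plain tuples of Python scalars, one per row
rows = list(df.itertuples(index=False, name=None))
cur.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?)", rows)
conn.commit()

# The temp table stays queryable for as long as this connection is open
result = pd.read_sql_query("SELECT * FROM tmp ORDER BY id", conn)
print(result)
```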

Changing monthly values to daily by evenly distributing between dates

Question: I have a monthly dataset:

df = pd.DataFrame({'Month': [1, 2], 'Plan': [310, 620], 'Month_start_date': ['2020-01-01', '2020-02-01']})
print(df)
df['Month_start_date'] = (pd.to_datetime(df['Month_start_date'], format='%Y/%m/%d')
                          .dt.to_period('m').dt.to_timestamp())
df = df.set_index('Month_start_date')

I created a list of dates in the format I would like to reindex to:

start = '2020-01-01'
end = '2020-02-29'
dates = pd.date_range(start, end, freq='D')
dates

When I try to change the DataFrame to daily using
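Below is one way to spread each month's Plan evenly, as a sketch. It divides Plan by the number of days in that month (DatetimeIndex.days_in_month). It then reindexes onto a daily date range with method='ffill', so every day inherits its month's row. It parses the dates without a format argument and derives the end date from the data instead of hard-coding it. The Daily_plan column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Month': [1, 2], 'Plan': [310, 620],
                   'Month_start_date': ['2020-01-01', '2020-02-01']})
df['Month_start_date'] = pd.to_datetime(df['Month_start_date'])
df = df.set_index('Month_start_date')

# Spread each month's Plan evenly over the days of that month
df['Daily_plan'] = df['Plan'] / df.index.days_in_month.to_numpy()

# Daily calendar from the first month start to the end of the last month
start = df.index.min()
end = df.index.max() + pd.offsets.MonthEnd(0)
dates = pd.date_range(start, end, freq='D')

# Forward-fill: each day takes the row of the month it falls in
daily = df.reindex(dates, method='ffill')
print(daily.head())
```

The daily values add back up to each month's Plan: 31 days x 10 = 310 for January, and 29 x (620 / 29) = 620 for February 2020.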

Find the business days between two columns in a pandas dataframe, which contain NaTs

Question: I have 2 columns in my pandas DataFrame, and I want to calculate the business days between them. Data:

ID   On hold     Off Hold
101  09/15/2017  09/16/2017
102  NA          NA
103  09/22/2017  09/26/2017
104  10/12/2017  10/30/2017
105  NA          NA
106  08/05/2017  08/06/2017
107  08/08/2017  08/03/2017
108  NA          NA

I tried the code below using busday_count from NumPy:

df1['On hold'] = pd.to_datetime(df1['On hold'])
df1['Off Hold'] = pd.to_datetime(df1['Off Hold'])
np.busday_count(df1['On hold'].values.astype('datetime64[D]'
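np.busday_count cannot handle NaT. Below is a sketch that works around this: it computes the count only for rows where both dates are present and leaves NaN elsewhere. The data is the question's table, with None standing in for NA. busday_count counts weekdays from the start date up to, but not including, the end date. It returns a negative count when the end precedes the start, as for ID 107. The Business days column name is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106, 107, 108],
    'On hold': ['09/15/2017', None, '09/22/2017', '10/12/2017', None,
                '08/05/2017', '08/08/2017', None],
    'Off Hold': ['09/16/2017', None, '09/26/2017', '10/30/2017', None,
                 '08/06/2017', '08/03/2017', None],
})
df1['On hold'] = pd.to_datetime(df1['On hold'], format='%m/%d/%Y')
df1['Off Hold'] = pd.to_datetime(df1['Off Hold'], format='%m/%d/%Y')

# Only rows with both dates present can be passed to busday_count
valid = df1['On hold'].notna() & df1['Off Hold'].notna()

# Start with NaN everywhere, then fill in the counts for valid rows
df1['Business days'] = np.nan
df1.loc[valid, 'Business days'] = np.busday_count(
    df1.loc[valid, 'On hold'].values.astype('datetime64[D]'),
    df1.loc[valid, 'Off Hold'].values.astype('datetime64[D]'),
)
print(df1)
```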

Sum a range of cells in a single column in pandas dataframe

Question: I have three columns in a DataFrame. I want to take the number in the STREAK_COUNT column and sum that many cells of the returns in the MON TOTAL column. The result is shown in the WANTED RESULT column below. What I can't figure out is how to sum a variable number of cells; the count can be any number (in this example, between 1 and 4).

          MON TOTAL  STREAK_COUNT  WANTED RESULT
1/2/1992   1.123077  1             1.123077 (only 1, so 1.12)
2/3/1992  -1.296718  0
3/2/1992  -6.355612  2             -7.65233 (sum of -1.29 and -6.35)
4
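Below is a vectorized sketch that reads WANTED RESULT as "the sum of the last STREAK_COUNT values of MON TOTAL, ending at the current row". That reading matches both worked rows in the question. The sketch uses prefix sums: the sum of any contiguous window equals the difference of two cumulative sums. Only the three complete rows from the question are used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'MON TOTAL': [1.123077, -1.296718, -6.355612],
     'STREAK_COUNT': [1, 0, 2]},
    index=['1/2/1992', '2/3/1992', '3/2/1992'],
)

# Prefix sums: csum[k] is the sum of the first k MON TOTAL values
csum = np.concatenate([[0.0], df['MON TOTAL'].cumsum().to_numpy()])
pos = np.arange(len(df))
n = df['STREAK_COUNT'].to_numpy()

# Window of n rows ending at row i is csum[i + 1] - csum[i + 1 - n];
# clip the start at 0 so a streak longer than the history cannot wrap around
start = np.clip(pos + 1 - n, 0, None)
window_sum = csum[pos + 1] - csum[start]

# A streak of 0 gets no result (NaN), matching the blank cell in the question
df['WANTED RESULT'] = np.where(n > 0, window_sum, np.nan)
print(df)
```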

Splitting Columns' Values in Pandas by delimiter without losing delimiter

Question: Hi, I have a DataFrame that follows this format:

df = pd.DataFrame(np.array([[1, 2, 'Apples 20pk ABC123', 4, 5],
                            [6, 7, 'Oranges 40pk XYZ123', 9, 0],
                            [5, 6, 'Bananas 20pk ABC123', 8, 9]]),
                  columns=['Serial #', 'Branch ID', 'Info', 'Value1', 'Value2'])

  Serial #  Branch ID  Info                 Value1  Value2
0 1         2          Apples 20pk ABC123   4       5
1 6         7          Oranges 40pk XYZ123  9       0
2 5         6          Bananas 20pk ABC123  8       9

I want to split the "Info" column's values on the "pk" characters. Essentially, I want to create two new columns,
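The question is cut off before it says what the two new columns should hold. The sketch below assumes one column gets the text up to and including "pk" and the other gets the remainder. Under that assumption, Series.str.extract with a regex keeps the delimiter by placing it inside the first capture group instead of splitting on it. The group names Info_pack and Info_code are hypothetical. Rows without "pk" would get NaN in both new columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 'Apples 20pk ABC123', 4, 5],
                            [6, 7, 'Oranges 40pk XYZ123', 9, 0],
                            [5, 6, 'Bananas 20pk ABC123', 8, 9]]),
                  columns=['Serial #', 'Branch ID', 'Info', 'Value1', 'Value2'])

# Group 1: shortest text ending in "pk" (so "pk" is kept, not consumed);
# group 2: everything after the following whitespace.
# Named groups become the new column names (HYPOTHETICAL names).
parts = df['Info'].str.extract(r'^(?P<Info_pack>.*?pk)\s*(?P<Info_code>.*)$')
df = df.join(parts)
print(df)
```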

Pandas DataFrame currency conversion

Question: I have a DataFrame with two columns:

col1 | col2
20   | EUR
31   | GBP
5    | JPY

I may have 10000 rows like this. How can I quickly convert the amounts to a base currency of GBP? Should I use easymoney? I know how to apply the conversion to a single row, but I do not know how to process all the rows quickly.

EDIT: I would like to apply something like:

def convert_currency(amount, currency_symbol):
    converted = ep.currency_converter(amount=amount, from_currency=currency_symbol, to_currency="GBP")
    return converted

df.loc[df
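Calling a converter once per row is the slow part. A column like col2 usually holds only a handful of distinct currency codes. A common vectorized alternative is therefore to look up one rate per currency and let pandas multiply whole columns. The rates below are made-up placeholders, not real exchange rates, and no easymoney call is shown. Fetch the real rates however you prefer, once per distinct code. Currencies missing from the dictionary come out as NaN.

```python
import pandas as pd

df = pd.DataFrame({'col1': [20, 31, 5], 'col2': ['EUR', 'GBP', 'JPY']})

# PLACEHOLDER rates (GBP per 1 unit of each currency), not real market data;
# in practice, look these up once per distinct code in df['col2'].unique()
gbp_per_unit = {'GBP': 1.0, 'EUR': 0.85, 'JPY': 0.005}

# map() turns each row's currency code into its rate; then multiply column-wise
df['col1_gbp'] = df['col1'] * df['col2'].map(gbp_per_unit)
print(df)
```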