resampling

Resample Pandas Dataframe with “bin size”/“frequency”

北战南征 submitted on 2019-12-11 07:59:56
Question: I have a multi-indexed DataFrame that I would like to resample to reduce the frequency of data points by a factor of 3 (every 3 rows become one, averaged). This:

                       time  value
    ID    measurement
    ET001 0            0     2
          1            0.15  3
          2            0.3   4
          3            0.45  3
          4            0.6   3
          5            0.75  2
          6            0.9   3
    ET002 0            0     2
          1            0.16  5
          2            0.32  4
          3            0.45  3
          4            0.6   3
          5            0.75  2

I want to turn into this:

                       time  value
    ID    measurement
    ET001 0            0.15  3
          1            0.6   2.7
          2            0.9   3
    ET002 0            0.16  3.7
          1            0.6   2.7

I tried to turn my time column into a pandas datetime index like so, and
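
The target numbers above are the mean of every 3 consecutive rows within each ID, so one way to get them, sketched under that assumption, is to group on the ID level plus a row-position bin instead of going through a datetime index. The helper name `downsample_by_rows` is hypothetical:

```python
import pandas as pd

# Example data from the question.
index = pd.MultiIndex.from_tuples(
    [("ET001", i) for i in range(7)] + [("ET002", i) for i in range(6)],
    names=["ID", "measurement"],
)
df = pd.DataFrame(
    {
        "time": [0, 0.15, 0.3, 0.45, 0.6, 0.75, 0.9,
                 0, 0.16, 0.32, 0.45, 0.6, 0.75],
        "value": [2, 3, 4, 3, 3, 2, 3,
                  2, 5, 4, 3, 3, 2],
    },
    index=index,
)

def downsample_by_rows(frame: pd.DataFrame, factor: int = 3) -> pd.DataFrame:
    """Average every `factor` consecutive rows within each ID (hypothetical helper)."""
    ids = frame.index.get_level_values("ID")
    # Position of each row inside its ID group, integer-divided into bins of `factor`.
    bins = frame.groupby(level="ID").cumcount() // factor
    out = frame.groupby([ids, bins.values]).mean()
    out.index.names = ["ID", "measurement"]
    return out

result = downsample_by_rows(df)
print(result.round(2))
```

A trailing group shorter than `factor` rows (such as ET001's last row) is averaged on its own, which matches the expected output above.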

Python PANDAS: Resampling Multivariate Time Series with a Groupby

冷暖自知 submitted on 2019-12-11 07:06:37
Question: I have data in the following general format that I would like to resample into 30-day time-series windows:

    'customer_id','transaction_dt','product','price','units'
    1,2004-01-02,thing1,25,47
    1,2004-01-17,thing2,150,8
    2,2004-01-29,thing2,150,25
    3,2017-07-15,thing3,55,17
    3,2016-05-12,thing3,55,47
    4,2012-02-23,thing2,150,22
    4,2009-10-10,thing1,25,12
    4,2014-04-04,thing2,150,2
    5,2008-07-09,thing2,150,43

I would like the 30-day windows to start on 2014-01-01 and end on 2018-12-31. It is NOT guaranteed
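
A minimal sketch of one way to bucket transactions into fixed 30-day windows anchored at 2014-01-01: each row's window is the whole number of 30-day steps since the start date, and rows are then grouped by customer and window. Aggregating `units` with a sum, and treating 2018-12-31 as inclusive, are assumptions; the function name is hypothetical. Windows with no transactions do not appear and would need a `reindex` if they are required.

```python
import io

import pandas as pd

CSV = """customer_id,transaction_dt,product,price,units
1,2004-01-02,thing1,25,47
1,2004-01-17,thing2,150,8
2,2004-01-29,thing2,150,25
3,2017-07-15,thing3,55,17
3,2016-05-12,thing3,55,47
4,2012-02-23,thing2,150,22
4,2009-10-10,thing1,25,12
4,2014-04-04,thing2,150,2
5,2008-07-09,thing2,150,43
"""

def units_per_window(df, start="2014-01-01", end="2018-12-31", days=30):
    """Sum `units` per customer per fixed-width window anchored at `start` (hypothetical)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Keep only transactions inside [start, end], both ends inclusive.
    df = df[df["transaction_dt"].between(start, end)].copy()
    # The number of whole `days`-long steps since `start` identifies the window.
    window_no = (df["transaction_dt"] - start).dt.days // days
    df["window_start"] = start + pd.to_timedelta(window_no * days, unit="D")
    return df.groupby(["customer_id", "window_start"])["units"].sum()

df = pd.read_csv(io.StringIO(CSV), parse_dates=["transaction_dt"])
print(units_per_window(df))
```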

Lanczos Resampling error

空扰寡人 submitted on 2019-12-11 04:19:03
Question: I have written an image resizer using Lanczos resampling. I took the implementation straight from the description on Wikipedia. The results look good visually, but for some reason they do not match the output of MATLAB's resize with Lanczos very well (in pixel error). Does anybody see any errors? This is not my area of expertise at all... Here is my filter (I'm using Lanczos3 by default):

    double lanczos_size_ = 3.0;
    inline double sinc(double x) {
        double pi = 3.1415926;
        x = (x * pi);
        if
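
For comparison, here is a minimal Python sketch of a 1-D Lanczos resampler written from the textbook kernel L(x) = sinc(x) * sinc(x / a) for |x| < a, with the normalized sinc. It is not the asker's code and is not claimed to reproduce MATLAB bit for bit. It does illustrate two details that commonly cause pixel differences between implementations: the half-pixel-centre coordinate mapping, and stretching the kernel by the scale factor when shrinking. The second is an anti-aliasing step that MATLAB's imresize is documented to apply by default when shrinking; check this against your MATLAB version. The sketch also uses `math.pi` rather than a truncated constant such as 3.1415926. All function names are hypothetical.

```python
import math
from typing import List, Sequence

def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    px = math.pi * x
    return math.sin(px) / px

def lanczos(x: float, a: int = 3) -> float:
    """Lanczos kernel: sinc(x) * sinc(x / a) inside |x| < a, zero outside."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)

def resample_1d(samples: Sequence[float], new_len: int, a: int = 3) -> List[float]:
    """Resample a 1-D signal to `new_len` points with a Lanczos-`a` filter."""
    n = len(samples)
    scale = n / new_len            # input samples per output sample
    support = max(scale, 1.0)      # stretch the kernel only when shrinking
    out = []
    for i in range(new_len):
        # Map the centre of output sample i into input coordinates (half-pixel convention).
        center = (i + 0.5) * scale - 0.5
        total = 0.0
        weight_sum = 0.0
        first = math.floor(center - a * support)
        last = math.ceil(center + a * support)
        for j in range(first, last + 1):
            w = lanczos((center - j) / support, a)
            if w == 0.0:
                continue
            jj = min(max(j, 0), n - 1)   # replicate edge samples at the borders
            total += w * samples[jj]
            weight_sum += w
        out.append(total / weight_sum)   # normalize so the weights sum to 1
    return out
```

For an image, the same 1-D pass is applied along every row and then along every column.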

Resample time-series of position evenly in time

你。 submitted on 2019-12-11 03:56:41
Question: As often happens in Earth sciences, I have a time series of positions (lon, lat). The time series is not evenly spaced in time. The time sampling looks like this:

    t_diff_every_position = [3.99, 1.00, 3.00, 4.00, 3.98, 3.99, ... ]

and I have a position associated with every t:

    lat = [77.0591, 77.0547, 77.0537, 74.6766, 74.6693, 74.6725, ... ]
    lon = [-135.2876, -135.2825, -135.2776, -143.7432, -143.7994, -143.8582, ... ]

I want to resample the positions to get a dataset evenly spaced in time. So I
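
A minimal sketch using linear interpolation with NumPy's `np.interp`. Two assumptions are labelled in the code. First, the absolute sample times are taken as the running sum of `t_diff_every_position`, because the question does not say where the time axis starts. Second, the new step of 1.0 time unit is arbitrary. Longitude is unwrapped before interpolating so that a track crossing ±180° does not produce a spurious jump. Interpolating lat/lon linearly, rather than along great circles, is a reasonable approximation only when successive fixes are close together. The function name is hypothetical.

```python
import numpy as np

# First six values from the question (the "..." tails are not reproduced).
t_diff = np.array([3.99, 1.00, 3.00, 4.00, 3.98, 3.99])
lat = np.array([77.0591, 77.0547, 77.0537, 74.6766, 74.6693, 74.6725])
lon = np.array([-135.2876, -135.2825, -135.2776, -143.7432, -143.7994, -143.8582])

# Assumption: the absolute time of each fix is the running sum of the differences.
t = np.cumsum(t_diff)

def resample_track(t, lat, lon, step):
    """Linearly interpolate a (lat, lon) track onto a regular time grid (hypothetical)."""
    t_new = np.arange(t[0], t[-1] + 1e-9, step)
    lat_new = np.interp(t_new, t, lat)
    # Unwrap longitude so a crossing of +/-180 degrees is interpolated the short way.
    lon_cont = np.degrees(np.unwrap(np.radians(lon)))
    lon_new = np.interp(t_new, t, lon_cont)
    # Wrap back into [-180, 180).
    lon_new = (lon_new + 180.0) % 360.0 - 180.0
    return t_new, lat_new, lon_new

t_new, lat_new, lon_new = resample_track(t, lat, lon, step=1.0)
```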

bootstrapping/resampling matrix by row in R

走远了吗. submitted on 2019-12-11 03:12:15
Question: I have a matrix x with 20 rows and 10 columns. I need to sample 5 rows at a time (with replacement) and calculate the column means. I need to repeat this procedure 15 times and report the column means each time. As an example, I used the resample package in R to do this:

    # Create a random matrix
    library("resample")
    set.seed(1234)
    x <- matrix( round(rnorm(200, 5)), ncol=10)
    ## Bootstrap 15 times by resampling 5 rows at a time.
    k <- bootstrap(x,colMeans,B = 15,block.size=5)

My concern with
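
The procedure described (draw 5 of the 20 rows with replacement, take the column means, repeat 15 times) is short enough to write out directly, which makes every step explicit instead of relying on how a package interprets its arguments. Below is a sketch in Python/NumPy rather than R, so the seed will not reproduce R's random numbers. The function name is hypothetical.

```python
import numpy as np

def row_bootstrap_col_means(x, rows_per_draw=5, n_reps=15, seed=1234):
    """Column means of `n_reps` samples of `rows_per_draw` rows drawn with replacement."""
    rng = np.random.default_rng(seed)
    # One row of indices per repetition, each index drawn uniformly from 0..n_rows-1.
    idx = rng.integers(0, x.shape[0], size=(n_reps, rows_per_draw))
    # x[idx] has shape (n_reps, rows_per_draw, n_cols); average over the drawn rows.
    return x[idx].mean(axis=1)

# A 20 x 10 matrix of rounded N(5, 1) values, analogous to the R example.
x = np.round(np.random.default_rng(1234).normal(5, 1, size=(20, 10)))
means = row_bootstrap_col_means(x)
print(means.shape)  # (15, 10)
```

Each of the 15 rows of `means` holds the 10 column means for one repetition.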

Using resample to align multiple timeseries in pandas

余生长醉 submitted on 2019-12-10 17:16:50
Question: Here's the setup code:

    import pandas
    from datetime import datetime
    a_values = [1728, 1635, 1733]
    a_index = [datetime(2011, 10, 31), datetime(2012, 1, 31), datetime(2012, 4, 30)]
    a = pandas.Series(data=a_values, index=a_index)
    aa_values = [6419, 5989, 6006]
    aa_index = [datetime(2011, 9, 30), datetime(2011, 12, 31), datetime(2012, 3, 31)]
    aa = pandas.Series(data=aa_values, index=aa_index)
    apol_values = [1100, 1179, 969]
    apol_index = [datetime(2011, 8, 31), datetime(2011, 11, 30), datetime(2012,
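
If the goal is to line up quarterly figures whose quarters end in different months, one option is to label each value by calendar quarter with `to_period("Q")` and let `pd.concat` align on that label. The following is a minimal sketch using the two complete series from the setup; the third is truncated above. It treats, for example, a quarter ending 2011-10-31 as calendar 2011Q4, which is an assumption about what "aligned" should mean. The helper name is hypothetical.

```python
from datetime import datetime

import pandas as pd

a = pd.Series([1728, 1635, 1733],
              index=[datetime(2011, 10, 31), datetime(2012, 1, 31), datetime(2012, 4, 30)])
aa = pd.Series([6419, 5989, 6006],
               index=[datetime(2011, 9, 30), datetime(2011, 12, 31), datetime(2012, 3, 31)])

def align_by_quarter(series_by_name):
    """Re-label each series by calendar quarter and join them column-wise (hypothetical)."""
    relabelled = {}
    for name, s in series_by_name.items():
        s = s.copy()
        s.index = pd.DatetimeIndex(s.index).to_period("Q")
        relabelled[name] = s
    # Outer join on the quarter labels; a quarter missing from a series becomes NaN.
    return pd.concat(relabelled, axis=1)

table = align_by_quarter({"a": a, "aa": aa})
print(table)
```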

resample Pandas dataframe and merge strings in column

巧了我就是萌 submitted on 2019-12-10 11:12:10
Question: I want to resample a pandas DataFrame and apply different functions to different columns. The problem is that I cannot properly process a column of strings. I would like to apply a function that merges the strings with a delimiter such as " - ". This is a data example:

    import pandas as pd
    import numpy as np
    idx = pd.date_range('2017-01-31', '2017-02-03')
    data=list([[1,10,"ok"],[2,20,"merge"],[3,30,"us"]])
    dates=pd.DatetimeIndex(['2017-01-31','2017-02-03','2017-02-03'])
    d=pd.DataFrame(data,
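
`resample(...).agg` accepts a dict that maps each column to its own aggregation, and that aggregation can be a Python function, so the string column can be merged with `" - ".join`. Below is a minimal sketch. The column names `a`, `b`, `c` and the choice of `sum`/`mean` for the numeric columns are assumptions, because the DataFrame construction above is cut off. Days with no rows (2017-02-01 and 2017-02-02 here) still appear in the output; what they hold for the string column may vary between pandas versions.

```python
import pandas as pd

data = [[1, 10, "ok"], [2, 20, "merge"], [3, 30, "us"]]
dates = pd.DatetimeIndex(["2017-01-31", "2017-02-03", "2017-02-03"])
# Column names are hypothetical; the original constructor call is truncated.
d = pd.DataFrame(data, index=dates, columns=["a", "b", "c"])

out = d.resample("D").agg({
    "a": "sum",                        # numeric: total per day
    "b": "mean",                       # numeric: average per day
    "c": lambda s: " - ".join(s),      # strings: merge with a delimiter
})
print(out)
```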

How can I add rows for all dates between two columns?

耗尽温柔 submitted on 2019-12-10 03:59:39
Question:

    import pandas as pd
    mydata = [{'ID' : '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
              {'ID' : '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]
    mydata2 = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '10/10/2016'},
               {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '11/10/2016'},
               {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '12/10/2016'},
               {'ID': '10', 'Entry Date': '10/10/2016',
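
A minimal sketch that expands each record into one row per calendar day from `Entry Date` to `Exit Date` inclusive. It adds a `Date` column in the same dd/mm/yyyy form as `mydata2`. The dates are parsed with an explicit `format="%d/%m/%Y"` because strings like 10/10/2016 are ambiguous between day-first and month-first. The helper name is hypothetical.

```python
import pandas as pd

mydata = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
          {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]

def expand_dates(records, fmt="%d/%m/%Y"):
    """Return one row per day between 'Entry Date' and 'Exit Date', inclusive (hypothetical)."""
    rows = []
    for rec in records:
        entry = pd.to_datetime(rec["Entry Date"], format=fmt)
        exit_ = pd.to_datetime(rec["Exit Date"], format=fmt)
        for day in pd.date_range(entry, exit_, freq="D"):
            # Copy the original fields and add the per-day Date in the same format.
            rows.append({**rec, "Date": day.strftime(fmt)})
    return pd.DataFrame(rows)

df = expand_dates(mydata)
print(df)
```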

Resample in a rolling window using pandas

情到浓时终转凉″ submitted on 2019-12-10 03:45:39
Question: Assume I have daily data (not regularly spaced). For each month I want to compute the moving standard deviation (or an arbitrary nonlinear function) over the past 5 months. For example, for May 2012 I would compute the standard deviation over the period from January 2012 to May 2012 (5 months). For June 2012 the period starts in February 2012, and so on. The final result is a time series with monthly values. I cannot apply a rolling window, because it would first of all be daily, and secondly I need to specify the
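
Because the window is defined in calendar months rather than in a fixed number of rows, one straightforward approach is to label every observation with its month (`to_period("M")`). Then, for each month, the function is applied to all observations from that month and the four before it. Below is a minimal sketch. `trailing_months_stat` is a hypothetical name, and the example data are made up. The default statistic is NumPy's population standard deviation (`ddof=0`). Early months with less than five months of history are computed from whatever data exist.

```python
import numpy as np
import pandas as pd

def trailing_months_stat(s, months=5, func=np.std):
    """Apply `func` to all values in each month plus the preceding `months - 1` months."""
    periods = s.index.to_period("M")
    values = s.to_numpy()
    result = {}
    for m in periods.unique().sort_values():
        # With months == 5 this selects calendar months m-4 .. m inclusive.
        mask = (periods > m - months) & (periods <= m)
        result[m] = func(values[mask])
    return pd.Series(result)

# Irregularly spaced example data (hypothetical).
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              index=pd.to_datetime(["2012-01-05", "2012-02-10", "2012-03-15",
                                    "2012-04-20", "2012-05-25", "2012-06-30"]))
monthly = trailing_months_stat(s)
print(monthly)
```

Passing `func=lambda v: np.std(v, ddof=1)` gives the sample standard deviation that pandas' `.std()` uses. Any other function of a 1-D array works the same way.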

Strange behavior of pandas resampling

最后都变了- submitted on 2019-12-08 14:42:33
Question: I'm seeing rather strange behavior from the resampling function of a pandas time series (Python). I use the latest version of pandas (0.12.0). Take the following time series:

    dates = [datetime(2011, 1, 2, 1), datetime(2011, 1, 2, 2), datetime(2011, 1, 2, 3),
             datetime(2011, 1, 2, 4), datetime(2011, 1, 2, 5), datetime(2011, 1, 2, 6)]
    ts = Series(np.arange(6.), index=dates)

Then try resampling to 66 minutes and to 65 minutes. This is the result I get:

    In [45]: ts.resample('66min')
    Out[45]:
    2011-01-02 01
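
A likely explanation, offered with some caution: fixed-width bins are laid out in steps of the frequency from an anchor point, not from the first timestamp. Which observations share a bin therefore depends on how the step lines up with that anchor. In current pandas (1.1 and later) the anchor is set by the `origin` argument of `resample`. It defaults to `"start_day"` (midnight of the first day), and `origin="start"` anchors at the first timestamp instead. Neither option exists in 0.12, and its exact rules are not verified here. The hypothetical helper `bin_labels` below reproduces the arithmetic so the two anchors can be compared with pandas' own result:

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2, h) for h in range(1, 7)]   # 01:00 ... 06:00
ts = pd.Series(np.arange(6.0), index=dates)

def bin_labels(index, minutes, origin):
    """Left edge of the fixed-width bin each timestamp falls into, counted from `origin`."""
    width = pd.Timedelta(minutes=minutes)
    return [origin + ((t - origin) // width) * width for t in index]

day_start = pd.Timestamp("2011-01-02")
for minutes in (65, 66):
    print(minutes, "min, anchored at midnight:", bin_labels(ts.index, minutes, day_start))
    print(minutes, "min, anchored at first point:", bin_labels(ts.index, minutes, ts.index[0]))

# pandas >= 1.1: choose the anchor explicitly.
anchored = ts.resample("66min", origin="start").mean()
print(anchored)
```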