resampling | 易学教程

Resample Pandas Dataframe with “bin size”/“frequency”

Question: I have a multi-indexed dataframe that I would like to resample to reduce the frequency of data points by a factor of 3 (meaning that every 3 rows become one). This:

                       time  value
    ID    measurement
    ET001 0               0      2
          1            0.15      3
          2             0.3      4
          3            0.45      3
          4             0.6      3
          5            0.75      2
          6             0.9      3
    ET002 0               0      2
          1            0.16      5
          2            0.32      4
          3            0.45      3
          4             0.6      3
          5            0.75      2

I want to turn into this:

                       time  value
    ID    measurement
    ET001 0            0.15      3
          1             0.6    2.7
          2             0.9      3
    ET002 0            0.16    3.7
          1             0.6    2.7

I tried to turn my time column into a pandas datetime index like so, and
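The target table above is consistent with averaging each block of three consecutive rows within an ID. A minimal sketch of that reading, assuming the `measurement` level counts up from 0 without gaps inside each ID (the variable names `ids`, `bins`, and `downsampled` are illustrative, not from the question):

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_tuples(
    [("ET001", i) for i in range(7)] + [("ET002", i) for i in range(6)],
    names=["ID", "measurement"],
)
df = pd.DataFrame(
    {
        "time": [0, 0.15, 0.3, 0.45, 0.6, 0.75, 0.9,
                 0, 0.16, 0.32, 0.45, 0.6, 0.75],
        "value": [2, 3, 4, 3, 3, 2, 3, 2, 5, 4, 3, 3, 2],
    },
    index=index,
)

# Assumption: "measurement" is a gap-free counter, so integer division by 3
# assigns every run of three rows to the same bin.
ids = df.index.get_level_values("ID")
bins = df.index.get_level_values("measurement") // 3

# Average time and value inside each (ID, bin) group.
downsampled = df.groupby([ids, bins]).mean()
print(downsampled.round(2))
```

A trailing group with fewer than three rows (such as the last row of ET001) is averaged over whatever rows it contains, which matches the desired output shown in the question.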

Python PANDAS: Resampling Multivariate Time Series with a Groupby

Question: I have data in the following general format that I would like to resample to 30-day time series windows:

    'customer_id','transaction_dt','product','price','units'
    1,2004-01-02,thing1,25,47
    1,2004-01-17,thing2,150,8
    2,2004-01-29,thing2,150,25
    3,2017-07-15,thing3,55,17
    3,2016-05-12,thing3,55,47
    4,2012-02-23,thing2,150,22
    4,2009-10-10,thing1,25,12
    4,2014-04-04,thing2,150,2
    5,2008-07-09,thing2,150,43

I would like the 30-day windows to start on 2014-01-01 and end on 2018-12-31. It is NOT guaranteed
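One way to approach per-customer 30-day windows is a `groupby` that combines the customer key with a `pd.Grouper`. The sketch below assumes pandas 1.1 or later (for the `origin` argument), anchors the window edges at the start date stated in the question, and sums `units` plus a hypothetical `revenue` column; the aggregation the asker actually wants is not visible in the excerpt.

```python
from io import StringIO

import pandas as pd

raw = StringIO(
    """customer_id,transaction_dt,product,price,units
1,2004-01-02,thing1,25,47
1,2004-01-17,thing2,150,8
2,2004-01-29,thing2,150,25
3,2017-07-15,thing3,55,17
3,2016-05-12,thing3,55,47
4,2012-02-23,thing2,150,22
4,2009-10-10,thing1,25,12
4,2014-04-04,thing2,150,2
5,2008-07-09,thing2,150,43
"""
)
df = pd.read_csv(raw, parse_dates=["transaction_dt"])

# Hypothetical derived column, added only to have something to aggregate.
df["revenue"] = df["price"] * df["units"]

# Window edges fall on origin + k * 30 days (k may be negative).
origin = pd.Timestamp("2014-01-01")
windows = (
    df.groupby(
        ["customer_id",
         pd.Grouper(key="transaction_dt", freq="30D", origin=origin)]
    )[["units", "revenue"]]
    .sum()
)
print(windows)
```

Only windows that contain at least one transaction appear in the result; producing a row for every window of every customer would need an additional reindexing step.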

Lanczos Resampling error

Question: I have written an image resizer using Lanczos resampling. I've taken the implementation straight from the directions on Wikipedia. The results look good visually, but for some reason they do not match the result from Matlab's resize with Lanczos very well (in pixel error). Does anybody see any errors? This is not my area of expertise at all... Here is my filter (I'm using Lanczos3 by default):

    double lanczos_size_ = 3.0;
    inline double sinc(double x) {
      double pi = 3.1415926;
      x = (x * pi);
      if
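For comparison, here is a small one-dimensional Lanczos resampler in Python with NumPy. It is a sketch under stated assumptions, not the asker's code and not Matlab's algorithm: it uses the pixel-center coordinate convention, widens the kernel when shrinking, normalizes the weights, and clamps indices at the borders. Each of those choices is a convention that can differ between implementations and may account for pixel-level mismatches.

```python
import numpy as np


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0.

    np.sinc is the normalized sinc, sin(pi * x) / (pi * x).
    """
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)


def resample_1d(signal, new_len, a=3):
    """Resample a 1-D signal to new_len samples (illustrative helper)."""
    signal = np.asarray(signal, dtype=float)
    old_len = signal.size
    scale = old_len / new_len
    # Assumption: stretch the kernel when downsampling so it low-pass filters.
    stretch = max(scale, 1.0)
    support = a * stretch
    out = np.empty(new_len)
    for i in range(new_len):
        # Pixel-center convention for mapping output to input coordinates.
        center = (i + 0.5) * scale - 0.5
        lo = int(np.floor(center - support)) + 1
        hi = int(np.floor(center + support))
        idx = np.arange(lo, hi + 1)
        weights = lanczos_kernel((idx - center) / stretch, a)
        weights /= weights.sum()  # keep flat regions flat
        # Clamp out-of-range taps to the nearest edge sample.
        out[i] = np.dot(weights, signal[np.clip(idx, 0, old_len - 1)])
    return out


ramp = np.arange(10.0)
same_size = resample_1d(ramp, 10)
flat = resample_1d(np.ones(10), 7)
```

Applying `resample_1d` along rows and then along columns gives a separable 2-D resize.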

Resample time-series of position evenly in time

Question: As often happens in Earth sciences, I have a time series of positions (lon, lat). The time series is not evenly spaced in time. The time sampling looks like:

    t_diff_every_position = [3.99, 1.00, 3.00, 4.00, 3.98, 3.99, ... ]

And I have an associated position for every t:

    lat = [77.0591, 77.0547, 77.0537, 74.6766, 74.6693, 74.6725, ... ]
    lon = [-135.2876, -135.2825, -135.2776, -143.7432, -143.7994, -143.8582, ... ]

I want to re-sample the positions to have a dataset evenly spaced in time. So I
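A simple starting point is linear interpolation onto a regular time grid with `np.interp`. The sketch below makes several assumptions that the excerpt does not confirm: each entry of `t_diff_every_position` is the time elapsed since the previous fix, the grid step of 2.0 is arbitrary, and only the six values shown are used. Interpolating longitude and latitude independently also ignores spherical geometry and any wrap at ±180°, which may matter for widely separated fixes.

```python
import numpy as np

t_diff = np.array([3.99, 1.00, 3.00, 4.00, 3.98, 3.99])
lat = np.array([77.0591, 77.0547, 77.0537, 74.6766, 74.6693, 74.6725])
lon = np.array([-135.2876, -135.2825, -135.2776,
                -143.7432, -143.7994, -143.8582])

# Assumption: the differences accumulate into an absolute time axis.
t = np.cumsum(t_diff)

# Hypothetical regular grid covering the observed time span.
step = 2.0
t_even = np.arange(t[0], t[-1], step)

# Linear interpolation of each coordinate onto the regular grid.
lat_even = np.interp(t_even, t, lat)
lon_even = np.interp(t_even, t, lon)
```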

bootstrapping/resampling matrix by row in R

Question: I have a matrix x with 20 rows and 10 columns. I need to sample (with replacement) 5 rows at a time and calculate the column means. I need to repeat this procedure 15 times and report the column means each time. As an example, I used the resample library in R to perform this.

    # Create a random matrix
    library("resample")
    set.seed(1234)
    x <- matrix( round(rnorm(200, 5)), ncol=10)
    ## Bootstrap 15 times by resampling 5 rows at a time.
    k <- bootstrap(x,colMeans,B = 15,block.size=5)

My concern with
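The procedure as described in words (draw 5 rows with replacement, take column means, repeat 15 times) can be written directly with NumPy. This is a Python sketch of that description, not a translation of what the R `resample` package does internally; the random values will not match the R output.

```python
import numpy as np

rng = np.random.default_rng(1234)

# 20 x 10 matrix of rounded normal draws centered on 5.
x = np.round(rng.normal(loc=5, scale=1, size=(20, 10)))

n_boot, n_rows = 15, 5

# One row of indices per replicate, sampled with replacement.
rows = rng.integers(0, x.shape[0], size=(n_boot, n_rows))

# x[rows] has shape (15, 5, 10); averaging over axis 1 gives the
# column means of each replicate.
boot_means = x[rows].mean(axis=1)
print(boot_means.shape)
```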

Using resample to align multiple timeseries in pandas

Question: Here's the setup code:

    import pandas
    from datetime import datetime
    a_values = [1728, 1635, 1733]
    a_index = [datetime(2011, 10, 31), datetime(2012, 1, 31), datetime(2012, 4, 30)]
    a = pandas.Series(data=a_values, index=a_index)
    aa_values = [6419, 5989, 6006]
    aa_index = [datetime(2011, 9, 30), datetime(2011, 12, 31), datetime(2012, 3, 31)]
    aa = pandas.Series(data=aa_values, index=aa_index)
    apol_values = [1100, 1179, 969]
    apol_index = [datetime(2011, 8, 31), datetime(2011, 11, 30), datetime(2012,

resample Pandas dataframe and merge strings in column

Question: I want to resample a pandas dataframe and apply different functions to different columns. The problem is that I cannot properly process a column with strings. I would like to apply a function that merges the strings with a delimiter such as " - ". This is a data example:

    import pandas as pd
    import numpy as np
    idx = pd.date_range('2017-01-31', '2017-02-03')
    data=list([[1,10,"ok"],[2,20,"merge"],[3,30,"us"]])
    dates=pd.DatetimeIndex(['2017-01-31','2017-02-03','2017-02-03'])
    d=pd.DataFrame(data,
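A per-column aggregation dictionary can mix built-in reducers with a string join. The sketch below reuses the data from the question; the column names `a`, `b`, and `text` are assumptions, because the original `DataFrame` call is cut off.

```python
import pandas as pd

data = [[1, 10, "ok"], [2, 20, "merge"], [3, 30, "us"]]
dates = pd.DatetimeIndex(["2017-01-31", "2017-02-03", "2017-02-03"])

# Hypothetical column names; the question's own names are not visible.
d = pd.DataFrame(data, index=dates, columns=["a", "b", "text"])

# Sum one column, average another, and join the strings in the third.
daily = d.resample("D").agg(
    {"a": "sum", "b": "mean", "text": " - ".join}
)
print(daily)
```

Days without any rows still appear in the result: the sum is 0, the mean is NaN, and the joined string is empty.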

How can I add rows for all dates between two columns?

Question:

    import pandas as pd
    mydata = [{'ID' : '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
              {'ID' : '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]
    mydata2 = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '10/10/2016'},
               {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '11/10/2016'},
               {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '12/10/2016'},
               {'ID': '10', 'Entry Date': '10/10/2016',
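Judging by `mydata2`, the goal is one row per calendar day between `Entry Date` and `Exit Date`, inclusive. A sketch of one way to get there, assuming the dates are day-first (`%d/%m/%Y`) and pandas 1.1 or later for `explode(..., ignore_index=True)`:

```python
import pandas as pd

mydata = [
    {"ID": "10", "Entry Date": "10/10/2016", "Exit Date": "15/10/2016"},
    {"ID": "20", "Entry Date": "10/10/2016", "Exit Date": "18/10/2016"},
]
df = pd.DataFrame(mydata)

# Assumption: the strings are day/month/year.
for col in ("Entry Date", "Exit Date"):
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# Build the list of days covered by each row, both endpoints included.
df["Date"] = pd.Series(
    [
        list(pd.date_range(start, end, freq="D"))
        for start, end in zip(df["Entry Date"], df["Exit Date"])
    ],
    index=df.index,
)

# Turn each list element into its own row.
expanded = df.explode("Date", ignore_index=True)
expanded["Date"] = pd.to_datetime(expanded["Date"])
print(expanded)
```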

Resample in a rolling window using pandas

Question: Assume I have daily data (not regularly spaced). For each month, I want to compute the moving standard deviation (or an arbitrary non-linear function) over the past 5 months. For example, for May 2012 I would compute the stddev over the period from Jan 2012 to May 2012 (5 months). For June 2012 the period starts in Feb 2012, etc. The final result is a time series with monthly values. I cannot apply a rolling window because this would first be daily and secondly I need to specify the
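One direct way to express "one value per month, computed over that month and the four before it" is to label every observation with its calendar month and loop over months. The sketch below uses made-up data on business days as a stand-in for irregular daily observations; `window` and the other names are illustrative. Any function of the selected slice could replace `.std()`.

```python
import numpy as np
import pandas as pd

# Made-up irregular daily data: business days only, random values.
rng = np.random.default_rng(0)
days = pd.bdate_range("2012-01-01", "2012-12-31")
s = pd.Series(rng.normal(size=len(days)), index=days)

# Calendar month of every observation.
months = s.index.to_period("M")
window = 5  # number of months per window, current month included

result = {}
all_months = pd.period_range(months.min(), months.max(), freq="M")
for m in all_months[window - 1:]:
    # Observations from the current month back to four months earlier.
    mask = (months > m - window) & (months <= m)
    result[m] = s[mask].std()

rolling_std = pd.Series(result)
print(rolling_std)
```

The loop is O(months × observations), which is usually acceptable for monthly output; it trades speed for the freedom to apply an arbitrary function to the raw daily values.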

Strange behavior of pandas resampling

Question: I'm experiencing rather strange behavior from the resampling function of a pandas time series (Python). I use the latest version of pandas (0.12.0). Take the following time series:

    dates = [datetime(2011, 1, 2, 1), datetime(2011, 1, 2, 2), datetime(2011, 1, 2, 3),
             datetime(2011, 1, 2, 4), datetime(2011, 1, 2, 5), datetime(2011, 1, 2, 6)]
    ts = Series(np.arange(6.), index=dates)

Then try resampling to 66s and to 65s. This is the result I get:

    In [45]: ts.resample('66min')
    Out[45]: 2011-01-02 01