Resampling

Resampling Error: cannot reindex a non-unique index with a method or limit

早过忘川 submitted on 2019-11-30 04:14:06
Question: I am using pandas to structure and process data. I have a DataFrame with dates as the index, plus Id and bitrate columns. I want to group my data by Id and, at the same time, resample the timestamps belonging to each Id, keeping the bitrate values. For example, given: df = pd.DataFrame( {'Id' : ['CODI126640013.ts', 'CODI126622312.ts'], 'beginning_time':['2016-07-08 02:17:42', '2016-07-08 02:05:35'], 'end_time' :['2016-07-08 02:17:55', '2016-07-08 02:26:11'], 'bitrate': ['3750000', '3750000'…
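The error in the entry's title typically appears when resample is called on a frame whose index contains duplicate timestamps across different Ids; resampling per group avoids it. A minimal sketch of the groupby-then-resample approach (the 1-minute frequency and mean aggregation are illustrative assumptions, not the asker's requirements):

```python
import pandas as pd

df = pd.DataFrame({
    'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
    'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
    'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
    'bitrate': ['3750000', '3750000'],
})

# Reshape so each row carries a single timestamp, then resample per Id.
long = df.melt(id_vars=['Id', 'bitrate'],
               value_vars=['beginning_time', 'end_time'],
               value_name='time')
long['time'] = pd.to_datetime(long['time'])
long['bitrate'] = long['bitrate'].astype(int)

# Resampling inside each group sidesteps the non-unique-index error
# that a plain df.resample() would raise on duplicated timestamps.
out = (long.set_index('time')
           .groupby('Id')['bitrate']
           .resample('1min')
           .mean())
```

The result is a Series indexed by (Id, minute), with NaN for minutes a stream did not cover.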

How to specify a validation holdout set in caret

旧时模样 submitted on 2019-11-29 22:09:32
Question: I really like using caret, at least for the early stages of modeling, especially for its easy-to-use resampling methods. However, I'm working on a model where the training set has a fair number of cases added via semi-supervised self-training, and my cross-validation results are really skewed because of it. My solution is to use a validation set to measure model performance, but I can't see a way to use a validation set directly within caret; am I missing something, or is this just not…
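In caret itself a fixed holdout can be supplied through trainControl's index and indexOut arguments. Since the rest of this page leans on Python, here is the analogous idea sketched with scikit-learn's PredefinedSplit; the model, parameter grid, synthetic data, and 80/20 split are all illustrative assumptions, not the asker's setup:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, PredefinedSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# -1 marks rows used only for fitting; 0 marks the fixed validation fold,
# so tuning is scored on one predefined holdout instead of random CV folds.
test_fold = np.full(100, -1)
test_fold[80:] = 0
cv = PredefinedSplit(test_fold)

search = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]}, cv=cv)
search.fit(X, y)
```

This keeps the self-trained cases in the fitting portion while scoring only on the trusted holdout, which is exactly what the question asks of caret.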

Scipy interpolation how to resize/resample 3x3 matrix to 5x5?

回眸只為那壹抹淺笑 submitted on 2019-11-29 13:09:26
Question: EDIT: Paul has solved this one below. Thanks! I'm trying to resample (upscale) a 3x3 matrix to 5x5, filling in the intermediate points with either interpolate.interp2d or interpolate.RectBivariateSpline (or whatever works). If there's a simple, existing function to do this, I'd like to use it, but I haven't found one yet. For example, a function that would work like: # upscale 2x2 to 4x4 matrixSmall = ([[-1,8],[3,5]]) matrixBig = matrixSmall.resample(4,4,cubic) So, if I start with a 3x3 matrix…
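Since interp2d has been removed from recent SciPy releases, RectBivariateSpline is the safer choice today. A sketch of the 2x2-to-4x4 upscale from the question; for a 3x3-to-5x5 upscale you would use kx=ky=2, because the spline degree must be smaller than the number of points along each axis:

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

small = np.array([[-1.0, 8.0], [3.0, 5.0]])   # the question's 2x2 example

# Spline degree must be < points per axis: linear is the max for 2 points.
y = np.arange(small.shape[0])
x = np.arange(small.shape[1])
spline = RectBivariateSpline(y, x, small, kx=1, ky=1)

# Evaluate on a denser grid spanning the same extent.
ynew = np.linspace(0, small.shape[0] - 1, 4)
xnew = np.linspace(0, small.shape[1] - 1, 4)
big = spline(ynew, xnew)                      # 4x4 result
```

The corner values of the output coincide with the input corners, since the grid endpoints are shared.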

Pandas' equivalent of resample for integer index

杀马特。学长 韩版系。学妹 submitted on 2019-11-29 02:10:04
I'm looking for a pandas equivalent of the resample method for a DataFrame whose index isn't a DatetimeIndex but an array of integers, or maybe even floats. I know that in some cases (this one, for example) the resample method can easily be replaced by a reindex and interpolation, but in some cases (I think) it can't. For example, if I have df = pd.DataFrame(np.random.randn(10,2)) withdates = df.set_index(pd.date_range('2012-01-01', periods=10)) withdates.resample('5D', np.std) this gives me 0 1 2012-01-01 1.184582 0.492113 2012-01-06 0.533134 0.982562 but I can't produce the same result with…
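For an integer index, the same fixed-width binning can be expressed with groupby on index // n, which mirrors resample's windows. A sketch reproducing the 5-row blocks from the example (note that the modern spelling of the date version is .resample('5D').std() rather than passing np.std):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(10, 2))

# Each block of 5 consecutive integer labels becomes one bin,
# matching resample('5D') on a 10-day daily DatetimeIndex.
binned = df.groupby(df.index // 5).std()
```

On the date-indexed copy, withdates.resample('5D').std() yields the same two rows of values, so the integer groupby is a faithful stand-in here.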

Downsample a 1D numpy array

为君一笑 submitted on 2019-11-28 23:07:22
I have a 1-d numpy array which I would like to downsample. Any of the following methods is acceptable if the downsampling raster doesn't perfectly fit the data: overlap the downsample intervals; convert whatever number of values remains at the end into a separate downsampled value; interpolate to fit the raster. Basically, if I have 1 2 6 2 1 and I am downsampling by a factor of 3, all of the following are OK: 3 3, 3 1.5, or whatever an interpolation would give me here. I'm just looking for the fastest/easiest way to do this. I found scipy.signal.decimate, but that sounds like it decimates the values (takes…
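A hedged sketch of the "separate value for the tail" option: average full blocks of n, and fold any leftover values into one final downsampled point (the function name and mean aggregation are my own choices):

```python
import numpy as np

def downsample(arr, n):
    """Mean over blocks of n values; a trailing partial block
    becomes its own (shorter) downsampled value."""
    arr = np.asarray(arr, dtype=float)
    end = (len(arr) // n) * n
    out = arr[:end].reshape(-1, n).mean(axis=1)
    if end < len(arr):
        out = np.append(out, arr[end:].mean())
    return out

downsample([1, 2, 6, 2, 1], 3)  # -> array([3. , 1.5])
```

The reshape trick keeps the whole-block part vectorized, so this stays fast even for long arrays.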

Pandas every nth row

▼魔方 西西 submitted on 2019-11-28 03:22:23
DataFrame.resample() works only with time-series data, and I cannot find a way of getting every nth row from non-time-series data. What is the best method? chrisb: I'd use iloc, which takes a row/column slice based on integer position and follows normal Python syntax: df.iloc[::5, :] Though @chrisb's accepted answer does answer the question, I would like to add the following. A simple method I use to get the nth row or drop the nth row is this: df1 = df[df.index % 3 != 0] # Excludes every 3rd row starting from 0 df2 = df[df.index % 3 == 0] # Selects every 3rd row starting…
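Both answers from the entry, side by side on a throwaway frame (the example data is an assumption):

```python
import pandas as pd

df = pd.DataFrame({'a': range(10)})

every_5th = df.iloc[::5]            # positional: rows 0 and 5
drop_3rd = df[df.index % 3 != 0]    # label-based: drops rows 0, 3, 6, 9
keep_3rd = df[df.index % 3 == 0]    # label-based: keeps rows 0, 3, 6, 9
```

The iloc form works on any index; the modulo form assumes the default 0..n-1 integer index.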

Percentiles of Live Data Capture

不想你离开。 submitted on 2019-11-27 17:15:17
I am looking for an algorithm that determines percentiles for live data capture. For example, consider the development of a server application. The server might have response times as follows: 17 ms, 33 ms, 52 ms, 60 ms, 55 ms, etc. It is useful to report the 90th percentile response time, the 80th percentile response time, and so on. The naive algorithm is to insert each response time into a list; when statistics are requested, sort the list and read the values at the proper positions. Memory usage scales linearly with the number of requests. Is there an algorithm that yields "approximate" percentile…
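One standard bounded-memory answer to this question is the P-squared algorithm; a simpler sketch that also caps memory is to keep a fixed-size reservoir sample and read percentiles off it (the class name and the k=1000 sample size are my own choices):

```python
import bisect
import random

class ReservoirPercentile:
    """Approximate streaming percentiles in O(k) memory:
    a uniform reservoir sample of the stream, kept sorted
    so percentile queries are a single index lookup."""

    def __init__(self, k=1000, seed=0):
        self.k = k
        self.n = 0
        self.sample = []
        self.rng = random.Random(seed)

    def add(self, x):
        self.n += 1
        if len(self.sample) < self.k:
            bisect.insort(self.sample, x)
        elif self.rng.random() < self.k / self.n:
            # Evict a uniformly chosen element, keeping the sample sorted.
            self.sample.pop(self.rng.randrange(self.k))
            bisect.insort(self.sample, x)

    def percentile(self, p):
        idx = min(len(self.sample) - 1, int(p / 100 * len(self.sample)))
        return self.sample[idx]
```

Rank error is roughly proportional to 1/sqrt(k); the P-squared algorithm trades that generality for O(1) memory per tracked percentile.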

How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?

拥有回忆 submitted on 2019-11-27 11:09:18
I'm writing some code that plays back WAV files at different speeds, so that the wave is either slower and lower-pitched, or faster and higher-pitched. I'm currently using simple linear interpolation, like so: int newlength = (int)Math.Round(rawdata.Length * lengthMultiplier); float[] output = new float[newlength]; for (int i = 0; i < newlength; i++) { float realPos = i / lengthMultiplier; int iLow = (int)realPos; int iHigh = iLow + 1; float remainder = realPos - (float)iLow; float lowval = 0; float highval = 0; if ((iLow >= 0) && (iLow < rawdata.Length)) { lowval = rawdata[iLow]; } if ((iHigh…
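A cubic (Catmull-Rom) kernel is a common drop-in upgrade over the linear lerp above: it reads four neighbouring samples instead of two. The original is C#; here is a language-neutral Python sketch of the same resampling loop, where clamping at the buffer edges is my own choice of boundary handling:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom cubic: smoothly interpolates between p1 and p2
    for t in [0, 1], using p0 and p3 as outer support points."""
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t
    )

def resample(raw, length_multiplier):
    """Resample `raw` to round(len(raw) * length_multiplier) samples."""
    n = round(len(raw) * length_multiplier)
    out = []
    for i in range(n):
        pos = i / length_multiplier
        j = int(pos)
        t = pos - j
        # Clamp the four neighbour indices at the edges of the buffer.
        p = [raw[max(0, min(len(raw) - 1, j + k))] for k in (-1, 0, 1, 2)]
        out.append(catmull_rom(p[0], p[1], p[2], p[3], t))
    return out
```

At t = 0 the kernel reduces to p1 and at t = 1 to p2, so it passes exactly through the original samples, just like the linear version.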

Round pandas datetime index?

早过忘川 submitted on 2019-11-27 07:04:48
Question: I am reading multiple spreadsheets of time series into a pandas DataFrame and concatenating them together with a common pandas datetime index. The datalogger that recorded the time series is not 100% accurate, which makes resampling very annoying: depending on whether a time is slightly above or below the interval being sampled, it creates NaNs and starts to make my series look like a broken line. Here's my code: def loaddata(filepaths): t1 = time.clock() for i in range(len(filepaths))…
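A minimal sketch of the usual fix: snap the jittery timestamps to the nominal logging interval with DatetimeIndex.round before concatenating, so the resample bins line up instead of producing NaN-riddled rows (the one-minute interval and the sample timestamps are assumptions):

```python
import pandas as pd

# Timestamps from an imprecise logger, drifting around whole minutes.
idx = pd.DatetimeIndex(['2016-07-08 02:17:59.2',
                        '2016-07-08 02:19:00.8',
                        '2016-07-08 02:20:01.1'])

rounded = idx.round('min')  # snap each stamp to the nearest minute
```

pandas also offers .floor('min') and .ceil('min') when the drift is known to be one-sided.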