binning

Bin formation in an R data.frame

。_饼干妹妹 submitted on 2020-01-14 05:23:06
Question: I have a data.frame with two columns:

    category quantity
    a        20
    b        30
    c        100
    d        10
    e        1
    f        23
    g        3
    h        200

I need to write a function with two parameters, dataframe and bin_size, which runs a cumsum over the quantity column, splits the subsequent row if the cumsum exceeds bin_size, and adds a running bin number as an additional column. For example, calling function(dataframe, 50) on the data above should give me:

    category quantity cumsum bin_nbr
    a        20       20     1
    b        30       50     1
    c        50       50     2
    c        50       50     3
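The row-splitting rule described in the question can be sketched as follows. This is a minimal illustration in plain Python rather than R, and it rests on assumptions the excerpt does not confirm: the function name `split_into_bins` is hypothetical, the cumsum is assumed to restart in every bin (as the four visible output rows suggest), and rows with zero quantity are simply dropped.

```python
def split_into_bins(rows, bin_size):
    """Fill bins of capacity bin_size in order, splitting rows that overflow.

    rows: list of (category, quantity) pairs.
    Returns (category, quantity, cumsum, bin_nbr) tuples; cumsum is the
    running total inside the current bin (assumption, see lead-in).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    out = []
    bin_nbr = 1
    filled = 0  # quantity already placed in the current bin
    for category, quantity in rows:
        remaining = quantity
        while remaining > 0:
            if filled == bin_size:
                # Current bin is full: open the next one.
                bin_nbr += 1
                filled = 0
            # Take as much of this row as still fits in the current bin.
            take = min(remaining, bin_size - filled)
            filled += take
            out.append((category, take, filled, bin_nbr))
            remaining -= take
    return out


data = [("a", 20), ("b", 30), ("c", 100), ("d", 10),
        ("e", 1), ("f", 23), ("g", 3), ("h", 200)]
result = split_into_bins(data, 50)
for row in result:
    print(row)
```

The first four tuples reproduce the rows shown in the question. How the remaining rows should look is not visible in the excerpt, so the rest of the output reflects only the assumptions above.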

Is there another way to split a list into bins of equal size and put the remainder if any into the first bin?

懵懂的女人 submitted on 2020-01-06 06:57:43
Question: Given a sorted list:

    x = list(range(20))

I could split the list into equal sizes and put the remainder into the left bins as such:

    def split_qustions_into_levels(questions, num_bins=3):
        num_questions = len(questions)
        equal_size = int(num_questions / num_bins)
        slices = [equal_size] * num_bins
        slices[0] += len(questions) % num_bins
        return [[questions.pop(0) for _ in questions[:s]] for s in slices]

If I have 3 bins from the list of 20 items, I should get an output list of lists with sizes (7,7,6)
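As written, the function in the question adds the whole remainder to the first bin, so 20 items in 3 bins come out as (8, 6, 6), and it empties the input list through `pop(0)`. One possible way to get the (7, 7, 6) sizes mentioned in the question is to hand out the remainder one item at a time to the leftmost bins. The sketch below assumes that reading; the name `split_evenly` is hypothetical.

```python
def split_evenly(items, num_bins=3):
    """Split items into num_bins consecutive slices.

    The first (len(items) % num_bins) bins each get one extra item,
    so bin sizes differ by at most one. The input list is not modified.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    base, extra = divmod(len(items), num_bins)
    bins = []
    start = 0
    for i in range(num_bins):
        size = base + (1 if i < extra else 0)
        bins.append(items[start:start + size])
        start += size
    return bins


x = list(range(20))
levels = split_evenly(x, num_bins=3)
print([len(level) for level in levels])  # [7, 7, 6]
```

Slicing keeps the original list intact, which avoids the side effect of the `pop(0)` approach.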

Binning and then combining bins with minimum number of observations?

廉价感情. submitted on 2020-01-03 04:15:10
Question: Let's say I create some data and then create bins of different sizes:

    from __future__ import division
    import numpy as np
    import pandas as pd

    x = np.random.rand(1,20)
    new, = np.digitize(x,np.arange(1,x.shape[1]+1)/100)
    new_series = pd.Series(new)
    print(new_series.value_counts())

reveals:

    20    17
    16     1
    4      1
    2      1
    dtype: int64

I basically want to transform the underlying data, if I set a minimum threshold of at least 2 per bin, so that new_series.value_counts() is this:

    20    17
    16     3
    dtype: int64

Answer 1: EDITED:

    x = np.random.rand(1,100)
    bins =
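Since the answer is cut off, the following is only a sketch of one way to get the counts the question asks for, not the answer's method. It works on plain bin labels with the standard library instead of numpy/pandas. The merge rule is an assumption read off the example: every bin below the threshold is relabelled to the largest label among those sparse bins. The name `merge_small_bins` is hypothetical, and the sketch does not handle the case where the merged bin is still below the threshold.

```python
from collections import Counter


def merge_small_bins(labels, min_count=2):
    """Relabel observations in sparse bins so they share one bin.

    Assumption: all bins with fewer than min_count observations are
    folded into the largest label among those sparse bins.
    """
    counts = Counter(labels)
    sparse = {label for label, n in counts.items() if n < min_count}
    if not sparse:
        return list(labels)
    target = max(sparse)
    return [target if label in sparse else label for label in labels]


# Bin labels matching the value_counts() shown in the question.
labels = [20] * 17 + [16, 4, 2]
merged = merge_small_bins(labels, min_count=2)
print(Counter(merged))
```

For the labels in the question this yields 17 observations in bin 20 and 3 in bin 16.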

Bin pandas dataframe by every X rows

烂漫一生 submitted on 2019-12-28 05:59:07
Question: I have a simple dataframe which I would like to bin for every 3 rows. It looks like this:

       col1
    0     2
    1     1
    2     3
    3     1
    4     0

and I would like to turn it into this:

       col1
    0     2
    1   0.5

I have already posted a similar question here but I have no idea how to port the solution to my current use case. Can you help me out? Many thanks!

Answer 1:

    >>> df.groupby(df.index / 3).mean()
       col1
    0   2.0
    1   0.5

Answer 2: The answer from Roman Pekar was not working for me. I imagine that this is because of differences between Python2 and
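The grouping idea behind Answer 1 can be shown without pandas: rows are keyed by their position divided by 3, and each group is averaged. The sketch below uses floor division (`//`) for the key. This matters because in Python 3 the `/` operator performs true division, so `position / 3` would give every row its own fractional key. The name `bin_means` is hypothetical. In pandas, the equivalent key on a default integer index would be `df.index // 3`.

```python
def bin_means(values, rows_per_bin=3):
    """Average consecutive runs of rows_per_bin values.

    Each value is grouped under position // rows_per_bin, so positions
    0-2 share key 0, positions 3-5 share key 1, and so on.
    """
    if rows_per_bin <= 0:
        raise ValueError("rows_per_bin must be positive")
    groups = {}
    for position, value in enumerate(values):
        groups.setdefault(position // rows_per_bin, []).append(value)
    return {key: sum(group) / len(group) for key, group in groups.items()}


col1 = [2, 1, 3, 1, 0]
means = bin_means(col1)
print(means)  # {0: 2.0, 1: 0.5}
```

The last group may hold fewer than three rows, as it does here; it is averaged over the rows it actually contains.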

Hexbin: how to trace bin contents

房东的猫 submitted on 2019-12-25 04:26:23
Question: After applying hexbinning I would like to know which ids or row numbers of the original data ended up in which bin. I am currently analysing spatial data and I am binning, e.g., depth of water and temperature. Ideally, I would like to map the colormap of the bins back to the spatial map to see where more or less common parameter combinations exist. I'm not bound to hexbin though. I wasn't able to figure out from the documentation how to trace which datapoint ends up in which bin. It seems

R - overcome “cut” ignoring values outside of range in data table

独自空忆成欢 submitted on 2019-12-24 18:45:53
Question: I am comparing two years' worth of daily soil moisture (SM) measurements. In one year, SM ranged from 0 to 0.6. In the other year, which had more rain, SM ranged from 0 to 0.8. Amongst the data, I also have some NAs, where the SM sensor did not work for some reason. Let's re-create something similar:

    library(data.table)
    set.seed(24)
    dt1 <- data.table(date=seq(as.Date("2015-01-01"), length.out=365, by="1 day"),
                      sm=sample(c(NA, runif(10, min=0, max=0.6)), 365, replace = TRUE))
    dt2 <- data

R binning dataset and surface plot

与世无争的帅哥 submitted on 2019-12-24 13:52:33
Question: I have a large data set that I am trying to discretise and create a 3d surface plot with:

      rowColFoVCell wpbCount Feret
    1  001001001001        1  0.58
    2  001001001001        1  1.30
    3  001001001001        1  0.58
    4  001001001001        1  0.23
    5  001001001001        2  0.23
    6  001001001001        2  0.58

There are currently 695302 rows in this data set. I am trying to discretise the third column, 'Feret', based on the second column; that is, for each 'wpbCount' value, bin the 'Feret' column. I think the solution will involve using cut but I am not sure how to

Matlab 2-D density plot

妖精的绣舞 submitted on 2019-12-23 19:23:16
Question: I am trying to do a density plot for data containing two columns with different ranges. The RMSD column ranges over [0-2] and the Angle column over [0-200]. My data in the file is like this:

    0.0225370 37.088
    0.1049553 35.309
    0.0710002 33.993
    0.0866880 34.708
    0.0912664 33.011
    0.0932054 33.191
    0.1083590 37.276
    0.1104145 34.882
    0.1027977 34.341
    0.0896688 35.991
    0.1047578 36.457
    0.1215936 38.914
    0.1105484 35.051
    0.0974138 35.533
    0.1390955 33.601
    0.1333878 32.133
    0.0933365 35.714
    0.1200465 33.038
    0.1155794 33

Timeseries average based on a defined time interval (bin)

佐手、 submitted on 2019-12-23 04:29:06
Question: Here is an example of my dataset. I want to calculate bin averages based on time (i.e., ts) every 10 seconds. Could you please provide some hints so that I can carry on? In my case, I want to average time (ts) and Var in every 10 seconds. For example, I will get an averaged value of Var and ts from 0 to 10 seconds; I will get another averaged value of Var and ts from 11 to 20 seconds, etc.

    df = data.frame(ts = seq(1,100,by=0.5), Var = runif(199,1, 10))

Any functions or libraries in R can I use
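The fixed-width time binning described above can be sketched as follows, in plain Python rather than R. Two things here are assumptions: the name `bin_average` is hypothetical, and the bins are taken as half-open intervals [0, 10), [10, 20), and so on, because the question's "0 to 10" and "11 to 20" leaves the edges ambiguous. The random data only mirrors the shape of the example data frame.

```python
import math
import random


def bin_average(ts, var, width=10.0):
    """Average ts and var within consecutive time bins of the given width.

    A timestamp t falls into bin floor(t / width), i.e. the half-open
    interval [k * width, (k + 1) * width) (assumption, see lead-in).
    Returns a list of (bin_start, mean_ts, mean_var), sorted by bin.
    """
    totals = {}  # bin index -> [sum of ts, sum of var, count]
    for t, v in zip(ts, var):
        entry = totals.setdefault(math.floor(t / width), [0.0, 0.0, 0])
        entry[0] += t
        entry[1] += v
        entry[2] += 1
    return [(key * width, sum_t / n, sum_v / n)
            for key, (sum_t, sum_v, n) in sorted(totals.items())]


random.seed(0)
ts = [1 + 0.5 * i for i in range(199)]       # 1.0, 1.5, ..., 100.0
var = [random.uniform(1, 10) for _ in ts]    # stand-in for runif(199, 1, 10)
averages = bin_average(ts, var)
for bin_start, mean_ts, mean_var in averages:
    print(bin_start, mean_ts, round(mean_var, 3))
```

With these edges the timestamp 100.0 lands in a bin of its own. A different edge convention, such as closed on the right, would move it into the preceding bin.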