bins

Customizing bin widths in plotly's histogram function in R

╄→гoц情女王★ submitted on 2021-02-11 15:30:08
Question: I have a dataset that contains dates and call volume per day. When I plotted the subsets using the plotly R package, all but one of them had each date in its own bin. That one tricky subset was instead grouped into 2-day bins, which isn't very useful. I'm sure it's an easy fix, but I'm not quite sure how to change the bin width. a <- as.Date(c("2019-02-01", "2019-01-14", "2019-01-15", "2019-01-24", "2019-01-31", "2019-01-22","2019-01-14", "2019-01

With `pandas.cut()`, how do I get integer bins and avoid getting a negative lowest bound?

旧巷老猫 submitted on 2021-02-07 05:35:06
Question: My dataframe has zero as its lowest value. I am trying to use the precision and include_lowest parameters of pandas.cut(), but I can't get the intervals to consist of integers rather than floats with one decimal. I also can't get the leftmost interval to stop at zero. import pandas as pd import seaborn as sns import matplotlib.pyplot as plt sns.set(style='white', font_scale=1.3) df = pd.DataFrame(range(0,389,8)[:-1], columns=['value']) df['binned_df_pd'] = pd.cut(df.value, bins=7, precision
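One way to get integer, non-negative intervals is to pass explicit bin edges instead of a bin count. The sketch below reuses the question's values; the edges (0 to 400 in steps of 50) are an assumption chosen for illustration, not something taken from the original post.

```python
import numpy as np
import pandas as pd

# Same values as in the question: 0, 8, ..., 376
df = pd.DataFrame(range(0, 389, 8)[:-1], columns=["value"])

# Hypothetical integer edges; any increasing integers that cover the data work.
edges = np.arange(0, 401, 50)  # 0, 50, ..., 400

# right=False makes each bin [left, right), so the value 0 lands in [0, 50)
# and no interval has to start below zero to include it.
df["binned"] = pd.cut(df["value"], bins=edges, right=False)

print(df["binned"].cat.categories)
```

With an integer bin count, pandas.cut() derives the edges from the data range and widens that range slightly so the minimum is included, which is likely where a negative, fractional lower bound comes from. Supplying the edges yourself avoids that adjustment.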

Pandas: Bin dates into 30 minute intervals and calculate averages

假如想象 submitted on 2020-03-21 19:38:06
Question: I have a Pandas dataframe with two columns, speed and time: speed date 54.72 1:33:56 49.37 1:33:59 37.03 1:34:03 24.02 7:39:58 28.02 7:40:01 24.04 7:40:04 24.02 7:40:07 25.35 7:40:10 26.69 7:40:13 32.04 7:40:16 28.02 11:05:43 30.71 11:05:46 29.36 11:05:49 18.68 11:05:52 54.72 11:05:55 34.69 10:31:34 25.03 10:31:38 56.04 10:31:40 44.03 10:31:43 I want to calculate the average speed per 30-minute bin. For example, the average speed during the 4th bin (1:31 - 2:00) is (54.72 +
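A minimal sketch of one possible approach, using the first six rows of the question's data: parse the times as timedeltas, floor each one to the start of its 30-minute window, and group by that start. The column name bin_start is introduced here for illustration. Flooring gives bins of the form [1:30, 2:00), which differs slightly from the 1:31 - 2:00 labelling in the question.

```python
import pandas as pd

# First six rows of the question's data
df = pd.DataFrame({
    "speed": [54.72, 49.37, 37.03, 24.02, 28.02, 24.04],
    "date": ["1:33:56", "1:33:59", "1:34:03", "7:39:58", "7:40:01", "7:40:04"],
})

# Interpret each H:MM:SS string as time elapsed since midnight
elapsed = pd.to_timedelta(df["date"])

# Round down to the start of the 30-minute window containing each row
df["bin_start"] = elapsed.dt.floor("30min")

# Average speed per 30-minute bin
means = df.groupby("bin_start")["speed"].mean()
print(means)
```

Only bins that contain at least one row appear in the result; empty windows are simply absent.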

MATLAB bins setting in histogram

若如初见. submitted on 2020-02-07 12:30:09
Question: I want to change how the data are distributed in the histogram that I built. My histogram looks like [image]. But I want it to look like [image]. I set bins1=[10,30,50,70]; hist(data,bins1) How can I arrange the bins as in the second figure? Answer 1: Use histc instead of hist. histc lets you define the bin edges, whereas hist treats its second input as bin centers. Source: https://stackoverflow.com/questions/29016872/matlab-bins-setting-in-histogram
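The edges-versus-centers distinction can be illustrated outside MATLAB as well. The Python sketch below is only an analogy: numpy.histogram interprets a list of bins as edges, and the last lines show where the boundaries would sit if the same numbers were read as centers, assuming boundaries at the midpoints between neighbouring centers. The sample data is made up.

```python
import numpy as np

data = np.array([12, 25, 31, 45, 49, 55, 69, 70])  # made-up sample values

# Read as edges: three bins, [10, 30), [30, 50) and [50, 70]
edges = [10, 30, 50, 70]
counts, _ = np.histogram(data, bins=edges)
print(counts)

# Read as centers: four bins whose inner boundaries fall halfway
# between neighbouring centers (assumed midpoint convention)
centers = np.array(edges, dtype=float)
inner_boundaries = (centers[:-1] + centers[1:]) / 2
print(inner_boundaries)
```

The same four numbers therefore describe three bins in one reading and four in the other, which is why the two histograms in the question look different.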

Python Pandas Create New Bin/Bucket Variable with pd.qcut

僤鯓⒐⒋嵵緔 submitted on 2019-12-31 13:45:11
Question: How do you create a new bin/bucket variable using pd.qcut in Python? This might seem elementary to experienced users, but I was not clear on it and it was surprisingly hard to search for on Stack Overflow/Google. Some thorough searching yielded this (Assignment of qcut as new column), but it didn't quite answer my question because it didn't take the last step and put everything into numbered bins (i.e. 1,2,...). Answer 1: In Pandas 0.15.0 or newer, pd.qcut will return a Series, not a
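As a hedged illustration of the "last step" the question asks about, the sketch below assigns numbered quantile bins as a new column. The dataframe, the column names score and bin, and the choice of four quantiles are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: eight distinct scores
df = pd.DataFrame({"score": [3, 8, 15, 22, 27, 34, 41, 56]})

# labels=False returns integer bin codes starting at 0;
# adding 1 gives bins numbered 1, 2, 3, 4
df["bin"] = pd.qcut(df["score"], q=4, labels=False) + 1

print(df)
```

Because qcut splits on quantiles, each bin receives roughly the same number of rows, here two per bin.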

Python: Assigning # values in a list to bins, by rounding up

混江龙づ霸主 submitted on 2019-12-23 12:51:38
Question: I want a function that takes a series and a set of bins and rounds each value up to the nearest bin. For example: my_series = [ 1, 1.5, 2, 2.3, 2.6, 3] def my_function(my_series, bins): ... my_function(my_series, bins=[1,2,3]) > [1,2,2,3,3,3] This seems very close to what NumPy's digitize is intended to do, but it produces the wrong values (asterisks mark the wrong values): np.digitize(my_series, bins= [1,2,3], right=False) > [1, 1*, 2, 2*, 2*, 3] The reason why it's wrong is clear from
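One possible way to get the rounding-up behaviour is to call np.digitize with right=True and use its result as an index into the bins rather than as the answer itself. This is a sketch, not necessarily the approach from the original thread; the function name round_up_to_bin is made up for the example, and it assumes the bins are sorted in increasing order and that no value exceeds the largest bin.

```python
import numpy as np

def round_up_to_bin(values, bins):
    """Map each value to the smallest bin that is >= the value."""
    bins = np.asarray(bins)
    # With right=True, digitize returns i such that bins[i-1] < x <= bins[i],
    # so bins[i] is the nearest bin at or above x.
    idx = np.digitize(values, bins, right=True)
    return bins[idx]

my_series = [1, 1.5, 2, 2.3, 2.6, 3]
result = round_up_to_bin(my_series, bins=[1, 2, 3])
print(result.tolist())  # [1, 2, 2, 3, 3, 3]
```

A value larger than the last bin would produce an index past the end of the array and raise an IndexError, so such inputs need to be clipped or handled separately.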

R code to categorize age into group/ bins/ breaks

僤鯓⒐⒋嵵緔 submitted on 2019-12-17 06:11:56
Question: I am trying to categorize age into groups so that it is no longer continuous. I have this code: data$agegrp(data$age>=40 & data$age<=49) <- 3 data$agegrp(data$age>=30 & data$age<=39) <- 2 data$agegrp(data$age>=20 & data$age<=29) <- 1 The above code does not work under the survival package. It gives me: invalid function in complex assignment Can you point out where the error is? data is the dataframe I am using. Answer 1: I would use findInterval() here: First, make up some sample data set.seed(1) ages <-

ggplot2: Group histogram data by year

旧时模样 submitted on 2019-12-13 09:48:47
Question: I have a bunch of data with a YYYY-MM-DD date attached to it, and I'm having trouble getting a single bar for each year. In other words, I want all data from 2014 to show up under one bar for 2014. The dates were converted to the YYYY-MM-DD format using df1$Close.Date <- as.Date(df1$Close.Date, "%m/%d/%Y") Here's my histogram call, which doesn't group correctly: ggplot(df1, aes(Close.Date, fill=Stage)) + geom_bar() I've tried adjusting breaks and binwidth with no success. Answer 1: I fixed it by:

Separating data into bins and calculating averages

三世轮回 submitted on 2019-12-12 00:22:22
Question: I have this sample data: Time(s) Bacteria count 0.4 2 0.82 5 6.67 8 7.55 11 8.21 14 8.89 17 9.4 20 10.18 23 10.85 26 11.35 29 11.85 32 12.41 35 13.36 38 13.86 41 14.57 44 15.08 47 15.67 50 16.09 53 16.59 56 18.53 59 24.43 62 25.32 65 25.97 68 26.37 71 26.93 74 27.87 77 28.33 80 29.1 83 29.88 84 30.88 85 31.99 86 35.65 87 36.06 88 36.46 89 36.96 90 37.39 91 37.95 92 38.56 93 39.22 94 39.79 95 40.56 96 41.47 97 42.02 98 42.73 99 43.4 100 43.93 101 44.67 102 45.24 103 45.9 104 46.58 105 47.22 106

What package is to be installed in R for scatter plots with logarithmic binning?

别等时光非礼了梦想. submitted on 2019-12-11 08:06:37
Question: I am trying to produce some high-density scatter plots with R. What package should be installed for this? Or is there another way to obtain the plots? Answer 1: If you really do want a log-scaled scatterplot, then this is how to create one in each of the 3 plotting systems. First, some data: dfr <- data.frame(x = rlnorm(1e5), y = rlnorm(1e5)) In base graphics: with(dfr, plot(x, y, log = "xy")) In lattice graphics: library(lattice) p1 <- xyplot(y ~ x, dfr, scales = list(log = TRUE)) p1 In