histogram | 易学教程

Binning in Numpy

Question: I have an array A that I am trying to put into 10 bins. Here is what I've done:

    A = range(1, 94)
    hist = np.histogram(A, bins=10)
    np.digitize(A, hist[1])

But the output has 11 bins, not 10, with the last value (93) placed in bin 11 when it should have been in bin 10. I can fix it with a hack, but what is the most elegant way of doing this? How do I tell digitize that the last bin in hist[1] is inclusive on the right, [ ] instead of [ )?

Answer 1: The output of np.histogram actually has 10 bins;
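The mismatch can be reproduced in a few lines. This is a minimal sketch rather than the approach of the truncated answer: it assumes that passing only the 10 left edges to np.digitize is acceptable, which folds the maximum value into bin 10.

```python
import numpy as np

A = np.arange(1, 94)                      # values 1..93, as in the question
counts, edges = np.histogram(A, bins=10)  # 10 counts, 11 edges

# Naive call: 93 equals the last edge, so digitize gives it index 11.
naive = np.digitize(A, edges)

# Passing only the 10 left edges: every value at or above the 10th
# left edge (including 93) now gets index 10.
fixed = np.digitize(A, edges[:-1])

print(naive.max(), fixed.max())
```

One caveat of this sketch: any value above the last edge would also land in bin 10, so it fits only when the edges were computed from the same data.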

How to pick unique colors of histogram bars in matplotlib?

I am trying to plot several histograms on the same plot, but I noticed that the same color is assigned to different series, which bothers me a little. Is there a way to force the bar colors to be unique? This works for a small data set, but when I use a lot of data the problem comes back. Here is an example: the blue color is assigned to two different data samples. All the examples and solutions for assigning colors to histograms in matplotlib (at least those I found) suggest normalizing the x axis between 0 and 1, as in this example, but this is not what I want.
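One possible way to avoid repeated colors is to stop relying on the default color cycle and pass an explicit color to each series. The sketch below assumes that sampling a colormap at evenly spaced points gives colors that are distinct enough; the helper name distinct_colors, the choice of "viridis", and the random sample data are illustrative assumptions, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

def distinct_colors(n, cmap_name="viridis"):
    """Return n RGBA tuples sampled evenly from a colormap (hypothetical helper)."""
    cmap = plt.get_cmap(cmap_name)
    return [cmap(x) for x in np.linspace(0.0, 1.0, n)]

# Made-up data: 12 series, more than the 10 colors of the default cycle.
rng = np.random.default_rng(0)
samples = [rng.normal(loc=i, size=200) for i in range(12)]
colors = distinct_colors(len(samples))

fig, ax = plt.subplots()
for i, (data, color) in enumerate(zip(samples, colors)):
    # One explicit color per series instead of the repeating default cycle.
    ax.hist(data, bins=20, alpha=0.5, color=color, label=f"series {i}")
ax.legend()
fig.savefig("histograms.png")
```

The x axis is left in data units here; only the colormap lookup uses values between 0 and 1.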

Why does numpy.histogram (Python) leave off one element as compared to hist in Matlab?

Question: I am trying to convert some Matlab code to Python. The Matlab code looks like:

    [N,X] = hist(Isb*1e6, -3:0.01:0)

where Isb is a 2048000-element 1D array. N is output as a 301-element 1D array. My Python code looks like:

    import numpy as np
    N, X = np.histogram(Isb*1e6, np.arange(-3, 0.01, 0.01))

but the N that Python outputs is a 300-element 1D array, where the last element from the Matlab N is left off. Is there a way to replicate what Matlab does more accurately? I need N and X to be the same size so
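The size difference follows from how the second argument is read: Matlab's hist(Y, X) treats a vector X as bin centers, while np.histogram treats an array as bin edges, so 301 values give 301 bins in one case and 300 in the other. The sketch below builds edges from centers; the helper name hist_by_centers and the random stand-in data are assumptions for illustration, and edge-inclusion details may still differ slightly from Matlab.

```python
import numpy as np

def hist_by_centers(data, centers):
    """Count data into bins described by their centers (hypothetical helper)."""
    centers = np.asarray(centers, dtype=float)
    # Interior edges sit halfway between neighbouring centers.
    midpoints = (centers[:-1] + centers[1:]) / 2.0
    # Open-ended outer bins: out-of-range values land in the first or last bin.
    edges = np.concatenate(([-np.inf], midpoints, [np.inf]))
    counts, _ = np.histogram(data, bins=edges)
    return counts, centers

centers = np.linspace(-3.0, 0.0, 301)        # the same 301 points as -3:0.01:0
rng = np.random.default_rng(0)
data = rng.uniform(-3.5, 0.5, size=10_000)   # stand-in for Isb*1e6
N, X = hist_by_centers(data, centers)
print(N.shape, X.shape)
```

With this construction N and X have the same length, which is what the question asks for.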

Histogram with weights in R

I need to plot a weighted histogram of density rather than frequency. I know that freq = FALSE is available in hist(), but you can't specify weights there. In ggplot2 I can do this:

    library(ggplot2)
    w <- seq(1,1000)
    w <- w/sum(w)
    v <- sort(runif(1000))
    foo <- data.frame(v, w)
    ggplot(foo, aes(v, weight = w)) + geom_histogram()

But where is the equivalent of freq = FALSE?

By default, geom_histogram() will use frequency rather than density on the y-axis. However, you can change this by setting your y aesthetic to ..density.. like so:

    ggplot(foo, aes(x = v, y = ..density.., weight = w)) + geom_histogram(
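To make explicit what a weighted density histogram computes, here is a small NumPy sketch (Python rather than R, and not part of the original answer). It assumes the usual definition: the density of a bin is the sum of weights in that bin divided by the total weight times the bin width, so the bars integrate to 1. The bin count of 30 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
v = np.sort(rng.uniform(size=1000))    # mirrors sort(runif(1000))
w = np.arange(1, 1001, dtype=float)
w /= w.sum()                           # mirrors w <- w/sum(w)

# Weighted density in one call.
density, edges = np.histogram(v, bins=30, weights=w, density=True)

# The same quantity computed by hand from the weighted counts.
weighted_counts, _ = np.histogram(v, bins=edges, weights=w)
manual = weighted_counts / (weighted_counts.sum() * np.diff(edges))

# Total area under the bars.
area = float(np.sum(density * np.diff(edges)))
print(area)
```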

Coloring a geom_histogram by gradient

Question: I'm trying to plot a geom_histogram where the bars are colored by a gradient. This is what I'm trying to do:

    library(ggplot2)
    set.seed(1)
    df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
    ggplot(df,aes_string(x="val",y="..count..+1",fill="val"))+geom_histogram(binwidth=1,pad=TRUE)+scale_y_log10()+scale_fill_gradient2("val",low="darkblue",high="darkred")

But the result is not what I expect. Any idea how to get it colored by the defined gradient?

Answer 1: Not sure you can fill by val

How to Plot a Pre-Binned Histogram In R

I have a pre-binned frequency table for a rather large dataset: a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by doing further binning and summing the existing counts. For example, if the pre-binned data looks like [(0.01, 5000), (0.02, 231), (0.03, 948)], where the first number is the bin and the second is the count, and I choose 0.04 as the new bin width, I'd expect to get [(0.04, 6179)]. What's the fastest and/or easiest way to do this in R?

Looks like ggplot2 has the answer.
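The re-binning arithmetic in the example (5000 + 231 + 948 = 6179) can be sketched with a weighted histogram: treat each old bin label as a data point and its count as the weight. This is a NumPy illustration rather than R or the ggplot2 answer; the helper name rebin and the explicit edge list are assumptions.

```python
import numpy as np

def rebin(labels, counts, new_edges):
    """Sum pre-binned counts into coarser bins (hypothetical helper)."""
    # Each old bin label acts as one data point carrying its count as a weight.
    summed, _ = np.histogram(labels, bins=new_edges, weights=counts)
    return summed

labels = np.array([0.01, 0.02, 0.03])
counts = np.array([5000, 231, 948])
print(rebin(labels, counts, [0.0, 0.04]))
```

NumPy bins are closed on the left, so an old label that falls exactly on an interior new edge goes into the upper bin; whether that matches the intended convention depends on what the labels in the table mean.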

Scala simple histogram

For a given Array[Double], for instance

    val a = Array.tabulate(100){ _ => Random.nextDouble * 10 }

what is a simple approach to calculate a histogram with n bins?

A very similar preparation of values as in @om-nom-nom's answer, yet the histogram method is kept quite small by using partition:

    case class Distribution(nBins: Int, data: List[Double]) {
      require(data.length > nBins)
      val Epsilon = 0.000001
      val (max,min) = (data.max,data.min)
      val binWidth = (max - min) / nBins + Epsilon
      val bounds = (1 to nBins).map { x => min + binWidth * x }.toList
      def histo(bounds: List[Double], data: List[Double]):
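For comparison, the underlying idea (equal-width bins between the minimum and the maximum) can be sketched in a few lines of Python. This is not the partition-based Scala method from the answer, which is truncated above; it computes a bin index directly and clamps the maximum into the last bin instead of widening the bins by an epsilon. The function name histogram is an illustrative choice.

```python
import random

def histogram(data, n_bins):
    """Equal-width histogram over [min(data), max(data)] (illustrative sketch)."""
    lo, hi = min(data), max(data)
    if hi == lo:
        # Degenerate case: all values identical, so everything goes in bin 0.
        return [len(data)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum would compute index n_bins, so clamp it to the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

a = [random.random() * 10 for _ in range(100)]  # mirrors the Scala example array
print(histogram(a, 10))
```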

Creating binned histograms in Spark

Suppose I have a dataframe (df) in Pandas or an RDD in Spark with the following two columns:

    timestamp  data
    12345.0    10
    12346.0    12

In Pandas, I can create a binned histogram with different bin lengths pretty easily. For example, to create a histogram over 1 hour, I do the following:

    df = df[['timestamp', 'data']].set_index('timestamp')
    df.resample('1H').sum().dropna()

Converting a Spark RDD to a Pandas df is pretty expensive for me (considering the dataset). Consequently, I prefer to stay within the Spark domain as much as possible. Is there a way to do the equivalent with Spark RDDs or dataframes?
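The core of a time-binned sum is a key function that maps each timestamp to the start of its bucket, followed by a sum per key. The sketch below shows that logic in plain Python, under the assumption that timestamps are numeric seconds; the names bucket_start and binned_sums and the third sample record are illustrative. In Spark the same key function could presumably be applied in a map step followed by a by-key reduction, but that part is not shown or tested here.

```python
from collections import defaultdict

def bucket_start(timestamp, width_seconds=3600.0):
    """Start of the fixed-width bucket containing timestamp (hypothetical helper)."""
    return (timestamp // width_seconds) * width_seconds

def binned_sums(records, width_seconds=3600.0):
    """Sum the values of (timestamp, value) pairs per bucket."""
    totals = defaultdict(float)
    for timestamp, value in records:
        totals[bucket_start(timestamp, width_seconds)] += value
    return dict(totals)

# The two rows from the question plus one made-up row in the next hour.
records = [(12345.0, 10), (12346.0, 12), (16000.0, 5)]
print(binned_sums(records))
```

Buckets with no records simply do not appear in the result, which plays the role of the dropna() call in the Pandas version.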

Select data for 15 minute windows - PostgreSQL

I have a table like this in PostgreSQL:

    timestamp            duration
    2013-04-03 15:44:58  4
    2013-04-03 15:56:12  2
    2013-04-03 16:13:17  9
    2013-04-03 16:16:30  3
    2013-04-03 16:29:52  1
    2013-04-03 16:38:25  1
    2013-04-03 16:41:37  9
    2013-04-03 16:44:49  1
    2013-04-03 17:01:07  9
    2013-04-03 17:07:48  1
    2013-04-03 17:11:00  2
    2013-04-03 17:11:16  2
    2013-04-03 17:15:17  1
    2013-04-03 17:16:53  4
    2013-04-03 17:20:37  9
    2013-04-03 17:20:53  3
    2013-04-03 17:25:48  3
    2013-04-03 17:29:26  1
    2013-04-03 17:32:38  9
    2013-04-03 17:36:55  4

And I would like to get the following output: timestampwindowstart = 2013-04-03 15:44:58