histogram

Numpy histogram of large arrays

允我心安 submitted on 2019-11-27 12:24:57
Question: I have a bunch of CSV datasets, about 10 GB each. I'd like to generate histograms from their columns, but it seems the only way to do this in NumPy is to first load the entire column into a NumPy array and then call numpy.histogram on that array. This consumes an unnecessary amount of memory. Does NumPy support online binning? I'm hoping for something that iterates over my CSV line by line and bins values as it reads them, so that at most one line is in memory at any one time.
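One possible approach, sketched under assumptions rather than taken from the thread: counts that numpy.histogram produces over the same fixed bin edges are additive, so they can be summed chunk by chunk. The helper name streaming_histogram, the chunk size, and the in-memory sample are hypothetical, and the sketch assumes the bin edges (and therefore the value range) are known before reading the file.

```python
import csv
import io
import itertools

import numpy as np


def streaming_histogram(lines, column, edges, chunk_size=10_000):
    """Accumulate histogram counts for one CSV column, chunk by chunk.

    `lines` is any iterable of CSV text lines with a header row
    (an open file object works). `edges` are fixed bin edges;
    np.histogram ignores values that fall outside them.
    """
    reader = csv.DictReader(lines)
    counts = np.zeros(len(edges) - 1, dtype=np.int64)
    while True:
        # Pull at most `chunk_size` rows into memory at a time.
        chunk = [float(row[column]) for row in itertools.islice(reader, chunk_size)]
        if not chunk:
            break
        # Counts over identical edges are additive, so partial results sum up.
        counts += np.histogram(chunk, bins=edges)[0]
    return counts


# Tiny in-memory stand-in for a large file (hypothetical data).
sample = io.StringIO("x\n0.5\n1.5\n1.7\n2.5\n9.0\n")
edges = np.array([0.0, 1.0, 2.0, 3.0])
result = streaming_histogram(sample, "x", edges, chunk_size=2)
```

Memory use is bounded by chunk_size rather than by the file size; a chunk size of 1 would match the "one line at a time" wording, at the cost of many small numpy.histogram calls.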

Make Frequency Histogram for Factor Variables

自古美人都是妖i submitted on 2019-11-27 11:47:13
I am very new to R, so I apologize for such a basic question. I spent an hour googling this issue, but couldn't find a solution. Say I have some categorical data in my data set about common pet types. I input it as a character vector in R that contains the names of different types of animals. I created it like this:
    animals <- c("cat", "dog", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat", "bird")
I turn it into a factor for use with other vectors in my data frame:
    animalFactor <- as.factor(animals)
I now want to create a histogram that shows the frequency of each variable on the y

How can I plot a histogram of a long-tailed data using R?

自古美人都是妖i submitted on 2019-11-27 11:19:01
Question: I have data that is mostly centered in a small range (1-10), but a significant number of points (say, 10%) are in (10-1000). I would like to plot a histogram for this data that focuses on (1-10) but also shows the (10-1000) data, something like a log scale for the histogram. Yes, I know this means not all bins are of equal size. A simple hist(x) gives while hist(x, breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives none

python histogram one-liner

我只是一个虾纸丫 submitted on 2019-11-27 10:51:48
There are many ways to write a Python program that computes a histogram. By histogram, I mean a function that counts the occurrences of objects in an iterable and outputs the counts in a dictionary. For example:
    >>> L = 'abracadabra'
    >>> histogram(L)
    {'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}
One way to write this function is:
    def histogram(L):
        d = {}
        for x in L:
            if x in d:
                d[x] += 1
            else:
                d[x] = 1
        return d
Are there more concise ways of writing this function? If we had dictionary comprehensions in Python, we could write:
    >>> { x: L.count(x) for x in set(L) }
but since Python 2.6 doesn't have
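A compact alternative, offered as a sketch and not as the thread's accepted answer: the standard library's collections.Counter does this counting directly. It is available from Python 2.7 onward, so it does not help on Python 2.6 itself. The wrapper name histogram simply mirrors the question.

```python
from collections import Counter


def histogram(iterable):
    """Count occurrences of each hashable item and return a plain dict."""
    return dict(Counter(iterable))


counts = histogram('abracadabra')
```

Unlike the L.count(x) comprehension, which rescans the sequence once per distinct item, Counter makes a single pass over the input.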

Getting frequency values from histogram in R

为君一笑 submitted on 2019-11-27 10:18:58
Question: I know how to draw histograms or other frequency/percentage related tables. But now I want to know how I can get those frequency values in a table to use after the fact. I have a massive dataset, and I draw a histogram with a set binwidth. I want to extract the frequency value (i.e. the value on the y-axis) that corresponds to each bin and save it somewhere. Can someone please help me with this? Thank you!
Answer 1: The hist function has a return value (an object of class histogram):
    R> res <- hist

Matplotlib/Pandas error using histogram

拥有回忆 submitted on 2019-11-27 10:13:50
Question: I have a problem making histograms from pandas Series objects and I can't understand why it does not work. The code worked fine before, but now it does not. Here is a bit of my code (specifically, a pandas Series object I'm trying to make a histogram of):
    type(dfj2_MARKET1['VSPD2_perc'])
which outputs the result:
    pandas.core.series.Series
Here's my plotting code:
    fig, axes = plt.subplots(1, 7, figsize=(30,4))
    axes[0].hist(dfj2_MARKET1['VSPD1_perc'],alpha=0.9, color='blue')
    axes[0].grid

Comparing two histograms

℡╲_俬逩灬. submitted on 2019-11-27 10:04:30
For a small project, I need to compare one image with another to determine whether the images are approximately the same. The images are smallish, varying from 25 to 100px across. The images are meant to be of the same picture data but are subtly different, so a simple pixel equality check won't work. Consider these two possible scenarios:
    A security (CCTV) camera in a museum looking at an exhibit: we want to quickly see if two different video frames show the same scene, but slight differences in lighting and camera focus mean they won't be identical.
    A picture of a vector computer GUI
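As one illustration of the histogram idea in the title, here is a minimal sketch that scores two 8-bit grayscale images by histogram intersection. It is an assumption about a reasonable starting point, not the method from the thread: the function name histogram_similarity, the bin count of 32, and the synthetic test frames are all hypothetical. Intensity histograms ignore where pixels are located, so this only captures overall tonal similarity.

```python
import numpy as np


def histogram_similarity(img_a, img_b, bins=32):
    """Histogram intersection of two 8-bit grayscale images, in [0, 1]."""
    hist_a, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    # Normalise so that images of different sizes are comparable.
    hist_a = hist_a / hist_a.sum()
    hist_b = hist_b / hist_b.sum()
    # Overlap of the two distributions: 1.0 means identical histograms.
    return float(np.minimum(hist_a, hist_b).sum())


# Synthetic stand-ins for real frames (hypothetical data).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(50, 50), dtype=np.uint8)
# Same content with a slight brightness shift, as from a lighting change.
brighter = np.clip(frame.astype(np.int16) + 3, 0, 255).astype(np.uint8)
# A completely different image: all black.
black = np.zeros_like(frame)

same = histogram_similarity(frame, frame)
close = histogram_similarity(frame, brighter)
far = histogram_similarity(frame, black)
```

Coarser bins make the score more tolerant of small lighting changes; whatever threshold separates "same" from "different" would have to be tuned on real image pairs.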

Understanding TensorBoard (weight) histograms

邮差的信 submitted on 2019-11-27 10:04:25
It is really straightforward to see and understand the scalar values in TensorBoard. However, it's not clear how to understand histogram graphs. For example, these are the histograms of my network weights. (After fixing a bug thanks to sunside) What is the best way to interpret these? Layer 1 weights look mostly flat; what does this mean? I added the network construction code here.
    X = tf.placeholder(tf.float32, [None, input_size], name="input_x")
    x_image = tf.reshape(X, [-1, 6, 10, 1])
    tf.summary.image('input', x_image, 4)
    # First layer of weights
    with tf.name_scope("layer1"):
        W1 = tf.get

How to hide zero values in bar3 plot in MATLAB

爷,独闯天下 submitted on 2019-11-27 09:27:35
I've got a 2-D histogram (the plot is 3-D: several histograms graphed side by side) that I've generated with the bar3 plot command. However, all the zero values show up as flat squares in the x-y plane. Is there a way I can prevent MATLAB from displaying the values? I already tried replacing all zeros with NaNs, but it didn't change anything about the plot. Here's the code I've been experimenting with:
    x1=normrnd(50,15,100,1); %generate random data to test code
    x2=normrnd(40,13,100,1);
    x3=normrnd(65,12,100,1);
    low=min([x1;x2;x3]);
    high=max([x1;x2;x3]);
    y=linspace(low,high,(high-low)/4);

Different breaks per facet in ggplot2 histogram

这一生的挚爱 submitted on 2019-11-27 08:22:29
A ggplot2-challenged latticist needs help: What's the syntax to request variable per-facet breaks in a histogram?
    library(ggplot2)
    d = data.frame(x=c(rnorm(100,10,0.1),rnorm(100,20,0.1)),par=rep(letters[1:2],each=100))
    # Note: breaks have different length by par
    breaks = list(a=seq(9,11,by=0.1),b=seq(19,21,by=0.2))
    ggplot(d, aes(x=x) ) +
      geom_histogram() + ### Here the ~breaks should be added
      facet_wrap(~ par, scales="free")
As pointed out by jucor, here are some more solutions. On special request, and to show why I am not a great ggplot fan, the lattice version:
    library(lattice)
    d = data.frame(x