Plotting too many points?

Question

How does R (base, lattice, or whatever) create a graph from a 100,000-element vector (or from a function that outputs that many values)? Does it plot some points and reject others? Does it plot them all on top of each other? How can I change this behaviour?

How could I create a graph where, for every interval, I see the max and min values, as in trading "bar" charts? (Or any other idea for visualizing that much information without having to calculate the intervals, mins, and maxes myself beforehand, and without using financial packages.)

How could I create a large, horizontally scrollable plot?

For example, I want to plot the first 100,000 iterations of

zz <- (zz^2+1) %% nn

starting at zz = 1, with nn = 10^7 + 1. The x axis would be just the iteration number.

To summarize: I want to plot the output of a function that is sometimes smooth but sometimes very spiky, over a very large interval. Those spikes are very important.

regards

Answer 1:

You mention that you sometimes have spikes which are very important.

See below how I plot ping results, where the vast majority of the data is in the millisecond range, but the spikes are important to me as well:

Basically, I hexbin all data points with a response time of up to 500 ms and plot individual points for all longer response times. A response time of 5 s is additionally marked as a timeout:

library (ggplot2)  # stat_binhex additionally needs the hexbin package installed

# Hexbin the bulk of the data (<= 0.5 s); draw slower responses as individual points
ggplot (df, aes (x = date, y = t5)) +
        stat_binhex (data = df [df$t5 <= 0.5,], bins = nrow (df) / 250) +
        geom_point (data = df [df$t5 > 0.5,], aes (col = type), shape = 3) +
        ylim (c (0, 5)) +
        scale_fill_gradient (low = "#AAAAFF", high = "#000080") +
        scale_colour_manual ("response type",
                             values = c (normal = "black", timeout = "red")) +
        ylab ("t / s")

I think I already posted this as a solution to a similar question, but I couldn't find it.
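
The plotting code above relies on a data frame df with columns date, t5 and type, none of which is shown in the answer. Below is a minimal sketch of simulated data with that shape, for trying the plot out. The column meanings (t5 as response time in seconds), the distributions, the start date, and the rule t5 >= 5 for labelling timeouts are all assumptions, not the author's actual data.

```r
set.seed(42)
n <- 10000

# One simulated ping per minute; the start date is arbitrary
df <- data.frame(
  date = as.POSIXct("2020-01-01", tz = "UTC") + seq_len(n) * 60,
  t5   = pmin(rexp(n, rate = 20), 5)   # mostly tens of milliseconds, capped at 5 s
)

# Inject a few slow responses, and turn some of them into 5 s timeouts
spikes <- sample(n, 50)
df$t5[spikes] <- runif(50, min = 0.5, max = 5)
df$t5[sample(spikes, 5)] <- 5

# Assumed labelling rule: a response that hits the 5 s cap counts as a timeout
df$type <- factor(ifelse(df$t5 >= 5, "timeout", "normal"),
                  levels = c("normal", "timeout"))
```

With df built this way, the levels of type match the names used in scale_colour_manual, so the timeouts come out red.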

Answer 2:

If R can produce the plot, it will simply draw every point, even if they land on top of each other. In general, plotting such a large number of points is neither useful nor necessary. Some strategies for dealing with this are:

Subsample, say, 2% of the data and plot it. Repeat this several times to see whether the picture changes.
Don't plot the raw data; aggregate first, for example by calculating a temporal mean or by binning the data.
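
Both strategies can be sketched in base R for the sequence from the question. The bin width of 500 iterations and the random seed are arbitrary assumptions, and taking the per-bin minimum and maximum (rather than a mean) is a choice made here so that spikes survive the aggregation, roughly like the "bar" charts the question asks about.

```r
# Sequence from the question: zz <- (zz^2 + 1) %% nn, starting at zz = 1
n_iter <- 100000
nn <- 10^7 + 1
zz <- numeric(n_iter)
z <- 1
for (i in seq_len(n_iter)) {
  z <- (z^2 + 1) %% nn
  zz[i] <- z            # zz[i] holds the value after i iterations
}

# Strategy 1: plot a random 2% subsample (rerun with other seeds to compare)
set.seed(1)
idx <- sort(sample(n_iter, size = 0.02 * n_iter))
plot(idx, zz[idx], pch = ".", xlab = "iteration", ylab = "zz")

# Strategy 2: aggregate into bins and draw one min-to-max bar per bin
bin_width <- 500                               # assumed bin size
bin <- (seq_len(n_iter) - 1) %/% bin_width     # bin ids 0, 1, 2, ...
lo <- tapply(zz, bin, min)
hi <- tapply(zz, bin, max)
mid <- (as.numeric(names(lo)) + 0.5) * bin_width   # x position of each bar

plot(range(mid), range(lo, hi), type = "n", xlab = "iteration", ylab = "zz")
segments(mid, lo, mid, hi)
```

All values stay below 2^53 here, so the arithmetic on doubles is exact.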

Answer 3:

R will plot all the points, and the result might look cluttered.

This is a new package, but check out Hadley's bigvis package.

Answer 4:

curve might be a nice way to go here:

f <- function(x){(x^2+1)%%(1+1e7)}
curve(f, from=1, to=1e5)
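
One caveat: curve() evaluates f on a grid of n equally spaced x values (101 by default), so the call above draws the one-step map x -> (x^2+1) %% nn at a handful of points, not the iterated sequence from the question. The sketch below raises n and, separately, plots the iterates themselves. The helper iterate_map is a hypothetical name introduced here, and n = 2001 is an arbitrary choice.

```r
nn <- 1e7 + 1
f <- function(x) (x^2 + 1) %% nn

# curve() samples f at n equally spaced x values (n = 101 unless overridden)
# and invisibly returns the evaluated points as list(x, y)
pts <- curve(f, from = 1, to = 1e5, n = 2001)

# Hypothetical helper: apply f repeatedly, recording each iterate
iterate_map <- function(f, start, n_iter) {
  out <- numeric(n_iter)
  z <- start
  for (i in seq_len(n_iter)) {
    z <- f(z)
    out[i] <- z
  }
  out
}

# The x axis is just the iteration number, as asked in the question
zz <- iterate_map(f, start = 1, n_iter = 1e5)
plot(zz, pch = ".", xlab = "iteration", ylab = "zz")
```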

Source: https://stackoverflow.com/questions/15901834/plotting-too-many-points

Tags

plot

large-data