I'm trying to create a plot similar to the ones here:
Basically I want a histogram, where each bin shows how long was spent in that range of cadence (e.g 1 hour in 0-20rpm, 3 hours in 21-40rpm, etc)
library("rjson") # 3rd party library, so: install.packages("rjson")
# Load data from Strava API.
# Ride used for example is http://app.strava.com/rides/13542320
url <- "http://app.strava.com/api/v1/streams/13542320?streams[]=cadence,time"
d <- fromJSON(paste(readLines(url)))
Each value in d$cadence (rpm) is paired with the same index in d$time (the number of seconds from the start).
The values are not necessarily uniform (as can be seen if you compare plot(x=d$time, y=d$cadence, type='l') with plot(d$cadence, type='l') )
If I do the simplest possible thing:
hist(d$cadence)
..this produces something very close, but the Y value is "frequency" instead of time, and ignores the time between each data point (so the 0rpm segment in particular will be underrepresented)
You need to create a new column to account for the time between samples.
I prefer data.frames to lists for this kind of thing, so:
d <- as.data.frame(fromJSON(paste(readLines(url))))
d$sample.time <- 0
d$sample.time[2:nrow(d)] <- d$time[2:nrow(d)]-d$time[1:(nrow(d)-1)]
now that you've got your sample times, you can simply "repeat" the cadence measures for anything with a sample time more than 1, and plot a histogram of that
hist(rep(x=d$cadence, times=d$sample.time),
main="Histogram of Cadence", xlab="Cadence (RPM)",
ylab="Time (presumably seconds)")
There's bound to be a more elegant solution that wouldn't fall apart for non-integer sample times, but this works with your sample data.
EDIT: re: the more elegant, generalized solution, you can deal with non-integer sample times with something like new.d <- aggregate(sample.time~cadence, data=d, FUN=sum), but then the problem becomes plotting a histogram for something that looks like a frequency table, but with non-integer frequencies. After some poking around, I'm coming to the conclusion you'd have to roll-your-own histogram for this case by further aggregating the data into bins, and then displaying them with a barchart.
来源:https://stackoverflow.com/questions/11529146/r-histogram-showing-time-spent-in-each-bin