Question
I asked this question a few days ago, but the discussion in the comments diverged enough that I think the points raised there deserve a new question. I apologize for the length of this question. I hope that it isn't terribly unclear.
I'm creating a chart with ggplot2 in an R Sweave file to track benchmarks for software. I'm looking specifically at run times for the benchmark and want to plot either the percent deviation or the deviation in minutes from the earliest test run that we have data for. I say earliest rather than first because earlier versions of the software crashed frequently, which left specific test runs without data for a particular file.
This is the R code I'm currently trying to build the chart with:
library(ggplot2)
library(grid)       # unit(), used for plot.margin below
library(RODBC)      # sqlQuery()
library(data.table) # package I'm using to create the benchmark data
dbhandle <- SQLConn_remote(DBName = "DATABASE", ServerName = "SERVER")
Testdf<-sqlQuery(dbhandle, 'select * from TABLENAME order by FileName, Number, Category', stringsAsFactors = FALSE)
versions<-unique(Testdf[order(Testdf$Number), ][,2])
#using data.table package
setDT(Testdf)
Testdf[, Benchmark := Value[which.min(Number)], by = "FileName"]
Testdf$Version<-factor(Testdf$Version, levels = versions)
Testdf$Deviation<-Testdf$Value- Testdf$Benchmark
Testdf$DeviationP<-(Testdf$Value- Testdf$Benchmark)/Testdf$Benchmark
g <- ggplot(subset(Testdf, Category == 'Time' & !is.na(Value) & Deviation < .5),
            aes(color = Value, x = Version, y = Deviation, group = FileName)) +
  geom_line(size = .25) +
  geom_point(aes(shape = Build), size = 1.5) +
  scale_shape_manual(values = c(1, 15)) +
  stat_summary(fun.y = sum, geom = "line") +
  ylab("Run Time Deviation from Benchmark (min)") +
  scale_colour_gradient(name = 'Run Time (min)', low = 'blue', high = 'red') +
  theme(axis.text.x = element_text(size = 10, angle = 90, vjust = .5)) +
  theme(axis.title.y = element_text(vjust = 1)) +
  theme(axis.title.x = element_text(vjust = -0.1)) +
  theme(plot.margin = unit(c(0, 0, 0, 0), "mm"))
g
(If you'd like to recreate this, see the example data frame at the bottom)
This is an example of what the SQL Table looks like:

And this is what the chart looks like when generated using the actual SQL data:

The main problem is that every single line should begin at zero. A line shouldn't necessarily begin at the first x-axis tick, because, as I said above, errors sometimes cause the program to crash during a run, leaving no data. So the benchmark should be calculated from the earliest available run, that is, whichever Number is the minimum for that particular FileName. Each line should represent the Time Category for a specific FileName, and the chart is Value vs Version.
The headache for me is why this isn't working. I need the benchmark to be chosen by the minimum Number entry for a specific FileName and chart its Time.
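One possible cause, shown here as a minimal base R sketch rather than a confirmed diagnosis: which.min() returns the index of the first minimum, so when several Category rows share the same Number, the row picked depends on row order. The toy data frame below is hypothetical and assumes the rows are sorted by FileName, Number, Category, as in the SQL query above.

```r
# Hypothetical miniature of the table: one file, two runs, three categories,
# sorted by FileName, Number, Category (the ORDER BY used in the query).
toy <- data.frame(
  FileName = "File1",
  Number   = c(1, 1, 1, 2, 2, 2),
  Category = c("Final", "Size", "Time", "Final", "Size", "Time"),
  Value    = c(789, 456, 123, 147, 258, 369),
  stringsAsFactors = FALSE
)

# which.min() returns the index of the FIRST minimum, so ties on Number
# are resolved by row order, not by Category.
picked <- toy[which.min(toy$Number), ]
print(picked)  # the "Final" row, not the "Time" row

# Restricting to the Time rows first selects the intended value.
time_rows <- toy[toy$Category == "Time", ]
wanted <- time_rows$Value[which.min(time_rows$Number)]
print(wanted)
```

If this is what happens with the real table, the Benchmark column would hold a non-Time value, and the Time lines would not start at zero.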
EDIT: I realized that using data.table is going to cause problems further down in the code so doing this without that package would be preferable.
2nd EDIT:
Here's what the benchmark data needs to be (this is where I'm stuck). I need to select the minimum Number entry for each unique FileName. Once that's done, I need a new column on Testdf called Benchmark that holds the Value of the Time Category for each unique FileName at the minimum Number.
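The requirement above could be sketched in base R, without data.table, roughly as follows. This is only an illustration under stated assumptions: the small Testdf here is a made-up stand-in (not the real table), a crashed run is assumed to appear as an NA Value, and the helper names time_rows and earliest are hypothetical.

```r
# Made-up stand-in for Testdf; File2 has no Time value for run 1 (a crash).
Testdf <- data.frame(
  FileName = rep(c("File1", "File2"), each = 6),
  Number   = rep(rep(c(1, 2), each = 3), times = 2),
  Category = rep(c("Time", "Size", "Final"), times = 4),
  Value    = c(123, 456, 789, 369, 258, 147,
               NA, 645, 978, 753, 498, 951),
  stringsAsFactors = FALSE
)

# Keep only the Time rows that actually have a value.
time_rows <- Testdf[Testdf$Category == "Time" & !is.na(Testdf$Value), ]

# Sort by run number, then keep the first row per file:
# that is the earliest run with data for that file.
time_rows <- time_rows[order(time_rows$FileName, time_rows$Number), ]
earliest  <- time_rows[!duplicated(time_rows$FileName), c("FileName", "Value")]
names(earliest)[names(earliest) == "Value"] <- "Benchmark"

# Attach the per-file benchmark to every row of that file.
Testdf <- merge(Testdf, earliest, by = "FileName", all.x = TRUE)

# Deviation in minutes and as a fraction of the benchmark.
Testdf$Deviation  <- Testdf$Value - Testdf$Benchmark
Testdf$DeviationP <- Testdf$Deviation / Testdf$Benchmark
```

With this stand-in data, File1's benchmark comes from run 1 and File2's from run 2, so each Time line would start at a deviation of zero at its earliest available run. Note that merge() may reorder the rows, so any sorting needed for plotting should be redone afterwards.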
Here's a quick example data frame I created for you to use in recreating the SQL table:
rw1 <- c("File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3", "File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3", "File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3")
rw2 <- c("0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03")
rw3 <- c("Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final")
rw4 <- c(123, 456, 789, 312, 645, 978, 741, 852, 963, 369, 258, 147, 753, 498, 951, 753, 915, 438, 978, 741, 852, 963, 369, 258, 147, 753, 498)
rw5 <- c("01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12")
rw6 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
rw7 <- c("Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Release", "Release", "Release", "Release", "Release", "Release", "Release", "Release", "Release")
rw8 <- c("None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "Cannot Connect to Database", "None", "None", "None", "None", "None", "None", "None", "None")
Testdf = data.frame(rw1, rw2, rw3, rw4, rw5, rw6, rw7, rw8)
colnames(Testdf) <- c("FileName", "Version", "Category", "Value", "Date", "Number", "Build", "Error")
Source: https://stackoverflow.com/questions/31321573/chart-for-benchmark-data-isnt-calculated-properly