Chart for Benchmark data isn't calculated properly

Posted by 有些话、适合烂在心里 on 2020-01-17 05:59:09

Question


I asked this question a few days ago, but the discussion in the comments diverged enough that I think the points raised there deserve a new question. I apologize for the length of this question. I hope that it isn't terribly unclear.

I'm creating a chart with ggplot2 in an R Sweave file to track benchmarks for software. I'm looking specifically at run times for the benchmark, and I want to plot either the % deviation or the deviation in minutes from the earliest test run that we have data for. I say earliest rather than first because earlier versions of the software crashed frequently, so some test runs have no data for a particular file.

This is the R code I'm currently trying to build the chart with:

library(ggplot2)
library(data.table) # package I'm using to create the benchmark data
library(RODBC)      # assumption: sqlQuery() below is RODBC's; SQLConn_remote() is a local helper not shown here

dbhandle <- SQLConn_remote(DBName = "DATABASE", ServerName = "SERVER")
Testdf <- sqlQuery(dbhandle, 'select * from TABLENAME order by FileName, Number, Category', stringsAsFactors = FALSE)

versions<-unique(Testdf[order(Testdf$Number), ][,2])

#using data.table package
setDT(Testdf)
Testdf[, Benchmark := Value[which.min(Number)], by = "FileName"]

Testdf$Version<-factor(Testdf$Version, levels = versions)
Testdf$Deviation<-Testdf$Value- Testdf$Benchmark
Testdf$DeviationP<-(Testdf$Value- Testdf$Benchmark)/Testdf$Benchmark

g<-ggplot(subset(Testdf, Category == 'Time' & !is.na(Value) & Deviation <.5) , aes(color = Value, x = Version, y = Deviation, group = FileName)) + 
  geom_line(size=.25) + geom_point(aes(shape = Build), size = 1.5) +
  scale_shape_manual(values=c(1,15)) + stat_summary(fun.y=sum, geom="line") + 
  ylab("Run Time Deviation from Benchmark (min)") +  
  scale_colour_gradient(name = 'Run Time (min)',low = 'blue', high = 'red') + 
  theme(axis.text.x = element_text(size = 10, angle = 90, vjust = .5)) + theme(axis.title.y = element_text(vjust = 1)) + 
  theme(axis.title.x = element_text(vjust = -0.1)) + theme(plot.margin=unit(c(0,0,0,0),"mm"))
g

(If you'd like to recreate this, see the example data frame at the bottom)

This is an example of what the SQL Table looks like:

And this is what the chart looks like when generated using the actual SQL data:

The main problem is that every single line should begin at zero. It shouldn't necessarily begin at the first x-axis tick, because, as I said above, errors sometimes cause the program to crash during a run, leaving no data. So the benchmark should be calculated from the earliest available run, that is, whatever the minimum Number is for that particular FileName. Each line should represent the Time Category for a specific FileName, and the chart is Value vs Version.

What I can't work out is why this isn't working. I need the benchmark to be chosen by the minimum Number entry for a specific FileName, and then to chart its Time.
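One possible explanation, offered as a sketch rather than a confirmed diagnosis: the query orders rows by FileName, Number, Category, and which.min() returns the first row that holds the minimum Number. Grouping by FileName alone can therefore pick up the Final or Size value instead of the Time value. The toy data frame and the names toy, picked and wanted below are made up for illustration, and base R's split() and sapply() stand in for the data.table grouping.

```r
# Toy rows in the order the SQL query returns them: FileName, Number, Category.
# All values are made up for illustration.
toy <- data.frame(
  FileName = rep(c("File1", "File2"), each = 6),
  Number   = rep(rep(c(1, 2), each = 3), times = 2),
  Category = rep(c("Final", "Size", "Time"), times = 4),
  Value    = c(789, 456, 123, 147, 258, 369,
               978, 645, 312, 951, 498, 753),
  stringsAsFactors = FALSE
)

# Same logic as Value[which.min(Number)] grouped by FileName:
# which.min() returns the FIRST row holding the minimum Number,
# whatever its Category is.
picked <- sapply(split(toy, toy$FileName),
                 function(d) d$Value[which.min(d$Number)])

# What was intended: the Time value at the earliest Number.
time_rows <- toy[toy$Category == "Time", ]
wanted <- sapply(split(time_rows, time_rows$FileName),
                 function(d) d$Value[which.min(d$Number)])

print(picked)  # the Final values at Number 1
print(wanted)  # the Time values at Number 1
```

If the real data behaves the same way, restricting the rows to Category == 'Time' (and to non-missing Value) before taking the minimum would be the first thing to try.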

EDIT: I realized that using data.table is going to cause problems further down in the code so doing this without that package would be preferable.

2nd EDIT:

Here's what the benchmark data needs to be (this is the part where I'm stuck writing the code). I need to select the minimum Number entry for each unique FileName. Once that's done, I need a new column on Testdf called Benchmark that holds the Value of the Time Category for each unique FileName at the minimum Number.
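A minimal base-R sketch of this requirement, avoiding data.table. It assumes the column names from the question (FileName, Number, Category, Value); the runs data frame, its values, and the helper names time_ok and earliest are hypothetical. File2 deliberately has no Time value for its first run, to mimic a crashed run.

```r
# Hypothetical mini data set: File2 has no usable Time value for run 1 (crash).
runs <- data.frame(
  FileName = rep(c("File1", "File2"), each = 3),
  Number   = rep(1:3, times = 2),
  Category = "Time",
  Value    = c(120, 150, 110, NA, 300, 330),
  stringsAsFactors = FALSE
)

# Keep only Time rows that actually have a value.
time_ok <- runs[runs$Category == "Time" & !is.na(runs$Value), ]

# Sort by Number and keep the first row per FileName:
# that row is the earliest available run for the file.
time_ok  <- time_ok[order(time_ok$FileName, time_ok$Number), ]
earliest <- time_ok[!duplicated(time_ok$FileName), c("FileName", "Value")]
names(earliest)[2] <- "Benchmark"

# Attach the benchmark to every row of the same file, then compute deviations.
runs <- merge(runs, earliest, by = "FileName", all.x = TRUE)
runs <- runs[order(runs$FileName, runs$Number), ]
runs$Deviation  <- runs$Value - runs$Benchmark
runs$DeviationP <- (runs$Value - runs$Benchmark) / runs$Benchmark

print(runs)
```

With this approach each file's line starts at zero at its own earliest run with data, which for File2 is run 2 rather than run 1. Because the benchmark is joined on FileName only, rows of other categories would also receive the Time benchmark, matching the description above.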


Here's a quick example data frame I created for you to use in recreating the SQL table:

rw1 <- c("File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3", "File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3", "File1", "File1", "File1", "File2", "File2", "File2", "File3", "File3", "File3")
rw2 <- c("0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.01", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.02", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03", "0.03")
rw3 <- c("Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final", "Time", "Size", "Final")
rw4 <- c(123, 456, 789, 312, 645, 978, 741, 852, 963, 369, 258, 147, 753, 498, 951, 753, 915, 438, 978, 741, 852, 963, 369, 258, 147, 753, 498)
rw5 <- c("01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12", "01/01/12")
rw6 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3)
rw7 <- c("Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Iteration", "Release", "Release", "Release", "Release", "Release", "Release", "Release", "Release", "Release")
rw8 <- c("None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "None", "Cannot Connect to Database", "None", "None", "None", "None", "None", "None", "None", "None")


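# stringsAsFactors = FALSE mirrors the sqlQuery() call used for the real data
Testdf <- data.frame(rw1, rw2, rw3, rw4, rw5, rw6, rw7, rw8, stringsAsFactors = FALSE)
colnames(Testdf) <- c("FileName", "Version", "Category", "Value", "Date", "Number", "Build", "Error")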

Source: https://stackoverflow.com/questions/31321573/chart-for-benchmark-data-isnt-calculated-properly
