How to create a histogram in R with CSV time data?

Submitted by 筅森魡賤 on 2019-12-19 04:05:37

Question


I have a CSV log covering 24 hours that looks like this:

svr01,07:17:14,'u1@user.de','8.3.1.35'
svr03,07:17:21,'u2@sr.de','82.15.1.35'
svr02,07:17:30,'u3@fr.de','2.15.1.35'
svr04,07:17:40,'u2@for.de','2.1.1.35'

I read the data with tbl <- read.csv("logs.csv")
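The sample rows have no header line and wrap the user and IP fields in single quotes. With its defaults, `read.csv` treats the first line as column names and recognises only double quotes. A minimal sketch, assuming the file layout shown above; the column names `server`, `timestamp`, `user` and `ip` are hypothetical labels chosen for this example, not part of the original data:

```r
# Write the four sample rows to a temporary file so the example is self-contained
logfile <- tempfile(fileext = ".csv")
writeLines(c(
  "svr01,07:17:14,'u1@user.de','8.3.1.35'",
  "svr03,07:17:21,'u2@sr.de','82.15.1.35'",
  "svr02,07:17:30,'u3@fr.de','2.15.1.35'",
  "svr04,07:17:40,'u2@for.de','2.1.1.35'"
), logfile)

# header = FALSE: the first line is data, not column names
# quote = "'": strip the single quotes around the user and IP fields
# col.names: hypothetical labels for this example
tbl <- read.csv(logfile, header = FALSE, quote = "'",
                col.names = c("server", "timestamp", "user", "ip"),
                stringsAsFactors = FALSE)
str(tbl)
```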

How can I plot this data as a histogram showing the number of hits per hour? Ideally, I would get 4 bars per hour, one each for svr01, svr02, svr03 and svr04.

Thank you for helping me here!


Answer 1:


An example dataset:

dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
                 time = Sys.time() + sort(round(runif(1000, 1, 36000))))

The trick I use is to create a new variable which only specifies in which hour the hit was recorded:

dat$hr = strftime(dat$time, "%H")
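In the question, the time column holds plain `HH:MM:SS` strings rather than date-times. A sketch for that case, assuming zero-padded 24-hour times as in the sample rows: parse with `strptime()` and keep only the hour. The example times are made up.

```r
# Times as they appear in the question's CSV (character, HH:MM:SS)
times <- c("07:17:14", "07:17:21", "13:05:00", "23:59:59")

# strptime() fills in today's date; only the hour component matters here
parsed <- strptime(times, format = "%H:%M:%S")
hour <- as.integer(format(parsed, "%H"))
print(hour)

# Equivalent shortcut that relies on the strings being zero-padded
hour_short <- as.integer(substr(times, 1, 2))
```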

Now we can use some plyr magic:

library(plyr)
hits_hour = count(dat, vars = c("server","hr"))
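If you would rather not depend on plyr, base R's `table()` does the same counting. A sketch using the same random data as above; `hits_tab` is a name chosen here so it does not clash with `hits_hour`. Unlike plyr's `count()`, which returns only the server/hour combinations that actually occur, `table()` also keeps combinations with zero hits, and the count column is called `Freq` rather than `freq`:

```r
set.seed(1)  # reproducible toy data in the same shape as the answer's `dat`
dat <- data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
                  time = Sys.time() + sort(round(runif(1000, 1, 36000))))
dat$hr <- strftime(dat$time, "%H")

# Cross-tabulate hits by server and hour, then flatten to long format
hits_tab <- as.data.frame(table(server = dat$server, hr = dat$hr))
head(hits_tab)
```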

And create the plot:

library(ggplot2)
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat="identity", position = "dodge")

Which looks like: [plot image not preserved: dodged bar chart of hits per hour, one bar colour per server]

I don't really like this plot; I'd prefer:

ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)

Which looks like: [plot image not preserved: line chart of hits per hour, one panel per server]

Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.
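For comparison with the ggplot2 versions, here is a base-R sketch of the layout the question asks for: one group of bars per hour, one bar per server, drawn with `barplot(beside = TRUE)` on a server-by-hour count matrix. The toy data uses 4 servers to match the question and is random, not real log data:

```r
set.seed(42)
logs <- data.frame(server = paste("svr", round(runif(1000, 1, 4)), sep = ""),
                   time = Sys.time() + sort(round(runif(1000, 1, 36000))))
logs$hr <- strftime(logs$time, "%H")

# Rows = servers, columns = hours; each cell is a hit count
counts <- table(logs$server, logs$hr)

# beside = TRUE draws the servers side by side within each hour
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "Hour of day", ylab = "Hits")
```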




Answer 2:


I'm not sure I understood you correctly, so I'll split my answer into several parts. The first part shows how to convert your time column into a numeric hour vector you can use for plotting.

a) Converting your data into hours:

  # df is the data frame from read.csv(), with a timestamp column of "HH:MM:SS" strings
  df$timestamp <- as.POSIXct(strptime(df$timestamp, format="%H:%M:%S"))
  df$hours <- as.numeric(format(df$timestamp, format="%H"))
  hist(df$hours)
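With `hist()`'s default breaks, the bins need not line up with whole hours. A sketch that forces one bin per hour: `breaks = 0:24` with `right = FALSE` puts hour h into the interval [h, h+1). The hour values below are hypothetical:

```r
hours <- c(7, 7, 7, 8, 13, 13, 23)   # hypothetical extracted hours

# One bin per hour: [0,1), [1,2), ..., [23,24]
h <- hist(hours, breaks = 0:24, right = FALSE,
          main = "Hits per hour (all servers)", xlab = "Hour of day")
h$counts[8]   # bin [7, 8) holds the three hits at hour 7
```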

`hist(df$hours)` gives a histogram of hits across all servers. If you want separate histograms per server, here is one way, though there are many others:

b) Making a histogram with ggplot2

  # install.packages("ggplot2")
  library(ggplot2)
  ggplot(data=df) + geom_histogram(aes(x=hours), binwidth=1) + facet_wrap(~ server)
  # or colour the bars by server instead of faceting
  ggplot(data=df) + geom_histogram(aes(x=hours, fill=server), binwidth=1)

c) You could also use another package:

 library(plotrix)  # install.packages("plotrix") if needed
 l <- split(df$hours, f=df$server)  # one vector of hours per server
 multhist(l)
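`multhist()` takes a list of numeric vectors and draws their histograms side by side. A base-R sketch of the same idea, assuming integer hours from 0 to 23: count each server's hours with `tabulate()` and draw the resulting matrix with `barplot(beside = TRUE)`. The per-server vectors are hypothetical stand-ins for the output of `split()`:

```r
# Hypothetical per-server hour vectors, as split(df$hours, f = df$server) would return
l <- list(svr01 = c(7, 7, 8), svr02 = c(7, 9, 9, 9), svr03 = c(23))

# tabulate(h + 1, nbins = 24) counts hours 0..23 into slots 1..24
counts <- sapply(l, function(h) tabulate(h + 1, nbins = 24))
rownames(counts) <- 0:23

# t(): servers become rows, so each hour gets one bar per server
barplot(t(counts), beside = TRUE, legend.text = names(l),
        xlab = "Hour of day", ylab = "Hits")
```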

Example plots follow. The third makes comparison easier, but I think ggplot2 simply looks better.

EDIT

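Here is how these solutions look:

first solution (base hist() over all servers): [plot image not preserved]

second solution (ggplot2 histograms faceted by server): [plot image not preserved]

third solution (plotrix multhist()): [plot image not preserved]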



Source: https://stackoverflow.com/questions/8602472/how-to-create-histogram-in-r-with-csv-time-data
