How to convert to a time series and plot a dataframe with each day as a variable or column?

Submitted by 一曲冷凌霜 on 2020-03-24 00:17:12

Question


I am trying to plot the number of COVID-19 cases in Italy over time. I came across this repository on GitHub and tried to subset the data for Italy as follows:

require(RCurl)
require(foreign)
x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
corona = read.csv(text = x, sep =",",header = T)
str(corona)
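# Keep the first Italy row and only the date columns (column 5 onward).
# Note the row index goes before the comma; corona[, condition] would index columns and fail.
Italy <- corona[corona$Country.Region=='Italy',][1,5:ncol(corona)]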
head(Italy)[,45:52]

which outputs:

> head(Italy)[,45:52]
   X3.6.20 X3.7.20 X3.8.20 X3.9.20 X3.10.20 X3.11.20 X3.12.20
17    4636    5883    7375    9172    10149    12462    12462
   X3.13.20
17    17660

Looking for a way to convert this to a time series with xts led me to several posts on converting a data frame to a time series where every day is a row in a Date variable. In this data frame, however, each date is a separate variable (column).

I don't necessarily need this formatted as a time series, but I would like to plot the number of cases over time.


Here is a way to bypass the time series conversion:

require(RCurl)
require(foreign)
x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
corona = read.csv(text = x, sep =",",header = T)
str(corona)
# Keep the first Italy row and only the date columns (column 5 onward)
Italy <- corona[corona$Country.Region=='Italy',][1,5:ncol(corona)]
# Flatten the one-row data frame into a one-column numeric matrix
Italy <- as.matrix(sapply(Italy, as.numeric))
plot(Italy[,1],type='l',xlab='', ylab='', col='red', lwd=3,
     main="Italy Cov-19 cum cases")

Answer 1:


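We can convert to xts and plot. Here Italy is the one-row data frame from the question, before the as.matrix conversion, so that names(Italy) still holds the date labels: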

library(xts)
plot(xts(unlist(Italy), order.by = as.Date(sub("X", "", names(Italy)),
        "%m.%d.%y")), main = "xts plot")


Some values are 0, so we convert those to NA first; otherwise the log2 conversion would produce -Inf values:

library(dplyr)
plot(xts(log(na_if(unlist(Italy), 0), 2), order.by = as.Date(sub("X", "", names(Italy)),
     "%m.%d.%y")), main = 'xts log2 plot')




Answer 2:


Here is a solution with tidyverse.

First, I use read_csv to read in the CSV file directly (the message it prints lists the column classes, which you can copy into the command, as all data classes were guessed correctly):

library(tidyverse)

data <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

The dates are stored as column names. I use pivot_longer to transform the data into a long format. Once the dates are in the new column date, we can use lubridate::mdy (mdy = month/day/year) to convert them into a proper date format:

data_long <- data %>% 
  pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date",
               values_to = "cases") %>% 
  mutate(date = lubridate::mdy(date))
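
To see the reshaping step without downloading anything, here is a small sketch on a toy table. It assumes the tidyr, dplyr and lubridate packages are installed; the three counts are those printed in the question, while the coordinate values and the object names wide and long are placeholders.

```r
library(tidyr)
library(dplyr)

# Toy wide table with one column per date, mimicking the layout of the CSV
wide <- tibble::tibble(
  `Province/State` = NA_character_,
  `Country/Region` = "Italy",
  Lat = 43, Long = 12,            # placeholder coordinates
  `3/6/20` = 4636, `3/7/20` = 5883, `3/8/20` = 7375
)

# Every column except the four identifier columns becomes a (date, cases) pair
long <- wide %>%
  pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date",
               values_to = "cases") %>%
  mutate(date = lubridate::mdy(date))  # "3/6/20" becomes the Date 2020-03-06
```

The result has three rows, one per day, with date stored as a Date rather than as text.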

Now we can subset the data for Italy and plot:

data_long_ital <- data_long %>% 
  filter(`Country/Region` == "Italy")

ggplot(data_long_ital, aes(x = date, y = cases, group = `Country/Region`))+
  geom_line() +
  scale_x_date(date_breaks = "1 weeks")



Source: https://stackoverflow.com/questions/60685525/how-to-convert-to-a-time-series-and-plot-a-dataframe-with-each-day-as-a-variable
