Question
I am trying to plot the number of COVID-19 cases in Italy over time. I came across this repository on GitHub and tried to subset the data for Italy as follows:
require(RCurl)
require(foreign)
x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
corona = read.csv(text = x, sep =",",header = T)
str(corona)
# Keep the row for Italy and only the date columns (column 5 onwards)
Italy <- corona[corona$Country.Region=='Italy',][1,5:ncol(corona)]
head(Italy)[,45:52]
which outputs:
> head(Italy)[,45:52]
   X3.6.20 X3.7.20 X3.8.20 X3.9.20 X3.10.20 X3.11.20 X3.12.20
17    4636    5883    7375    9172    10149    12462    12462
   X3.13.20
17    17660
Searching for how to convert this to a time series with xts led me to several posts about converting a data frame in which every day is a row of a Date variable. In this data frame, however, each date appears to be its own variable (column).
I don't necessarily need the data formatted as a time series, but I would like to plot the number of cases over time.
Here is a way to bypass the time series conversion:
require(RCurl)
require(foreign)
x = getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
corona = read.csv(text = x, sep =",",header = T)
str(corona)
# Keep the row for Italy and only the date columns (column 5 onwards)
Italy <- corona[corona$Country.Region=='Italy',][1,5:ncol(corona)]
# Turn the one-row data frame into a one-column numeric matrix
Italy <- as.matrix(sapply(Italy, as.numeric))
plot(Italy[,1],type='l',xlab='', ylab='', col='red', lwd=3,
main="Italy Cov-19 cum cases")
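The plot above puts the observation index on the x-axis rather than the dates. As a minimal base R sketch of how the dates encoded in the column names could be recovered, the snippet below builds a small stand-in data frame by hand. The names `italy_toy`, `dates`, `cases` and `italy_long` are hypothetical; the values are the eight counts shown in the output above, and the `X` prefix is what `read.csv` prepends to column names that start with a digit.

```r
# Toy one-row data frame mimicking the wide layout (hypothetical object;
# values copied from the output shown above for 3/6/20 to 3/13/20).
italy_toy <- data.frame(X3.6.20 = 4636, X3.7.20 = 5883, X3.8.20 = 7375,
                        X3.9.20 = 9172, X3.10.20 = 10149, X3.11.20 = 12462,
                        X3.12.20 = 12462, X3.13.20 = 17660)

# Strip the leading "X", then parse month.day.year into Date objects.
dates <- as.Date(sub("^X", "", names(italy_toy)), format = "%m.%d.%y")

# Flatten the single row into a plain numeric vector.
cases <- unlist(italy_toy, use.names = FALSE)

# One row per day: this long layout is what most plotting tools expect.
italy_long <- data.frame(date = dates, cases = cases)

# Base R line plot with real dates on the x-axis.
plot(italy_long$date, italy_long$cases, type = "l",
     xlab = "Date", ylab = "Cumulative confirmed cases")
```

The same two steps (strip the prefix, parse with an explicit format string) should carry over to the full `Italy` data frame, as long as it is still a data frame with the original column names.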
Answer 1:
We can convert the data to xts and plot it:
library(xts)
plot(xts(unlist(Italy), order.by = as.Date(sub("X", "", names(Italy)),
"%m.%d.%y")), main = "xts plot")
Some values are 0, so we convert those to NA first, because log2 of 0 is -Inf:
library(dplyr)
plot(xts(log(na_if(unlist(Italy), 0), 2), order.by = as.Date(sub("X", "", names(Italy)),
"%m.%d.%y")), main = 'xts log2 plot')
Answer 2:
Here is a solution with tidyverse.
First, I use read_csv to read the CSV file directly (the message it prints tells you the classes of the columns, which you can copy into the command, as all data classes were guessed correctly):
library(tidyverse)
data <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
The dates are stored as column names. I use pivot_longer to transform the data into a long format. Once the dates are in the new column date, we can use lubridate::mdy (mdy = month/day/year) to convert them into a proper date format:
data_long <- data %>%
pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
names_to = "date",
values_to = "cases") %>%
mutate(date = lubridate::mdy(date))
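To make the reshaping step concrete, here is a small sketch on a hand-built tibble rather than the downloaded file. The object names `wide_toy` and `long_toy` are hypothetical, the two case counts are taken from the output shown in the question, and the `Lat`/`Long` values are placeholders chosen for illustration only.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-day slice in the same wide layout as the CSV.
# Case counts come from the question's output; Lat/Long are placeholders.
wide_toy <- tibble(
  `Province/State` = NA_character_,
  `Country/Region` = "Italy",
  Lat = 43, Long = 12,
  `3/12/20` = 12462,
  `3/13/20` = 17660
)

# Every column except the four identifier columns becomes a (date, cases) pair.
long_toy <- wide_toy %>%
  pivot_longer(cols = -c(`Province/State`, `Country/Region`, Lat, Long),
               names_to = "date",
               values_to = "cases") %>%
  mutate(date = lubridate::mdy(date))  # "3/12/20" becomes the Date 2020-03-12

print(long_toy)
```

Each date column of the wide table turns into one row, so this two-column slice yields a tibble with two rows and the columns `Province/State`, `Country/Region`, `Lat`, `Long`, `date` and `cases`.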
Now we can subset the data for Italy and plot:
data_long_ital <- data_long %>%
filter(`Country/Region` == "Italy")
ggplot(data_long_ital, aes(x = date, y = cases, group = `Country/Region`))+
geom_line() +
scale_x_date(date_breaks = "1 weeks")

Source: https://stackoverflow.com/questions/60685525/how-to-convert-to-a-time-series-and-plot-a-dataframe-with-each-day-as-a-variable