问题
This is a "let's try another way" post that is related to this one:
Is it possible to define/modify a reading function that
can handle the fact that data stored in an
xlsx
file is row-based (i.e. each row represents a variable)and transforms it accordingly so it can be stored in a column-based
data.frame
(i.e. what used to be a row inxlsx
becomes a column)while capturing the underlying class/data type of the row-based variables?
Regarding csv
files I would probably start with turning to readLines
, but unfortunately xlsx
is still a black box to me.
Here's a little xlsx
file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx
回答1:
What about slightly modifying the read.xlsx
function from the xlsx
package:
library(xlsx)
read.transposed.xlsx <- function(file,sheetIndex) {
df <- read.xlsx(file, sheetIndex = sheetIndex , header = FALSE)
dft <- as.data.frame(t(df[-1]), stringsAsFactors = FALSE)
names(dft) <- df[,1]
dft <- as.data.frame(lapply(dft,type.convert))
return(dft)
}
# Let's test it
read.transposed.xlsx("row-and-column-based-data.xlsx", sheetIndex = 2)
# variable var_1 var_2 var_3
#1 2016-01-01 1 a TRUE
#2 2016-01-02 2 b FALSE
#3 2016-01-03 3 c TRUE
回答2:
> library(openxlsx)
> library(reshape)
> x=read.xlsx("row-and-column-based-data.xlsx",sheet = 2);
> x
variable 2016-01-01 2016-01-02 2016-01-03
1 var_1 1 2 3
2 var_2 a b c
3 var_3 TRUE FALSE TRUE
> y=t(x)
> colnames(y)=y[1,]
> y=y[2:nrow(y),]
> cc=data.frame(y, stringsAsFactors = F)
> cc
var_1 var_2 var_3
2016-01-01 1 a TRUE
2016-01-02 2 b FALSE
2016-01-03 3 c TRUE
> sapply(cc, class)
var_1 var_2 var_3
"character" "character" "character"
> write.csv(cc,"temp.csv")
> bb=read.csv("temp.csv") #infer magically types
> bb
X var_1 var_2 var_3
1 2016-01-01 1 a TRUE
2 2016-01-02 2 b FALSE
3 2016-01-03 3 c TRUE
> sapply(bb, class)
X var_1 var_2 var_3
"factor" "integer" "factor" "logical"
or use stringsAsFactors=F if you prefer character data type:
> bb=read.csv("temp.csv", stringsAsFactors = F) #infer magically types
> bb
X var_1 var_2 var_3
1 2016-01-01 1 a TRUE
2 2016-01-02 2 b FALSE
3 2016-01-03 3 c TRUE
> sapply(bb, class)
X var_1 var_2 var_3
"character" "integer" "character" "logical"
来源:https://stackoverflow.com/questions/35234038/reading-horizontal-row-based-data-from-xlsx-files-into-r-data-frames