Question
I'm trying to take a list, serialize each item, and write it into a CSV file with a key, creating a text file of key/value pairs. Ultimately this is going to run through Hadoop streaming, so before you ask: I think it really does need to be in a text file (but I'm open to other ideas). This all seemed pretty straightforward at first, but I still can't get serialization to work the way I want it to.
If I do this:
> rawToChar(serialize("blah", NULL, ascii=T))
[1] "A\n2\n133888\n131840\n16\n1\n9\n4\nblah\n"
Then I have those pesky \n characters, which screw up my CSV parsing later. I could go in and replace each \n with some other string, which I'm not opposed to doing, but it seems a little messy.
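If you do want to keep the plain-text ASCII serialization, the "replace the \n" idea can be made reversible. This is a minimal sketch, assuming (based on how R's ASCII format appears to escape string contents) that whitespace inside strings is written as escape codes such as \040, so every literal space in the flattened text must have come from a separator newline. toOneLine() and fromOneLine() are hypothetical names, not existing functions.

```r
# Hypothetical helpers: flatten R's ASCII serialization onto one line by
# turning its newline separators into spaces, and undo that when reading.
# Assumption: ascii = TRUE escapes whitespace inside strings (a space becomes
# "\040", a newline becomes "\n" as two characters), so the only literal
# whitespace in the output is the separator newlines.
toOneLine <- function(obj) {
  txt <- rawToChar(serialize(obj, NULL, ascii = TRUE))
  gsub("\n", " ", txt, fixed = TRUE)
}

fromOneLine <- function(line) {
  # Restore the separator newlines, then deserialize the bytes
  txt <- gsub(" ", "\n", line, fixed = TRUE)
  unserialize(charToRaw(txt))
}

obj <- list(a = "blah", b = "has spaces, commas\nand a newline", c = 1:3)
flat <- toOneLine(obj)
restored <- fromOneLine(flat)
```

String contents such as commas are not escaped by the ASCII format, so when reading the CSV back you would need to take everything after the first comma as the value rather than splitting on every comma.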
The other option that came to mind is omitting the rawToChar() call and pumping the raw ASCII bytes straight into a text file:
> serialize("blah", NULL, ascii=T)
[1] 41 0a 32 0a 31 33 33 38 38 38 0a 31 33 31 38 34 30 0a 31 36 0a 31 0a 39 0a
[26] 34 0a 62 6c 61 68 0a
Well, if I just dump that to a text file, I'll get a \n after each element of the vector. So I tried a little paste/collapse:
> ser <- serialize("blah", NULL, ascii=T)
> ser2 <- paste(ser, collapse="")
> ser2
[1] "410a320a3133333838380a3133313834300a31360a310a390a340a626c61680a"
Now that's a value I can write to a CSV text file! Only... how do I turn it back into raw later? Let's just take the first hex element, 41. I can't even figure out how to create a raw vector and put the hex value 41 into one of its elements. When I try to put a raw hex value into a raw vector, I end up with something like this:
> r <- raw(1)
> r[1] <- 41
Error in r[1] <- 41 :
incompatible types (from double to raw) in subassignment type fix
> r[1] <- as.raw(41)
> r[1]
[1] 29
Crap! 29!=41 (except for really large values of 29 and really small values of 41, of course)
Any ideas on how to crack this nut?
Answer 1:
The package caTools has a Base64 encoder/decoder that you can use:
> library(caTools)
> s<-base64encode(serialize("blah",NULL))
> s
[1] "WAoAAAACAAIKAQACAwAAAAAQAAAAAQAAAAkAAAAEYmxhaA=="
> unserialize(base64decode(s,"raw"))
[1] "blah"
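If you would rather avoid the caTools dependency, Base64 is simple enough to sketch in base R: every 3 bytes become a 24-bit number that is cut into four 6-bit indices into a 64-character alphabet, with "=" padding the final group. rawToBase64() and base64ToRaw() below are hypothetical helpers, not caTools functions, and they assume well-formed, padded input without line breaks.

```r
# Hypothetical base-R Base64 helpers (standard alphabet, "=" padding).
b64chars <- c(LETTERS, letters, 0:9, "+", "/")  # 64 symbols, index 0..63

rawToBase64 <- function(x) {
  bytes <- as.integer(x)
  pad <- (3 - length(bytes) %% 3) %% 3          # zero bytes needed to fill the last group
  bytes <- c(bytes, rep(0L, pad))
  m <- matrix(bytes, nrow = 3)                  # one column per 3-byte group
  n <- m[1, ] * 65536L + m[2, ] * 256L + m[3, ] # 24-bit value per group
  idx <- rbind(n %/% 262144L, (n %/% 4096L) %% 64L, (n %/% 64L) %% 64L, n %% 64L)
  out <- b64chars[as.vector(idx) + 1L]
  if (pad > 0) out[(length(out) - pad + 1):length(out)] <- "="
  paste(out, collapse = "")
}

base64ToRaw <- function(s) {
  chars <- strsplit(s, "")[[1]]
  pad <- sum(chars == "=")
  chars[chars == "="] <- "A"                    # padding decodes as zero bits, trimmed below
  vals <- match(chars, b64chars) - 1L
  m <- matrix(vals, nrow = 4)                   # one column per 4-character group
  n <- m[1, ] * 262144L + m[2, ] * 4096L + m[3, ] * 64L + m[4, ]
  bytes <- as.vector(rbind(n %/% 65536L, (n %/% 256L) %% 256L, n %% 256L))
  if (pad > 0) bytes <- bytes[seq_len(length(bytes) - pad)]
  as.raw(bytes)
}

ser <- serialize("blah", NULL)
s <- rawToBase64(ser)
back <- unserialize(base64ToRaw(s))
```

The exact encoded string will differ from the one shown above, because the serialization header records the R version that wrote it.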
Answer 2:
Thanks to jmoy for his great answer. I used his recommendation and it works great. For future hitchhikers who end up here, I'm leaving my functions for turning a list into a serialized CSV text file and then turning it back into a list. I'm marking this post as community wiki; feel free to edit it if there is a cleaner way of doing any of this:
listToCsv <- function(inList, outFileName){
  require(caTools)
  if (!is.list(inList))
    stop("listToCsv: The input list fails the is.list() check.")
  # Create (or truncate) the output file
  cat("", file=outFileName, append=FALSE)
  i <- 1
  for (item in inList) {
    # One line per item: "<index>,<Base64 of the serialized item>"
    myLine <- paste(i, ",", base64encode(serialize(item, NULL, ascii=TRUE)), "\n", sep="")
    cat(myLine, file=outFileName, append=TRUE)
    i <- i+1
  }
}
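csvToList <- function(inFileName){
  require(caTools)
  linesIn <- readLines(inFileName, n=-1)
  # Pre-allocate a list; starting from NULL would collapse length-1 values
  # such as "blah" into an atomic vector instead of a list
  outList <- vector("list", length(linesIn))
  i <- 1
  for (line in linesIn){
    # Base64 never contains a comma, so the value is the second field
    encoded <- strsplit(line, split=",")[[1]][[2]]
    # Assigning with [i] <- list(...) also keeps items that unserialize to NULL
    outList[i] <- list(unserialize(base64decode(encoded, "raw")))
    i <- i+1
  }
  return(outList)
}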
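For comparison, here is a sketch of the same index,value layout using only base R. It swaps Base64 for the hex encoding from the question (roughly twice the size of the serialized bytes, versus about 4/3 for Base64) and writes all lines in one writeLines() call. listToHexCsv() and hexCsvToList() are hypothetical names.

```r
# Hypothetical base-R variant: "index,hex-of-serialized-item" per line.
listToHexCsv <- function(inList, outFileName) {
  stopifnot(is.list(inList))
  # paste() on a raw vector yields two-digit hex codes such as "58", "0a"
  hex <- vapply(inList, function(item) paste(serialize(item, NULL), collapse = ""),
                character(1))
  writeLines(paste(seq_along(inList), hex, sep = ","), outFileName)
}

hexCsvToList <- function(inFileName) {
  linesIn <- readLines(inFileName)
  # Everything after the first comma is the hex payload (hex has no commas)
  hex <- sub("^[^,]*,", "", linesIn)
  lapply(hex, function(h) {
    starts <- seq(1, nchar(h), by = 2)
    unserialize(as.raw(strtoi(substring(h, starts, starts + 1), base = 16L)))
  })
}

tmp <- tempfile(fileext = ".csv")
original <- list("blah", 1:5, list(x = TRUE, y = "a,b\nc"))
listToHexCsv(original, tmp)
restored <- hexCsvToList(tmp)
```

Hadoop streaming, by default, treats everything up to the first tab on a line as the key, so sep = "\t" in place of "," may fit that use better.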
Answer 3:
Maybe you wanted as.raw(65) instead, as 65 (in decimal) is 41 (in hex):
> as.hexmode(65)
[1] "41"
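The same idea answers the original question of turning the pasted hex string back into raw: read each two-character chunk as base 16 rather than base 10. hexToRaw() is a hypothetical helper; R also accepts hex literals directly, so as.raw(0x41) is the byte 41.

```r
# Hypothetical helper: turn a string of two-digit hex codes, such as the
# output of paste(ser, collapse = ""), back into a raw vector.
hexToRaw <- function(hex) {
  # Split the string into consecutive two-character chunks ("41", "0a", ...)
  starts <- seq(1, nchar(hex), by = 2)
  pairs <- substring(hex, starts, starts + 1)
  # strtoi(..., base = 16L) reads each chunk as hexadecimal, so "41" -> 65
  as.raw(strtoi(pairs, base = 16L))
}

# A single byte: the hex literal 0x41 is the same number as decimal 65
r <- raw(1)
r[1] <- as.raw(0x41)

ser <- serialize("blah", NULL, ascii = TRUE)
ser2 <- paste(ser, collapse = "")
back <- hexToRaw(ser2)
```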
As for the encoding, can you work with binary data within Hadoop streaming?
Source: https://stackoverflow.com/questions/3114043/r-creating-a-csv-out-of-serialized-objects