问题
I'm given some data in an .rData file. The format is an xts
object in character mode. (I realise this is an unusual format, but I have no control over it)
> head(trades)
SYMBOL EX PRICE SIZE COND BID BIDSIZ OFR
2012-05-04 09:30:00 "BAC" "T" "7.89" "38538" "F" "7.89" "523" "7.9"
2012-05-04 09:30:01 "BAC" "Z" "7.885" "288" "@" "7.88" "61033" "7.9"
2012-05-04 09:30:03 "BAC" "X" "7.89" "1000" "@" "7.88" "1974" "7.89"
2012-05-04 09:30:07 "BAC" "T" "7.89" "19052" "F" "7.88" "1058" "7.89"
2012-05-04 09:30:08 "BAC" "Y" "7.89" "85053" "F" "7.88" "108101" "7.9"
2012-05-04 09:30:09 "BAC" "D" "7.8901" "10219" "@" "7.89" "268" "7.9"
> mode(trades)
'character'
I would like to process this data by converting into a saner format, namely a tibble, so that I can store the columns as datetimes, doubles and integers.
I've managed to achieve this with the following code:
> trades_ = bind_cols(data_frame(DATE=index(trades)), as_data_frame(coredata(trades))) %>%
mutate_at(as.numeric, .cols=vars(PRICE, BID, OFR)) %>%
mutate_at(as.integer, .cols=vars(SIZE, BIDSIZ, OFRSIZ))
> head(trades_)
# A tibble: 6 × 10
DATE SYMBOL EX PRICE SIZE COND BID BIDSIZ OFR
<dttm> <chr> <chr> <dbl> <int> <chr> <dbl> <int> <dbl>
1 2012-05-04 09:30:00 BAC T 7.8900 38538 F 7.89 523 7.90
2 2012-05-04 09:30:01 BAC Z 7.8850 288 @ 7.88 61033 7.90
3 2012-05-04 09:30:03 BAC X 7.8900 1000 @ 7.88 1974 7.89
4 2012-05-04 09:30:07 BAC T 7.8900 19052 F 7.88 1058 7.89
5 2012-05-04 09:30:08 BAC Y 7.8900 85053 F 7.88 108101 7.90
6 2012-05-04 09:30:09 BAC D 7.8901 10219 @ 7.89 268 7.90
I'm wondering if there's already a built-in function for this. Something that looks at each column of the trades
matrix and figures out whether it's a column of integers, doubles, etc and converts it to the appropriate type.
This is the sort of thing that a csv parser would do.
回答1:
This is far from an authoritative answer, but I ended up doing this:
smarter_type_convert = function (vector) {
converted_vector = type.convert(vector)
if (is.numeric(converted_vector)) {
int_vector = as.integer(converted_vector)
if (isTRUE(all.equal(int_vector, converted_vector, check.attributes=FALSE))) {
int_vector
} else {
converted_vector
}
} else {
converted_vector
}
}
trades %>% coredata %>% as_data_frame %>% mutate_all(smarter_type_convert)
回答2:
You're right that a data frame is the right approach since you are working with multiple classes in the same frame. xts
doesn't allow multiple classes so coercion rules force you to work with characters instead of numbers.
Here's a solution so you don't have to specifically call out each column. I use the tidyquant
package which is made for working with quantitative data within the "tidyverse" (i.e. using "tidy" data frames). It also has some nice functions to convert to and from xts
, matrix
and other time series classes that contain row names.
First, I recreate the data.
> trades_xts
SYMBOL EX PRICE SIZE COND BID BIDSIZ OFR
2012-05-04 09:30:00 "BAC" "T" "7.8900" "38538" "F" "7.89" "523" "7.90"
2012-05-04 09:30:01 "BAC" "Z" "7.8850" "288" "@" "7.88" "61033" "7.90"
2012-05-04 09:30:03 "BAC" "X" "7.8900" "1000" "@" "7.88" "1974" "7.89"
2012-05-04 09:30:07 "BAC" "T" "7.8900" "19052" "F" "7.88" "1058" "7.89"
2012-05-04 09:30:08 "BAC" "Y" "7.8900" "85053" "F" "7.88" "108101" "7.90"
2012-05-04 09:30:09 "BAC" "D" "7.8901" "10219" "@" "7.89" "268" "7.90"
Next, I use tidy functions to clean the data. It's a little longer than your script, but you don't need to worry about which columns have which data type (with the exception of the xts index). Note that I'm using the tidyquant::as_tibble()
functions to convert the xts
row names to a column. I use mutate_each
to apply the type.convert
function to each column. Unfortunately base R loves the factor
class so I add an additional step to convert to character
. The last two steps just clean up the date-time column using dplyr::rename
and lubridate::as_datetime
, which tidyquant
loads for you.
> library(tidyquant)
> trades_xts %>%
as_tibble(preserve_row_names = TRUE) %>%
mutate_each(funs(type.convert)) %>%
mutate_if(is.factor, as.character) %>%
rename(DATE = row.names) %>%
mutate(DATE = as_datetime(DATE, tz = Sys.timezone()))
# A tibble: 6 × 9
DATE SYMBOL EX PRICE SIZE COND BID BIDSIZ OFR
<dttm> <chr> <chr> <dbl> <int> <chr> <dbl> <int> <dbl>
1 2012-05-04 09:30:00 BAC T 7.8900 38538 F 7.89 523 7.90
2 2012-05-04 09:30:01 BAC Z 7.8850 288 @ 7.88 61033 7.90
3 2012-05-04 09:30:03 BAC X 7.8900 1000 @ 7.88 1974 7.89
4 2012-05-04 09:30:07 BAC T 7.8900 19052 F 7.88 1058 7.89
5 2012-05-04 09:30:08 BAC Y 7.8900 85053 F 7.88 108101 7.90
6 2012-05-04 09:30:09 BAC D 7.8901 10219 @ 7.89 268 7.90
来源:https://stackoverflow.com/questions/42475874/how-can-i-convert-a-matrix-of-strings-into-a-tibble