Question
I have a large vector of strings of the form:
Input = c("1,223", "12,232", "23,0")
etc. That's to say, decimals separated by commas, instead of periods. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.
My first instinct would be to go to strsplit, but it seems to me that this will likely be very slow. Does anyone have any idea of a faster option?
There's an existing question that suggests read.csv2, but the strings in question are not directly read in that way.
Answer 1:
as.numeric(sub(",", ".", Input, fixed = TRUE))
should work.
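A minimal sketch of that one-liner applied to the vector from the question. The variable name result is only for illustration.

```r
# Example input from the question: commas used as decimal marks.
Input <- c("1,223", "12,232", "23,0")

# fixed = TRUE treats "," as a literal string rather than a regular expression.
# sub() replaces only the first comma in each element, which is enough here
# because each string contains exactly one.
result <- as.numeric(sub(",", ".", Input, fixed = TRUE))
print(result)
```

Because the pattern is a literal string, no regular expression matching is involved, which is probably part of why this approach is fast on long vectors.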
Answer 2:
scan(text=Input, dec=",")
## [1] 1.223 12.232 23.000
But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hangs. 1e4 is fine, though. @adibender's solution is much faster. In the 1e4 case the difference is large:
Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
Answer 3:
Also, if you are reading in the raw data, read.table and all the associated functions have a dec argument, e.g.:
read.table("file.txt", dec=",")
When all else fails, gsub and sub are your friends.
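A small self-contained sketch of the dec argument, using the text argument of read.table in place of a file. The sample data, the column names x and y, and the ";" field separator are assumptions made for illustration; a field separator other than the comma is needed so that it does not clash with the decimal mark.

```r
# Hypothetical two-column data with ";" between fields and "," as decimal mark.
raw <- "x;y\n1,223;12,232\n23,0;4,5"

# text = lets read.table parse a string instead of a file on disk.
df <- read.table(text = raw, header = TRUE, sep = ";", dec = ",")

# Both columns should come back as numeric rather than character.
str(df)
```

read.csv2 uses sep = ";" and dec = "," as its defaults, so it is a shortcut for the same settings.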
Answer 4:
Building on @adibender's solution:
input = '23,67'
as.numeric(gsub(
  # ONLY for strings of the form digits, comma, digits
  "^([0-9]+),([0-9]+)$",
  # Substitute by the first part, dot, second part
  "\\1.\\2",
  input
))
I guess that is a safer match...
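To see what "safer" means in practice, here is a sketch that runs the same anchored pattern over a mixed vector. The helper name to_number and the sample values are hypothetical. Strings that do not match the pattern are left untouched by the substitution, so as.numeric turns them into NA unless they already parse as numbers.

```r
# Hypothetical helper wrapping the anchored pattern from the answer above.
to_number <- function(x) {
  # Only strings of the form digits, comma, digits are rewritten.
  converted <- sub("^([0-9]+),([0-9]+)$", "\\1.\\2", x)
  # Anything that still is not a valid number becomes NA; the coercion
  # warning is suppressed because NA is the intended signal here.
  suppressWarnings(as.numeric(converted))
}

mixed <- c("23,67", "1,223,765", "abc", "42")
res <- to_number(mixed)
print(res)
```

Note that the pattern as written does not allow a leading minus sign, so a value such as "-1,5" would also come out as NA; the pattern would need extending if negative numbers are expected.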
Answer 5:
The readr package has a function to parse numbers from strings. You can set many options via the locale argument.
For comma as decimal separator you can write:
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
Answer 6:
As stated by , it's way easier to do this while importing a file.
The recently released readr package has a very useful feature, locale, well explained here, that allows the user to import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.
Answer 7:
The answer by adibender does not work when there are multiple commas.
In that case the suggestion from use554546 and answer from Deena can be used.
Input = c("1,223,765", "122,325,000", "23,054")
as.numeric(gsub(",", "", Input))
Output:
[1]   1223765 122325000     23054
The function gsub replaces all occurrences. The function sub replaces only the first.
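The difference between sub and gsub can be sketched on the same input. The variable names are only for illustration. Note that this answer reads the commas as grouping separators and removes them, rather than converting them to decimal points.

```r
# Input from the answer above: several commas per string.
Input <- c("1,223,765", "122,325,000", "23,054")

# sub() removes only the first comma in each element.
first_only <- sub(",", "", Input, fixed = TRUE)

# gsub() removes every comma in each element.
all_commas <- gsub(",", "", Input, fixed = TRUE)

print(first_only)
print(all_commas)

# Only the gsub() result parses cleanly for every element.
numbers <- as.numeric(all_commas)
print(numbers)
```

With sub, any element that had more than one comma still contains a comma afterwards, so as.numeric would return NA for it.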
Source: https://stackoverflow.com/questions/15236440/as-numeric-with-comma-decimal-separators