I have a csv file where some of the numerical values are expressed as strings with commas as thousand separators, e.g. "1,513" instead of 1513.
"Preprocess" in R:
lines <- "www, rrr, 1,234, ttt \n rrr,zzz, 1,234,567,987, rrr"
You can use readLines on a textConnection. Then remove only the commas that are between digits:
gsub("([0-9]+)\\,([0-9])", "\\1\\2", lines)
## [1] "www, rrr, 1234, ttt \n rrr,zzz, 1234567987, rrr"
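For completeness, here is a minimal sketch of the whole round trip, assuming the data has no header row and that no two purely numeric fields sit next to each other without a space (the regex would merge "12,34" into "1234"). The names raw, cleaned and df are only illustrative:

```r
# Sample input held in a string; a real file could be read with readLines() directly.
raw <- "www, rrr, 1,234, ttt\nrrr,zzz, 1,234,567,987, rrr"

# Read the text line by line through a textConnection.
con <- textConnection(raw)
lines <- readLines(con)
close(con)

# Drop only those commas that have a digit on both sides.
cleaned <- gsub("([0-9]+),([0-9])", "\\1\\2", lines)

# Parse the cleaned lines; strip.white trims the spaces after the separators.
df <- read.csv(text = cleaned, header = FALSE, strip.white = TRUE)
str(df)
```

The third column should now come back as a numeric column rather than as character strings.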
It's also useful to know, though not directly relevant to this question, that commas as decimal separators can be handled by read.csv2 (automagically) or by read.table (by setting the 'dec' parameter).
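As a quick illustration of the decimal-comma case, assuming semicolon-separated input (the sample data and the names eu and alt are made up for this sketch):

```r
# Semicolon-separated values with a comma as the decimal mark.
txt <- "x;y\n1,5;2,25\n3,0;4,75"

# read.csv2 defaults to sep = ";" and dec = ",".
eu <- read.csv2(text = txt)

# The same result with read.table and explicit settings.
alt <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")

str(eu)
```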
Edit: Later I discovered how to use colClasses by designing a new class. See:
How to load df with 1000 separator in R as numeric class?
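A rough sketch of that colClasses approach, assuming the numbers are quoted in the file so the embedded commas are not treated as field separators. The class name num.with.commas and the sample data are only illustrative:

```r
library(methods)

# Declare a placeholder class and tell R how to coerce character data to it.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))

# Quoted fields keep their commas until the coercion strips them.
txt <- 'name,value\nfoo,"1,513"\nbar,"2,000,000"'
df <- read.csv(text = txt, colClasses = c("character", "num.with.commas"))
str(df)
```

This keeps the conversion inside read.csv, so no separate preprocessing pass over the raw lines is needed.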