fread from data.table package when column names include spaces and special characters?

别等时光非礼了梦想. Submitted on 2020-01-21 06:40:27

Question


I have a csv file where column names include spaces and special characters.

fread imports them with quotes. How can I change this behaviour? One reason I ask is that some of my column names start with a space and I don't know how to handle them.

Any pointers would be helpful.

Edit: An example.

> packageVersion("data.table")
[1] ‘1.8.8’

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE)

> head(p2p[,list(Principal remaining)])
Error: unexpected symbol in "head(p2p[,list(Principal remaining"

> head(p2p[,list("Principal remaining")])
                    V1
1: Principal remaining

> head(p2p[,list(c("Principal remaining"))])
                    V1
1: Principal remaining

What I expected, of course, is what a column name without spaces yields:

> head(p2p[,list(Principal)])
   Principal
1:      1000
2:      1000
3:      1000
4:      2000
5:      1000
6:      4130

Answer 1:


It should be rather difficult to get a leading space in a column name; that should not happen through "casual coding". On the other hand, I don't see much error checking in the fread code, so until this undesirable behavior is fixed (or the feature request is refused), you can do something like this:

setnames(DT, make.names(colnames(DT))) 
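To make the effect of that call concrete, here is a minimal sketch. The table DT and its column names are invented for illustration and are not taken from the question's file.

```r
library(data.table)

# Hypothetical table: one name contains a space, the other starts with one
DT <- data.table(`Principal remaining` = c(1000, 2000),
                 ` rate` = c(0.1, 0.2))

# setnames() renames by reference, so the result is not assigned back
setnames(DT, make.names(colnames(DT)))

# The names are now syntactically valid and usable unquoted in j
print(names(DT))
print(DT[, list(Principal.remaining)])
```

make.names replaces each invalid character with a dot, which is why the space in the first name turns into "Principal.remaining".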

If on the other hand you are bothered by the fact that colnames(DT) will display the column names with quotes then just "get over it." That's how the interactive console will display any character value.
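If you would rather keep the original names, R accepts non-syntactic names wrapped in backticks, so renaming is optional. A minimal sketch, assuming a small invented table in place of the question's p2p.csv:

```r
library(data.table)

# Hypothetical stand-in for the imported file
p2p <- data.table(`Principal remaining` = c(900, 1800),
                  Principal = c(1000, 2000))

# Backticks make the name a symbol, so it works inside list() in j
res <- p2p[, list(`Principal remaining`)]
print(res)

# Selecting by character name also works and needs no backticks
vec <- p2p[["Principal remaining"]]
print(vec)
```

The double-quoted form in the question returned the literal string because "Principal remaining" inside list() is a character constant, not a column reference.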

If a column header looks like " ttt" in the original file, it is going to have leading spaces when imported, and you need to process the names with colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm)) or one of the several trim functions in various packages (such as 'gdata').
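The trimming step can be tried on a plain character vector first. This is only a sketch with invented names. For a data.table, the trimmed vector could be passed to setnames instead of being assigned through colnames<-.

```r
# Hypothetical header names, two of them with leading whitespace
nms <- c(" ttt", "Principal remaining", "  rate")

# Strip leading whitespace only; interior spaces are left untouched
trimmed <- sub("^\\s+", "", nms)
print(trimmed)
```

Unlike make.names, this keeps the space inside "Principal remaining", so that column would still need backticks when referenced.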




Answer 2:


A slightly modified version of BondedDust's answer, because setnames changes the names by reference and is not used with the <- sign:

setnames(DT, make.names(colnames(DT)))



Answer 3:


You can pass the argument check.names = TRUE to data.table's fread function:

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors = FALSE, check.names = TRUE)

It uses the make.names function behind the scenes. The help text for the argument reads:

default is FALSE. If TRUE then the names of the variables in the data.table 
are checked to ensure that they are syntactically valid variable names. If 
necessary they are adjusted (by make.names) so that they are, and also to 
ensure that there are no duplicates.
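A small sketch of the difference, using inline text instead of a file. The CSV content is invented, and the example assumes a data.table version recent enough to provide both the text and check.names arguments of fread.

```r
library(data.table)

# Hypothetical inline CSV with a space in the first header
csv <- "Principal remaining,Principal\n1000,1000\n2000,2000\n"

raw   <- fread(text = csv)                      # names kept as written
fixed <- fread(text = csv, check.names = TRUE)  # names passed through make.names

print(names(raw))
print(names(fixed))
```

With check.names = TRUE the first column can be referenced as Principal.remaining without backticks.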


Source: https://stackoverflow.com/questions/16966957/fread-from-data-table-package-when-column-names-include-spaces-and-special-chara
