How to add an index by data set when using rbindlist?


I have several CSV files with the same structure. I read them into R using fread() and then combine them into one larger data set using rbindlist(). How can I add an index column that shows which file each row came from?


2 answers

南旧

2020-11-29 12:40
You are only missing the idcol argument from rbindlist(). Run:
```
x2csv <- rbindlist(lapply(files, fread, stringsAsFactors = FALSE), fill = TRUE, idcol = TRUE)
```
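To see what the id column looks like, here is a minimal sketch. The two sample files, their names, and their columns are made up for illustration and are written to a temporary directory; only fread(), fwrite(), and rbindlist() come from data.table.

```
library(data.table)

# Hypothetical sample data: two small CSV files in a temporary directory
tmp <- tempfile("rbind_demo_")
dir.create(tmp)
fwrite(data.table(a = 1:2, b = c("x", "y")), file.path(tmp, "first.csv"))
fwrite(data.table(a = 3:5, b = c("p", "q", "r")), file.path(tmp, "second.csv"))

# list.files() returns the paths in alphabetical order
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# idcol = TRUE adds a column named ".id" that holds the position of the
# list element (here: the file) each row came from
combined <- rbindlist(lapply(files, fread), idcol = TRUE)
print(combined)
```

The .id column is placed first, so rows from first.csv carry the value 1 and rows from second.csv carry the value 2.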
甜味超标

2020-11-29 12:44
This is an enhanced version of Nicolás' answer which adds the file names instead of numbers:
```
x2csv <- rbindlist(lapply(files, fread), idcol = "origin")
x2csv[, origin := factor(origin, labels = basename(files))]
```
- fread() uses stringsAsFactors = FALSE by default, so we can save some keystrokes.
- fill = TRUE is only required if the files differ in structure, e.g., in the position, names, or number of columns.
- The id column can be given a name (with idcol = TRUE it is called .id) and is populated with the sequence number of the list element.
- This number is then converted into a factor whose levels are labeled with the file names. A file name is easier to recognize than a bare number. basename() strips the path from the file name.
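A possible variation, sketched under the assumption that a character column is acceptable instead of a factor: rbindlist() is documented to fill the id column from the list names when the input list is named, so naming the list elements puts the file names in directly. The sample files jan.csv and feb.csv and the variable names tables and by_name are made up for illustration.

```
library(data.table)

# Hypothetical sample data: two small CSV files in a temporary directory
tmp <- tempfile("rbind_named_")
dir.create(tmp)
fwrite(data.table(a = 1:2), file.path(tmp, "jan.csv"))
fwrite(data.table(a = 3:4), file.path(tmp, "feb.csv"))

# list.files() sorts alphabetically, so feb.csv comes before jan.csv
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Name each list element after its file; rbindlist() then uses the names
# (not the positions) to populate the id column
tables <- lapply(files, fread)
names(tables) <- basename(files)
by_name <- rbindlist(tables, idcol = "origin")
print(by_name)
```

This avoids the separate factor() step, and the labels stay attached to the right rows even if one of the files happens to contain no rows.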