Question
I have several CSV files with the same structure. I read them into R using fread() and then combine them into one larger dataset using rbindlist():
library(data.table)
files <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
x2csv <- rbindlist(lapply(files, fread, stringsAsFactors = FALSE), fill = TRUE)
The code works well. However, I would like to add a column of numbers indicating which CSV file each observation came from. For example, the output should be:
   V1        V2         V3 C1
1:  0 0.2859163 0.55848521  1
2:  1 1.1616298 0.87571349  1
3:  2 2.1122510 0.95062116  2
4:  3 2.6832013 0.57095035  2
5:  4 2.9117493 0.22854804  2
6:  5 2.9886040 0.07685464  3
where C1 is the new index column, indicating that the first and second observations come from files[1] (the first .csv file), the third to fifth observations come from files[2] (the second .csv file), and so on.
Answer 1:
This is an enhanced version of Nicolás' answer which adds the file names instead of numbers:
x2csv <- rbindlist(lapply(files, fread), idcol = "origin")
x2csv[, origin := factor(origin, labels = basename(files))]
- fread() uses stringsAsFactors = FALSE by default, so we can save some keystrokes.
- Also, fill = TRUE is only required if we want to read files with differing structure, e.g., differing position, name, or number of columns.
- The id column can be named (the default is .id) and is populated with the sequence number of the list element.
- Then, this number is converted into a factor whose levels are labeled with the file names. A file name might be easier to remember than a mere number. basename() strips the path off the file name.
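The steps above can be tried end to end with a small self-contained sketch. The temporary directory, the file names (a.csv, b.csv, c.csv) and the column values are made up for illustration and are not part of the original question.

```r
library(data.table)

# Hypothetical demo data: three small CSV files in a temporary directory
tmp <- file.path(tempdir(), "idcol_demo")
dir.create(tmp, showWarnings = FALSE)
fwrite(data.table(V1 = 0:1, V2 = c(0.29, 1.16)),       file.path(tmp, "a.csv"))
fwrite(data.table(V1 = 2:4, V2 = c(2.11, 2.68, 2.91)), file.path(tmp, "b.csv"))
fwrite(data.table(V1 = 5L,  V2 = 2.99),                file.path(tmp, "c.csv"))

# full.names = TRUE keeps the directory, so basename() has a path to strip
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# idcol = "origin" adds an integer column holding the list position (1, 2, 3)
x2csv <- rbindlist(lapply(files, fread), idcol = "origin")

# Relabel the sequence numbers with the file names
x2csv[, origin := factor(origin, labels = basename(files))]

print(x2csv)
```

One caveat with this approach: factor() requires as many labels as there are distinct values in origin, so a file that contributes no rows would make the relabeling step fail.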
Answer 2:
You are only missing the idcol argument from rbindlist(). Run:
x2csv <- rbindlist(lapply(files, fread, stringsAsFactors = FALSE), fill = TRUE, idcol = TRUE )
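As a rough illustration of what idcol = TRUE produces, the sketch below uses small in-memory tables as stand-ins for the result of lapply(files, fread). The table contents and the names a.csv, b.csv and c.csv are assumptions made for the example. The second half shows a variant: if the list passed to rbindlist() is named, the id column is filled with those names instead of sequence numbers.

```r
library(data.table)

# Hypothetical stand-ins for the tables returned by lapply(files, fread)
tables <- list(
  data.table(V1 = 0:1),
  data.table(V1 = 2:4),
  data.table(V1 = 5L)
)

# idcol = TRUE adds a first column named ".id" with the list position
res <- rbindlist(tables, fill = TRUE, idcol = TRUE)
print(res)

# Variant: name the list elements, and the id column carries the names
names(tables) <- c("a.csv", "b.csv", "c.csv")
res_named <- rbindlist(tables, fill = TRUE, idcol = "origin")
print(res_named)
```

With real files, the names could be set with something like setNames(lapply(files, fread), basename(files)) before calling rbindlist().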
Source: https://stackoverflow.com/questions/49100250/add-file-name-to-appended-dataset-for-each-file-in-r