Question
I have many (more than 100) CSV files with the same table structure: in every file the headers are in row 4, there are 6 columns, and the data run from row 5 to row 400001.
I need to plot these data as a scatter plot in which x is the first column (40001 time units) and the other columns are the Ys for different variables. Preferably I would be able to format the plot (colors, ranges, titles, legends, ...), read the CSV files in automatically, and export PNG, PDF, or anything else that might be useful. I have both Excel and R, but I don't know how to do this plotting efficiently. (Naming is also important: each plot should have the name of its CSV file.)
Any idea how I can do this with less effort?
Thanks
Answer 1:
Your question is a bit light on specific detail, so I'm going to make some assumptions to get started on a kind of skeleton of an answer.
Let's make some fake CSV files to use as example data
Set working directory to folder containing data...
setwd("C:/my-csv-files")
Make 100 data frames of six col by 500 rows (to keep things quick)...
df <- lapply(1:100, function(i) data.frame(cbind(1:500, matrix(sample(1000, 500 * 5, replace = TRUE), 500, 5))))
Make 100 csv files from these data frames in the working directory...
invisible(lapply(seq_along(df), function(i) write.csv(df[[i]], file = paste("df", i, "csv", sep = "."), row.names = FALSE)))
Now we can reproduce your problem and quickly read many CSV files into R like so...
# create a list of all CSV files in all the folders
files <- dir("C:/my-csv-files", recursive = TRUE, full.names = TRUE, pattern = "\\.(csv|CSV)$")
# read in the CSV files and add the filename of each file as a column to
# each dataset so we can trace back dodgy data
# so, create a function to read the CSV and get filenames
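read.tables <- function(file.names, ...) {
  require(plyr)
  # ldply() reads each file and row-binds the results into one data frame;
  # any extra arguments in ... are passed on to read.csv()
  ldply(file.names, function(fn) data.frame(Filename = fn, read.csv(fn, ...)), .progress = 'text')
}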
# execute the function to read in the data from each CSV, with the name of the source file in the Filename column
mydata <- read.tables(files, stringsAsFactors = FALSE)
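The question mentions that the headers sit in row 4 of each file. If the first three rows are some kind of preamble, then skip = 3 should make read.csv() treat row 4 as the header. Here is a minimal base-R sketch of that, assuming a made-up file layout (the three preamble lines and the column names below are invented for illustration, not taken from your files):

```r
# Hypothetical layout: three preamble lines, header in row 4, data from row 5.
# The preamble text, column names, and values are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "instrument: example",
  "run: 1",
  "units: arbitrary",
  "time,a,b,c,d,e",
  "1,10,20,30,40,50",
  "2,11,21,31,41,51"
), tmp)

# skip = 3 drops the first three lines, so row 4 is read as the header
one_file <- read.csv(tmp, skip = 3, stringsAsFactors = FALSE)
str(one_file)
```

Because read.tables() passes its extra arguments on to read.csv(), the same option could be supplied as read.tables(files, skip = 3, stringsAsFactors = FALSE).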
Now plot the data. You say you just want one plot of all the data in the CSV files...
Melt into a long format for plotting. Here X1 is your time variable and X2 to X6 are the other variables in your CSV files
require(reshape2)
dat <- melt(mydata, id.vars = c("X1"), measure.vars = c("X2", "X3", "X4", "X5", "X6"))
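To see what the melt step does, here is a small sketch on a tiny made-up table (the values are invented; only the X1, X2, X3 naming follows the example data):

```r
library(reshape2)

# Tiny made-up "wide" table: one time column and two measured variables
wide <- data.frame(X1 = 1:3, X2 = c(10, 20, 30), X3 = c(5, 6, 7))

# melt() stacks X2 and X3 into a single 'value' column and records
# which column each value came from in 'variable'
long <- melt(wide, id.vars = "X1", measure.vars = c("X2", "X3"))
long
```

The result has one row per (time, variable) pair, which is the shape ggplot2 wants when each variable gets its own colour.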
And here's a single scatter plot of your time variable by the other variables (colour-coded). It's just not clear from your question exactly what you want to plot, so do ask another question with more details.
require(ggplot2)
ggplot(dat, aes(X1, value)) +
  geom_point(aes(colour = factor(variable)))
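Since you also asked about formatting (colors, ranges, titles, legends), here is a rough sketch of where those settings go in ggplot2. The data, the colours, the axis limits, and the labels are all placeholders I made up, so swap in your own:

```r
library(ggplot2)

# Made-up long-format data standing in for the melted data frame
set.seed(1)
demo <- data.frame(
  X1 = rep(1:50, times = 2),
  variable = rep(c("X2", "X3"), each = 50),
  value = runif(100, 0, 1000)
)

p <- ggplot(demo, aes(X1, value, colour = variable)) +
  geom_point(size = 1) +
  # colours per variable, plus the legend title
  scale_colour_manual(values = c(X2 = "steelblue", X3 = "firebrick"),
                      name = "Variable") +
  # axis ranges (zooms the view without dropping any points)
  coord_cartesian(xlim = c(0, 50), ylim = c(0, 1000)) +
  # plot title and axis titles
  labs(title = "Example title", x = "Time", y = "Value") +
  theme_bw()
p
```

Keeping the plot in an object (p here) makes it easy to reuse the same formatting for every file.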
Now save it as a PDF or PNG; see ?ggsave for the numerous options here...
ggsave(filename = "myplot.pdf")
ggsave(filename = "myplot.png")
Find the location of those files
getwd()
To make one plot per CSV file, here's one method
listcsvs <- lapply(files, function(i) read.csv(i, stringsAsFactors = FALSE))
names(listcsvs) <- files
require(reshape2)
require(ggplot2)
for (i in seq_along(files)) {
  tmp <- melt(listcsvs[[i]], id.vars = "X1", measure.vars = c("X2", "X3", "X4", "X5", "X6"))
  # print() is needed to draw a ggplot from inside a for loop
  print(ggplot(tmp, aes(X1, value)) +
    geom_point(aes(colour = factor(variable))) +
    ggtitle(names(listcsvs)[i])
  )
}
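You mentioned that each plot should carry the name of its CSV file. One way to do that, sketched here with made-up files in a temporary folder (the folder name, the run1.csv style names, the column set, and the PNG size are all assumptions for the demo), is to build the output name from the CSV name and call ggsave() inside the loop:

```r
library(reshape2)
library(ggplot2)

# Made-up input: three small CSV files in a temporary folder
in_dir <- file.path(tempdir(), "csv-demo")
dir.create(in_dir, showWarnings = FALSE)
for (i in 1:3) {
  fake <- data.frame(X1 = 1:20, X2 = runif(20), X3 = runif(20))
  write.csv(fake, file.path(in_dir, paste0("run", i, ".csv")), row.names = FALSE)
}

csv_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

for (f in csv_files) {
  one <- read.csv(f, stringsAsFactors = FALSE)
  # with only id.vars given, all remaining columns become measure variables
  long <- melt(one, id.vars = "X1")
  # strip folder and extension: ".../run1.csv" -> "run1"
  stem <- tools::file_path_sans_ext(basename(f))
  p <- ggplot(long, aes(X1, value, colour = variable)) +
    geom_point() +
    ggtitle(stem)
  # write the PNG next to its CSV, e.g. run1.png
  ggsave(filename = file.path(in_dir, paste0(stem, ".png")), plot = p,
         width = 8, height = 5, dpi = 150)
}
```

Changing the extension in the file name to ".pdf" should give PDF output instead, since ggsave() picks the device from the extension.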
If you are using RStudio you can scroll through the plots and use Export to save the ones you want as a PDF or PNG.
So that's covered the main parts of your question:
- Read a large number of CSV files into R
- Plot the data as one scatter plot displaying several variables against one variable
- Plot the data as one scatter plot per CSV file
- Save the plots as a PDF or PNG file
And as a bonus you've got code for creating example data which you can use in your future questions. In general, the better the quality of your example data, the better quality answers you'll get (as Thomas suggests in his comment).
Source: https://stackoverflow.com/questions/19852774/how-to-automatically-plot-many-csv-files-with-the-same-number-of-rows-and-column