How to automatically plot many CSV files with the same number of rows and columns?

坚强是说给别人听的谎言 submitted on 2019-12-11 12:53:46

Question


I have many (more than 100) CSV files with the same table structure: in every file the headers are in row 4, there are 6 columns, and the data run from row 5 to row 400001.

I need to plot these data as a scatter plot in which the x-axis shows the first column (40001 time units) and the other columns are the y values for different variables. Preferably I would be able to format the plot (colors, ranges, titles, legends, ...), read in the CSV files automatically, and export PNG, PDF, or anything else that might be useful. I have both Excel and R, but I don't know how to do this plotting efficiently. Naming is also important: the plots should have the names of their CSV files.

Any ideas on how I can do this with less effort?

Thanks


Answer 1:


Your question is a bit light on specific detail, so I'm going to make some assumptions to get started on a kind of skeleton of an answer.

Let's make some fake CSV files to use as example data.

Set working directory to folder containing data...

setwd("C:/my-csv-files")

Make 100 data frames of six col by 500 rows (to keep things quick)...

df <- lapply(1:100, function(i) data.frame(cbind(1:500, matrix(sample(1000, 2500, replace = TRUE), 500, 5))))

Make 100 csv files from these data frames in the working directory...

invisible(lapply(seq_along(df), function(i) write.csv(df[[i]], file = paste("df", i, "csv", sep = "."))))

Now we can reproduce your problem and quickly read many CSV files into R like so...

# create a list of all CSV files in all the folders 
files <- (dir("C:/my-csv-files", recursive=TRUE, full.names=TRUE, pattern="\\.(csv|CSV)$"))
# read in the CSV files and add the filename of each file as a column to
# each dataset so we can trace back dodgy data 
# so, create a function to read the CSV and get filenames
read.tables <- function(file.names, ...) {
  require(plyr)
  ldply(file.names, function(fn) data.frame(Filename=fn, read.csv(fn, ...)),.progress = 'text')
}
# execute function to read in data from each CSV, including file names of file that data comes from
mydata <- read.tables(files, stringsAsFactors = FALSE)
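
The fake files above have their header in row 1, whereas the question describes real files with headers in row 4. A minimal sketch of how that might be handled, assuming the first three rows are preamble that can be discarded (the file contents and the names `example_file` and `one_file` are made up for illustration):

```r
# Hypothetical example file: three preamble lines, header on row 4, data from row 5.
example_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Instrument: demo",
  "Run: 1",
  "Notes: none",
  "time,a,b",
  "1,10,20",
  "2,11,21"
), example_file)

# skip = 3 discards the first three lines, so the fourth line is read as the header.
one_file <- read.csv(example_file, skip = 3, stringsAsFactors = FALSE)
print(names(one_file))
print(nrow(one_file))
```

Because read.tables passes `...` on to read.csv, the same argument should work there too, for example read.tables(files, skip = 3, stringsAsFactors = FALSE).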

Now plot the data. I'm assuming you want a single plot of all the data in the CSV files...

Melt into a long format for plotting. Here X1 is your time variable and X2 to X6 are the other variables in your CSV files.

require(reshape2)
dat <- melt(mydata, id.vars = c("X1"), measure.vars = c("X2", "X3", "X4", "X5", "X6"))
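
To make the effect of melting easier to see, here is a small sketch on a made-up data frame (`wide` and `long` are illustrative names, not part of the data above). Each wide row is split into one row per measured variable:

```r
library(reshape2)

# Tiny made-up data frame in the same wide layout: one time column, two value columns.
wide <- data.frame(X1 = 1:3, X2 = c(10, 20, 30), X3 = c(5, 6, 7))

# After melting, each row holds one (time, variable, value) triple.
long <- melt(wide, id.vars = "X1", measure.vars = c("X2", "X3"))
print(long)
```

The result has the columns X1, variable and value, which is the shape ggplot2 needs to colour points by variable.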

And here's a single scatter plot of your time variable by the other variables (colour-coded). It's just not clear from your question exactly what you want to plot, so do ask another question with more details.

require(ggplot2)
ggplot(dat, aes(X1, value)) +
  geom_point(aes(colour = factor(variable)))
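
The question also asks about formatting (colours, ranges, titles, legends). A hedged sketch of the usual ggplot2 layers for that, using made-up data in place of `dat`; the colour names, axis limits and labels are placeholders to adapt:

```r
library(ggplot2)

# Made-up long-format data standing in for the melted `dat`.
demo <- data.frame(
  X1 = rep(1:10, times = 2),
  variable = rep(c("X2", "X3"), each = 10),
  value = c(1:10, (1:10)^2)
)

p <- ggplot(demo, aes(X1, value, colour = variable)) +
  geom_point(size = 1) +
  # one named colour per variable, plus a legend title
  scale_colour_manual(values = c(X2 = "steelblue", X3 = "firebrick"), name = "Series") +
  # zoom the axes without dropping any points
  coord_cartesian(xlim = c(0, 10), ylim = c(0, 100)) +
  labs(title = "Example title", x = "Time", y = "Value") +
  theme_bw() +
  theme(legend.position = "bottom")
print(p)
```

coord_cartesian is used here rather than xlim/ylim because it only zooms the view, whereas setting scale limits removes points outside the range.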

Now save it as a PDF or PNG (by default ggsave saves the last plot displayed); see ?ggsave for the numerous options here...

ggsave(filename = "myplot.pdf")
ggsave(filename = "myplot.png")
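
If you would rather not rely on the last displayed plot, ggsave also accepts the plot object and the output size explicitly. A small sketch, assuming a throwaway plot `p` and a temporary output path (both invented for this example):

```r
library(ggplot2)

p <- ggplot(data.frame(x = 1:10, y = (1:10)^2), aes(x, y)) + geom_point()

# Write to a temporary folder; replace with your own path.
out_png <- file.path(tempdir(), "myplot.png")
ggsave(filename = out_png, plot = p, width = 8, height = 5, units = "in", dpi = 150)
print(file.exists(out_png))
```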

Find the location of those files

getwd()

To make one plot per CSV file, here's one method:

listcsvs <- lapply(files,function(i) read.csv(i,  stringsAsFactors = FALSE))
names(listcsvs) <- files
require(reshape2)
require(ggplot2)
for (i in seq_along(files)) {
  tmp <- melt(listcsvs[[i]], id.vars = "X1", measure.vars = c("X2", "X3", "X4", "X5", "X6"))
  print(ggplot(tmp,aes(X1, value)) + 
          geom_point(aes(colour = factor(variable))) +
          ggtitle(names(listcsvs[i]))
        )
}

If you are using RStudio you can scroll through the plots and Export the ones you want to save them as a PDF or PNG.
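
Since the question asks for output files named after their CSV files, the loop could also save each plot directly instead of exporting by hand. This is only a sketch under assumptions: the folders `in_dir` and `out_dir` and the two fake files are invented here, and the real files may also need skip = 3 in read.csv.

```r
library(reshape2)
library(ggplot2)

# Hypothetical input and output folders; two small fake CSVs stand in for the real files.
in_dir  <- file.path(tempdir(), "csv-in")
out_dir <- file.path(tempdir(), "plots-out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for (nm in c("run_a", "run_b")) {
  fake <- data.frame(X1 = 1:20, X2 = runif(20), X3 = runif(20))
  write.csv(fake, file.path(in_dir, paste0(nm, ".csv")), row.names = FALSE)
}

csv_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE, ignore.case = TRUE)

for (f in csv_files) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  long <- melt(d, id.vars = "X1")                  # all other columns become measured variables
  base <- tools::file_path_sans_ext(basename(f))   # "run_a.csv" -> "run_a"
  p <- ggplot(long, aes(X1, value, colour = variable)) +
    geom_point() +
    ggtitle(base)
  ggsave(filename = file.path(out_dir, paste0(base, ".png")), plot = p,
         width = 8, height = 5)
}

print(list.files(out_dir))
```

Changing the ".png" extension to ".pdf" should be enough to get PDF output, because ggsave picks the device from the file extension.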

So that's covered the main parts of your question:

  1. Read a large number of CSV files into R
  2. Plot data as one scatter plot displaying several variables against one variable
  3. Plot data as one scatter plot per CSV file
  4. Save the plots as a PDF or PNG file

And as a bonus you've got code for creating example data which you can use in your future questions. In general, the better the quality of your example data, the better quality answers you'll get (as Thomas suggests in his comment).



Source: https://stackoverflow.com/questions/19852774/how-to-automatically-plot-many-csv-files-with-the-same-number-of-rows-and-column
