I want to import the contents of a CSV file into R. The file contains multiple sections of data stacked vertically, separated by blank lines and rows of asterisks. For example:
    ********************************************************
    * SAMPLE DATA ******************************************
    ********************************************************
    Name, DOB, Sex
    Rod, 1/1/1970, M
    Jane, 5/7/1980, F
    Freddy, 9.12,1965, M

    *******************************************************
    * Income Data ****************************************
    *******************************************************
    Name, Income
    Rod, 10000
    Jane, 15000
    Freddy, 7500
I would like to import this into R as two separate data frames. Currently I'm manually cutting the CSV file up into smaller files, but I think I could do it using the skip and nrows arguments of read.csv, if I could work out where the section breaks are.
This gives me a logical TRUE for every blank line:

    ifelse(readLines("DATA.csv")=="",TRUE,FALSE)
I'm hoping someone has already solved this problem.
In this case I would do something like:
    # Import raw data:
    data_raw <- readLines("test.txt")

    # find separation line:
    id_sep <- which(data_raw=="")

    # create ranges of both data sets:
    data_1_range <- 4:(id_sep-1)
    data_2_range <- (id_sep+4):length(data_raw)

    # using ranges and row data import it:
    data_1 <- read.csv(textConnection(data_raw[data_1_range]))
    data_2 <- read.csv(textConnection(data_raw[data_2_range]))
Note that the first section of your example has an inconsistent structure (the row "Freddy, 9.12,1965, M" has four fields instead of three), so data_1 looks strange.
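The same idea can be stretched to files with more than two sections. The following is only a minimal sketch under some assumptions: every banner line starts with an asterisk, no data row does, and the malformed "Freddy" row has been corrected by hand in the inlined sample. The names is_banner, section_id and frames are made up for illustration.

```r
# Sample file contents, inlined so the sketch runs on its own.
# Assumption: the "Freddy" DOB has been corrected to three fields.
data_raw <- c(
  "********************************************************",
  "* SAMPLE DATA ******************************************",
  "********************************************************",
  "Name, DOB, Sex",
  "Rod, 1/1/1970, M",
  "Jane, 5/7/1980, F",
  "Freddy, 9/12/1965, M",
  "",
  "*******************************************************",
  "* Income Data ****************************************",
  "*******************************************************",
  "Name, Income",
  "Rod, 10000",
  "Jane, 15000",
  "Freddy, 7500"
)

# Banner lines start with an asterisk; blank lines carry no data.
is_banner <- grepl("^\\*", data_raw)
is_blank  <- !nzchar(trimws(data_raw))

# A section starts at a banner line that does not follow another banner line.
starts_section <- is_banner & !c(FALSE, head(is_banner, -1))
section_id <- cumsum(starts_section)

# Drop banners and blanks, then group the remaining lines by section.
keep <- !is_banner & !is_blank
sections <- split(data_raw[keep], section_id[keep])

# Parse each group of lines as its own CSV.
frames <- lapply(sections, function(x) read.csv(text = x, strip.white = TRUE))
```

With a real file, data_raw would come from readLines() instead of the inlined vector, and frames[[1]] and frames[[2]] would hold the two tables.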
Maybe this untested fragment can be helpful:
    reader <- file("DATA.CSV", "r")
    lines <- readLines(reader)

    # Write connections that collect each section into csv1 / csv2
    writer1 <- textConnection("csv1", open = "w", local = TRUE)
    writer2 <- textConnection("csv2", open = "w", local = TRUE)
    currWriter <- writer1

    lastLine <- length(lines)
    lineNumber <- 4  # skip the three banner lines of the first section
    repeat {
      if (lineNumber > lastLine) break
      # A line made up only of asterisks (of any length) opens the next banner
      if (grepl("^\\*+$", lines[lineNumber])) {
        lineNumber <- lineNumber + 2  # eat two lines
        currWriter <- writer2
      } else {
        writeLines(lines[lineNumber], currWriter)
      }
      lineNumber <- lineNumber + 1
    }

    close(reader)
    close(writer1)
    close(writer2)

    # Read the collected lines back as CSV
    csv1Reader <- textConnection(csv1, "r")
    csv2Reader <- textConnection(csv2, "r")
    df1 <- read.csv(csv1Reader)
    df2 <- read.csv(csv2Reader)
    close(csv1Reader)
    close(csv2Reader)
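The skip/nrows idea from the question can also be tried directly against the file once the blank line has been located. This is a rough sketch, assuming exactly two sections, a single blank line between them, and three banner lines per section. The sample is written to a temporary file so the snippet is self-contained, and the "Freddy" row is again corrected by hand. banner_rows and path are illustrative names.

```r
# Write the sample to a temporary file (stand-in for the real DATA.csv).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "********************************************************",
  "* SAMPLE DATA ******************************************",
  "********************************************************",
  "Name, DOB, Sex",
  "Rod, 1/1/1970, M",
  "Jane, 5/7/1980, F",
  "Freddy, 9/12/1965, M",
  "",
  "*******************************************************",
  "* Income Data ****************************************",
  "*******************************************************",
  "Name, Income",
  "Rod, 10000",
  "Jane, 15000",
  "Freddy, 7500"
), path)

raw <- readLines(path)
blank <- which(raw == "")[1]  # first blank line marks the end of section one
banner_rows <- 3              # assumed banner height for each section

# Section one: skip its banner, then read the data rows before the blank line
# (the header line is not counted by nrows).
df1 <- read.csv(path, skip = banner_rows,
                nrows = blank - banner_rows - 2, strip.white = TRUE)

# Section two: skip everything up to and including its banner.
df2 <- read.csv(path, skip = blank + banner_rows, strip.white = TRUE)
```

This reads the file three times (once with readLines, twice with read.csv), which is fine for small files. For large ones, parsing from the lines already in memory avoids the extra passes.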