How to Import a CSV file containing multiple sections into R?

Anonymous (unverified), submitted 2019-12-03 08:44:33

Question:

I want to import the contents of a CSV file into R. The file contains multiple sections of data stacked vertically, separated by blank lines and asterisks. For example:

    ********************************************************
    * SAMPLE DATA ******************************************
    ********************************************************
    Name, DOB, Sex
    Rod, 1/1/1970, M
    Jane, 5/7/1980, F
    Freddy, 9.12,1965, M

    *******************************************************
    *  Income Data ****************************************
    *******************************************************
    Name, Income
    Rod, 10000
    Jane, 15000
    Freddy, 7500

I would like to import this into R as two separate data frames. Currently I'm manually cutting the CSV file up into smaller files, but I think I could do it using the skip and nrows arguments of read.csv, if I could work out where the section breaks are.

This gives me a logical TRUE for every blank line:

    ifelse(readLines("DATA.csv")=="",TRUE,FALSE)

I'm hoping someone has already solved this problem.

Answer 1:

In this case I would do something like:

    # Import raw data:
    data_raw <- readLines("test.txt")

    # Find the separation line (the first blank line, in case the file
    # also ends with a blank line):
    id_sep <- which(data_raw == "")[1]

    # Create ranges for both data sets (each section starts with 3 banner lines):
    data_1_range <- 4:(id_sep - 1)
    data_2_range <- (id_sep + 4):length(data_raw)

    # Using the ranges and the raw data, import each section:
    data_1 <- read.csv(textConnection(data_raw[data_1_range]))
    data_2 <- read.csv(textConnection(data_raw[data_2_range]))

Note that the first section of your example has an inconsistent structure (the row "Freddy, 9.12,1965, M" has four fields under a three-column header), so data_1 looks strange.
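
The question also asks about the skip and nrows arguments of read.csv. As a rough sketch, the position of the blank line can drive those arguments directly, so the file is read in place instead of through a text connection. The sample is written to a temporary file to keep the sketch self-contained. The names path, banner, n1, sec1 and sec2 are made up for illustration, Freddy's date is written as 9/12/1965 (an assumption about what was intended), and strip.white = TRUE is added to drop the spaces after the commas.

```r
# Hypothetical copy of the sample, with Freddy's DOB assumed to be 9/12/1965.
sample_lines <- c(
  "********************************************************",
  "* SAMPLE DATA ******************************************",
  "********************************************************",
  "Name, DOB, Sex",
  "Rod, 1/1/1970, M",
  "Jane, 5/7/1980, F",
  "Freddy, 9/12/1965, M",
  "",
  "*******************************************************",
  "*  Income Data ****************************************",
  "*******************************************************",
  "Name, Income",
  "Rod, 10000",
  "Jane, 15000",
  "Freddy, 7500"
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

raw <- readLines(path)
id_sep <- which(raw == "")[1]  # first blank line separates the two sections
banner <- 3                    # assumption: every section opens with 3 banner lines

# Section 1: skip its banner; nrows counts data rows only (the header is extra).
n1 <- (id_sep - 1) - banner - 1
sec1 <- read.csv(path, skip = banner, nrows = n1, strip.white = TRUE)

# Section 2: skip everything up to and including its banner, then read to the end.
sec2 <- read.csv(path, skip = id_sep + banner, strip.white = TRUE)
```

This only covers the two-section layout shown in the question; with more sections, each one would need its own skip and nrows computed from consecutive blank-line positions.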



Answer 2:

Maybe this untested fragment can be helpful:

    reader <- file("DATA.CSV", "r")
    lines <- readLines(reader)
    writer1 <- textConnection("csv1", open = "w", local = TRUE)
    writer2 <- textConnection("csv2", open = "w", local = TRUE)
    currWriter <- writer1
    lastLine <- length(lines)
    lineNumber <- 4  # skip the three banner lines of the first section
    repeat {
        if (lineNumber > lastLine) break
        # Match a full line of asterisks of any length instead of an
        # exact-length literal.
        if (grepl("^\\*+$", lines[lineNumber])) {
            lineNumber <- lineNumber + 2 # eat two lines
            currWriter <- writer2
        } else {
            # Write the current line (the original passed the undefined `line`).
            writeLines(lines[lineNumber], currWriter)
        }
        lineNumber <- lineNumber + 1
    }
    close(reader)
    close(writer1)
    close(writer2)
    csv1Reader <- textConnection(csv1, "r")
    csv2Reader <- textConnection(csv2, "r")
    df1 <- read.csv(csv1Reader)
    df2 <- read.csv(csv2Reader)
    close(csv1Reader)
    close(csv2Reader)
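
If the file can hold more than two sections, the same idea can be wrapped in a small function. This is only a sketch under stated assumptions: read_sections is a hypothetical helper name, every line starting with an asterisk is treated as banner, blank lines only ever appear between sections, and no data row starts with an asterisk. The sample below repairs Freddy's date to 9/12/1965, which is a guess at the intended value.

```r
# Hypothetical helper: split raw lines into sections and parse each as CSV.
read_sections <- function(lines) {
  is_banner <- grepl("^\\s*\\*", lines)   # banner lines start with an asterisk
  is_blank  <- !nzchar(trimws(lines))     # empty or whitespace-only lines
  keep <- !(is_banner | is_blank)         # header and data rows
  # A section starts at every kept line whose previous line was not kept.
  starts <- keep & !c(FALSE, head(keep, -1))
  section_id <- cumsum(starts)
  chunks <- split(lines[keep], section_id[keep])
  lapply(chunks, function(chunk) read.csv(text = chunk, strip.white = TRUE))
}

# Usage on a hypothetical copy of the sample from the question.
sample_lines <- c(
  "********************************************************",
  "* SAMPLE DATA ******************************************",
  "********************************************************",
  "Name, DOB, Sex",
  "Rod, 1/1/1970, M",
  "Jane, 5/7/1980, F",
  "Freddy, 9/12/1965, M",
  "",
  "*******************************************************",
  "*  Income Data ****************************************",
  "*******************************************************",
  "Name, Income",
  "Rod, 10000",
  "Jane, 15000",
  "Freddy, 7500"
)
sections <- read_sections(sample_lines)
```

The result is a list with one data frame per section, in file order; for a real file, pass readLines("DATA.csv") instead of sample_lines.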

