Converting text file into data frame in R

旧街凉风 提交于 2019-12-01 11:13:53

We assume that each field consist of non-spaces except for field 6 which may have embedded spaces.

Create test file

Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")

Run. Read in the file using readLines producing L. Then using gsubfn in the gsubfn package insert the character defined by sep between the fields producing g. Finally read the text in g using read.table to create a data frame:

library(gsubfn)
L <- readLines("myfile.txt")

sep <- ";"  # choose any character not in the file

pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\s+", pat) # can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)

read.table(text = g, sep = sep)

Output. The result of the last line is:

   V1    V2 V3 V4      V5                 V6   V7   V8   V9  V10
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
2 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010

Are you sure about there only being ten columns?

> read.table(text="101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")
   V1    V2 V3 V4      V5     V6   V7     V8   V9  V10  V11   V12
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930

The other possibility is that this is a fixed width format file. We would get a better understanding of this possibility if you posted several lines:

require(foreign)
txt2 <- "101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930"
read.fwf(file=textConnection(txt2), c(4,6,3,4,9,20,6,8,8,8))
   V1    V2  V3   V4        V5                   V6   V7   V8   V9   V10
1 101 10.08   S    A   05OCT93  GOLDEN GATE BRIDGE  4110 6548 6404 55930
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!