Converting text file into data frame in R

白昼怎懂夜的黑 提交于 2019-12-01 07:34:09

问题


My raw data is in a text file with no particular delimiters between the values, like so:

101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930

Applying read.table in R creates a data frame with only one variable per row, whereas I would like a data frame with 10 variables per row (one for each of the 10 values). How can I achieve this if there is no delimiter in the text file?


回答1:


We assume that each field consist of non-spaces except for field 6 which may have embedded spaces.

Create test file

Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")

Run. Read in the file using readLines producing L. Then using gsubfn in the gsubfn package insert the character defined by sep between the fields producing g. Finally read the text in g using read.table to create a data frame:

library(gsubfn)
L <- readLines("myfile.txt")

sep <- ";"  # choose any character not in the file

pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\s+", pat) # can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)

read.table(text = g, sep = sep)

Output. The result of the last line is:

   V1    V2 V3 V4      V5                 V6   V7   V8   V9  V10
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
2 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010



回答2:


Are you sure about there only being ten columns?

> read.table(text="101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")
   V1    V2 V3 V4      V5     V6   V7     V8   V9  V10  V11   V12
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930



回答3:


The other possibility is that this is a fixed width format file. We would get a better understanding of this possibility if you posted several lines:

require(foreign)
txt2 <- "101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930"
read.fwf(file=textConnection(txt2), c(4,6,3,4,9,20,6,8,8,8))
   V1    V2  V3   V4        V5                   V6   V7   V8   V9   V10
1 101 10.08   S    A   05OCT93  GOLDEN GATE BRIDGE  4110 6548 6404 55930


来源:https://stackoverflow.com/questions/20297564/converting-text-file-into-data-frame-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!