问题
My raw data is in a text file with no particular delimiters between the values, like so:
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
Applying read.table in R creates a data frame with only one variable per row, whereas I would like a data frame with 10 variables per row (one for each of the 10 values). How can I achieve this if there is no delimiter in the text file?
回答1:
We assume that each field consist of non-spaces except for field 6 which may have embedded spaces.
Create test file
Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")
Run. Read in the file using readLines producing L. Then using gsubfn in the gsubfn package insert the character defined by sep between the fields producing g.
Finally read the text in g using read.table to create a data frame:
library(gsubfn)
L <- readLines("myfile.txt")
sep <- ";" # choose any character not in the file
pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\s+", pat) # can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)
read.table(text = g, sep = sep)
Output. The result of the last line is:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
2 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
回答2:
Are you sure about there only being ten columns?
> read.table(text="101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
回答3:
The other possibility is that this is a fixed width format file. We would get a better understanding of this possibility if you posted several lines:
require(foreign)
txt2 <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930"
read.fwf(file=textConnection(txt2), c(4,6,3,4,9,20,6,8,8,8))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
来源:https://stackoverflow.com/questions/20297564/converting-text-file-into-data-frame-in-r