I have a very long telephone log as a text file and have tried to read it into R, but it is not really working out. The text has a structure, but it is most certainly not a t
With multi.line = TRUE in the scan function, a record should end with two end-of-lines, i.e. a blank line. I did this with textConnection wrapped around the text you posted (held in txt), but you would pass a valid file name instead:
> inp <- scan(textConnection(txt), multi.line=TRUE,
+             what=list(place="character", tline1="character",
+                       cline1="character", cline2="character", cline3="character"),
+             sep="\n")
Read 5 records
> str(as.data.frame(inp))
'data.frame': 5 obs. of 5 variables:
$ place : Factor w/ 1 level "TheInstitute 5467": 1 1 1 1 1
$ tline1: Factor w/ 2 levels " telephone line 4125526987 x 4567",..: 1 1 2 1 1
$ cline1: Factor w/ 4 levels " bump phone line 4125527777",..: 2 3 1 1 4
$ cline2: Factor w/ 4 levels " blay blay blah who knows what",..: 2 1 3 4 1
$ cline3: Factor w/ 3 levels ""," blay blay blah who knows what",..: 1 1 2 3 1
> as.data.frame(inp)
              place                            tline1
1 TheInstitute 5467  telephone line 4125526987 x 4567
2 TheInstitute 5467  telephone line 4125526987 x 4567
3 TheInstitute 5467   telephone line 412552999 x 4999
4 TheInstitute 5467  telephone line 4125526987 x 4567
5 TheInstitute 5467  telephone line 4125526987 x 4567
                       cline1
1   datetime 2011110516 12:56
2   datetime 2011110516 12:58
3  bump phone line 4125527777
4  bump phone line 4125527777
5   datetime 2011110516 14:56
                                                          cline2
1  blay blay blah who knows what, but anyway it may have a comma
2                                  blay blay blah who knows what
3                                      datetime 2011110516 12:59
4                                      datetime 2011110516 13:51
5                                  blay blay blah who knows what
                                                          cline3
1
2
3                                  blay blay blah who knows what
4  blay blay blah who knows what, but anyway it may have a comma
5
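
Note that the datetime line lands in cline1 for some records and in cline2 for others, so the columns are positional rather than meaningful. If that matters, one alternative (a different technique from the scan call above, offered only as a rough sketch) is to read the lines with readLines, split them into records at the blank lines, and pick fields out by their leading keyword. The sample text below is a two-record stand-in reconstructed from the printed output, and the file name in the comment is hypothetical; I am assuming exactly the layout shown, with blank lines between records.

```r
# Stand-in sample in the assumed layout: records separated by a blank line.
txt <- c(
  "TheInstitute 5467",
  " telephone line 4125526987 x 4567",
  " datetime 2011110516 12:56",
  " blay blay blah who knows what, but anyway it may have a comma",
  "",
  "TheInstitute 5467",
  " telephone line 412552999 x 4999",
  " bump phone line 4125527777",
  " datetime 2011110516 12:59",
  " blay blay blah who knows what"
)
# For a real log: txt <- readLines("phone_log.txt")   # hypothetical file name

blank   <- !nzchar(trimws(txt))        # TRUE on the separator lines
rec_id  <- cumsum(blank)[!blank]       # record index for each non-blank line
records <- split(trimws(txt[!blank]), rec_id)

# Pad short records with "" so every row has the same number of fields.
n_fields <- max(lengths(records))
padded   <- lapply(records, function(r) c(r, rep("", n_fields - length(r))))
log_df   <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(log_df) <- c("place", "tline1", paste0("cline", seq_len(n_fields - 2)))

# Pull the datetime out of each record wherever it happens to sit.
when <- vapply(records, function(r) {
  sub("^datetime ", "", grep("^datetime ", r, value = TRUE)[1])
}, character(1))
```

Here stringsAsFactors = FALSE keeps the columns as character rather than factor, which is usually easier to work with for free text like this, and `when` holds one datetime string per record (NA where a record has none).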