Dealing with readLines() function in R

岁酱吖の submitted on 2019-11-29 02:10:47

Suppose txt is the first line of the data you read in with readLines().
To split it into separate strings, one per word, use strsplit() and split at the spaces between the words.

> txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")
> txt                ## a character vector of length 1
[1] "aA bB cC dD eE fF gG hH iI jJ"
> length(txt)
[1] 1
> newTxt <- unlist(strsplit(txt, split = "\\s"))  ## split the string at the spaces
> newTxt             ## now a character vector of length 10
 [1] "aA" "bB" "cC" "dD" "eE" "fF" "gG" "hH" "iI" "jJ"
> length(newTxt)
[1] 10
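One caveat with split = "\\s": it matches exactly one whitespace character, so two spaces (or a space and a tab) in a row leave an empty string in the result. A minimal sketch, assuming a hypothetical input line with irregular spacing, that compares it with the pattern "\\s+", which treats any run of whitespace as a single separator (trimws() removes leading whitespace, which would otherwise produce an empty first element):

```r
## Hypothetical line with doubled spaces and a tab between words
txt <- "aA  bB\tcC   dD"

## "\\s" splits at every single whitespace character,
## so consecutive spaces leave empty strings behind
words_single <- strsplit(txt, split = "\\s")[[1]]

## "\\s+" splits at runs of whitespace; trimws() guards against
## a leading space producing an empty first element
words_runs <- strsplit(trimws(txt), split = "\\s+")[[1]]

print(words_single)
print(words_runs)
```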

First, you can condense that code into a single line; the other three lines just create objects you don't need:

line <- readLines("C:/MyFolder/TEXT_TO_BE_PROCESSED.txt")
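readLines() returns a character vector with one element per line of the file, and when it is given a file name rather than an open connection it opens and closes the file itself. A small sketch, using a temporary file as a hypothetical stand-in for C:/MyFolder/TEXT_TO_BE_PROCESSED.txt:

```r
## Hypothetical file standing in for the real path
path <- tempfile(fileext = ".txt")
writeLines(c("the quick brown fox", "jumps over", "the lazy dog"), path)

## One call: open, read every line, close
line <- readLines(path)

print(line)
print(length(line))   # one element per line of the file
```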

Then, if you want to know how many space-separated words there are on each line:

words <- sapply(line, function(x) length(unlist(strsplit(x, split = " "))), USE.NAMES = FALSE)  ## USE.NAMES = FALSE stops the full line text being used as names
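Because strsplit() is itself vectorised over its input, the same counts can be obtained without sapply(). This is a swapped-in alternative, not the answer's own code; the input lines are hypothetical, and lengths() requires R 3.2.0 or later:

```r
## Hypothetical input lines, as returned by readLines()
line <- c("the quick brown fox", "jumps over", "the lazy dog")

## strsplit() returns a list with one character vector per line;
## lengths() counts the elements of each list entry
words_per_line <- lengths(strsplit(line, split = " "))

print(words_per_line)
```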

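If you drop the length() call above, you get the words themselves instead of their count. sapply() returns them as a list of character vectors (one per line) only when the lines have different numbers of words; if every line has the same number of words, it simplifies the result to a matrix. Use lapply(), or simply strsplit(line, split = " "), if you always want a list.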
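A short sketch contrasting the two, with hypothetical input lines:

```r
## Hypothetical input lines
line <- c("the quick brown fox", "jumps over", "the lazy dog")

## lapply() always returns a list with one character vector of words per line;
## strsplit() on the whole vector produces the same list directly
words <- lapply(line, function(x) strsplit(x, split = " ")[[1]])
str(words)

## When every line has the same number of words, sapply() simplifies to a matrix
same_length <- c("a b", "c d")
m <- sapply(same_length, function(x) unlist(strsplit(x, split = " ")))
print(m)
```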

Thys Potgieter

How about:

con <- file(fileName, open = "r")
text <- readLines(con)[[1]]
close(con)  ## release the connection once you are done with it

to get the text of the first line of the file.
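If only the first line is needed, readLines() can also stop after one line via its n argument, and it manages the connection itself when given a file name. A minimal sketch, where a temporary file is a hypothetical stand-in for fileName:

```r
## Hypothetical file standing in for fileName
fileName <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), fileName)

## n = 1 stops reading after the first line, so the rest of the file
## is not read; no explicit file()/close() pair is needed
text <- readLines(fileName, n = 1)

print(text)
```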
