I've been having a very hard time with R lately.
I'm not an expert user, but I'm trying to use R to read a plain text (.txt) file and capture each line of it. After that, I want to work with those lines and make some breaks and changes in the text.
Here is the code I'm using:
fileName <- "C:/MyFolder/TEXT_TO_BE_PROCESSED.txt"
con <- file(fileName,open="r")
line <- readLines(con)
close(con)
It reads the text and the line breaks perfectly, but I don't understand how the created object line works.
The object line created with this code has the class character and the length 57.
If I type line[1] it shows exactly the text of the first line. But if I type length(line[1]) it returns 1.
I would like to know how I can transform this string of length == 1, which in fact contains 518 characters, into a string of length == 518.
Does anyone know what I'm doing wrong?
I don't necessarily need to use the readLines() function. I did some research and also found the scan() function, but I ended up in the same situation: an immutable string of 518 characters with length == 1.
I hope I've been clear enough about my question. Sorry for the bad English.
Suppose txt is the text from line 1 of your data that you read in with readLines.
Then, if you want to split it into separate strings, each of which is a word, you can use strsplit, splitting at the space between each word.
> txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")
> txt
## [1] "aA bB cC dD eE fF gG hH iI jJ" ## character vector of length 1
> length(txt)
[1] 1
> newTxt <- unlist(strsplit(txt, split = "\\s")) ## split the string at the spaces
> newTxt
## [1] "aA" "bB" "cC" "dD" "eE" "fF" "gG" "hH" "iI" "jJ"
## now the text is a character vector of length 10
> length(newTxt)
[1] 10
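For the character-level case the question describes, here is a minimal sketch (the variable names and sample string are illustrative, not from the original post). It relies on the fact that length() counts the elements of a vector while nchar() counts the characters in each element, and that splitting on the empty string gives one element per character.

```r
txt <- "aA bB cC"   # illustrative stand-in for one line read by readLines()

length(txt)   # number of elements in the character vector
nchar(txt)    # number of characters in the string itself

# Splitting on the empty string yields one element per character
chars <- strsplit(txt, split = "")[[1]]
length(chars)

# Individual characters can then be changed and the string rebuilt
chars[1] <- "z"
rebuilt <- paste(chars, collapse = "")
```

If only the character count is needed, nchar(line[1]) is probably enough and no splitting is required.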
First, you can condense that code into a single line; the other three lines just create objects that you don't need.
line <- readLines("C:/MyFolder/TEXT_TO_BE_PROCESSED.txt")
Then, if you want to know how many space-separated words there are per line:
words <- sapply(line,function(x) length(unlist(strsplit(x,split=" "))))
If you leave out the length() call in the above, you get a list of character vectors of the words from each line.
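As a rough alternative sketch, strsplit() is itself vectorized over its input, so the per-line word lists and counts can be obtained without sapply(). The sample vector below is a made-up stand-in for the output of readLines(), and lengths() assumes R 3.2.0 or later.

```r
# Made-up stand-in for the character vector returned by readLines()
line <- c("one two three", "four five")

# One character vector of words per line, returned as a list
wordList <- strsplit(line, split = " ")

# Number of words in each line (lengths() needs R >= 3.2.0)
wordCounts <- lengths(wordList)
```

Unlike the sapply() version, the result here carries no names taken from the line text, which may be easier to read for long lines.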
How about:
con <- file(fileName, open='r')
text <- readLines(con)[[1]]
close(con)
to get the text of the first line of the file.
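If only the first line is wanted, one possible variation is to ask readLines() for a single line through its n argument rather than reading the whole file and indexing. The sketch below writes a temporary file purely for illustration; the file contents and variable names are assumptions, not part of the original question.

```r
# Temporary file standing in for the real text file (illustrative only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

con <- file(tmp, open = "r")
firstLine <- readLines(con, n = 1)  # read just one line from the connection
close(con)                          # release the connection when done

unlink(tmp)                         # remove the temporary file
```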
Source: https://stackoverflow.com/questions/23001548/dealing-with-readlines-function-in-r