I read a text file containing the data below and am trying to convert it to a data frame:
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group
Here is a different approach using separate_rows and spread to reformat the text file into a dataframe:
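library(dplyr)
library(tidyr)

text <- readLines(path_to_textfile)

data.frame(text = text, stringsAsFactors = FALSE) %>%
  # split the "reviews" line so that "avg rating: 5" gets its own row
  separate_rows(text, sep = "(?<=\\d)\\s+(?=[a-z])") %>%
  # key = the word directly before a colon, value = everything after it
  extract(text, c("title", "value"), regex = "(?i)([a-z]+):(.+)") %>%
  # drop the review counts; only the rating is kept
  filter(!title %in% c("reviews", "downloaded")) %>%
  # number the records within each key so spread() can line them up
  group_by(title) %>%
  mutate(id = 1:n()) %>%
  spread(title, value) %>%
  select(-id)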
Result:
        ASIN group Id rating salesrank
1 0827229534  Book  1      5    396585
2   12412441  Book  2     10   4225352
                                                   similar
1 5 0804215715 156101074X 0687023955 0687074231 082721619X
2                                   1241242 1412414 124124
                                    title
1 Patterns of Preaching: A Sermon Sampler
2                               Patterns2
Data:
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
reviews: total: 2 downloaded: 2 avg rating: 5
Id: 2
ASIN: 12412441
title: Patterns2
group: Book
salesrank: 4225352
similar: 1241242 1412414 124124
reviews: total: 2 downloaded: 2 avg rating: 10
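To see why only rating survives from a reviews line, the two regular expressions from the pipeline can be tried on one such line in base R. This is a minimal sketch: strsplit and regexec (both with perl = TRUE) stand in for separate_rows and extract, which is an assumption made for illustration, and the names line, parts, keys and values are not from the answer's code.

```r
# One "reviews" line from the data
line <- "reviews: total: 2 downloaded: 2 avg rating: 5"

# Same pattern as in separate_rows(): split on whitespace that follows
# a digit and precedes a lowercase letter
parts <- strsplit(line, "(?<=\\d)\\s+(?=[a-z])", perl = TRUE)[[1]]

# Same pattern as in extract(): the word directly before a colon is the
# key, everything after that colon is the value
m <- regmatches(parts, regexec("(?i)([a-z]+):(.+)", parts, perl = TRUE))
keys   <- vapply(m, function(x) x[2], character(1))
values <- trimws(vapply(m, function(x) x[3], character(1)))

print(data.frame(piece = parts, key = keys, value = values))
```

The line is split into three pieces with the keys reviews, downloaded and rating. Filtering out the first two keys leaves rating as the only column taken from this line.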
Note:
End the last line of the text file with a newline. Otherwise readLines issues an "incomplete final line found" warning when it reads the file, although the lines are still returned.
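What readLines does with a file whose last line has no trailing newline can be checked on a throwaway file. This is a small sketch in base R; the temporary file and the names tmp, msg and text are only for illustration.

```r
# Write two lines WITHOUT a newline after the last one
tmp <- tempfile(fileext = ".txt")
cat("Id: 1\nASIN: 0827229534", file = tmp)

# Capture the condition raised by readLines() instead of printing it
msg <- tryCatch(
  {
    readLines(tmp)
    NA_character_            # reached only if nothing was signalled
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# warn = FALSE reads the same lines quietly
text <- readLines(tmp, warn = FALSE)
print(text)
```

If the file cannot be edited, passing warn = FALSE to readLines is one way to silence the message while still reading every line.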