Question
I have some tweets and other variables that I would like to convert into a sparse matrix.
This is basically what my data looks like. Right now it is saved in a data.table with one column that contains the tweet and one column that contains the score.
Tweet            Score
Sample Tweet :)  1
Different Tweet  0
I would like to convert this into a matrix that looks like this:
Score  Sample  Tweet  Different  :)
1      1       1      0          1
0      0       1      1          0
There should be one row in the sparse matrix for each row in my data.table. Is there an easy way to do this in R?
Answer 1:
This is close to what you want:
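library(Matrix)
library(data.table)  # dt is assumed to be the data.table from the question
# Vocabulary: every distinct space-separated token across all tweets
words = unique(unlist(strsplit(dt[, Tweet], ' ')))
# Empty sparse matrix with one row per tweet and one column per word
M = Matrix(0, nrow = NROW(dt), ncol = length(words))
colnames(M) = words
# Flag the tweets that contain each word as a whole word
for(j in seq_along(words)){
  M[, j] = grepl(paste0('\\b', words[j], '\\b'), dt[, Tweet])
}
# Append the remaining columns (here, Score)
M = cbind(M, as.matrix(dt[, setdiff(names(dt), 'Tweet'), with = FALSE]))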
#2 x 5 sparse Matrix of class "dgCMatrix"
#     Sample Tweet :) Different Score
#[1,]      1     1  .         .     1
#[2,]      .     1  .         1     .
The only small issue is that the regex does not recognise ':)' as a word. Maybe someone who knows regex better can advise how to fix this.
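The likely cause is that \b only matches at a boundary between a word character and a non-word character, and ':)' consists only of non-word characters, so the pattern can never match it. One possible workaround is to skip the regex and compare the tokens literally. The following is a minimal sketch, not a tested solution for real tweet data. It assumes the columns are named Tweet and Score as in the question, that tokens are separated by single spaces, and that a 0/1 indicator (not a count) is wanted. The toy dt is rebuilt here only so the example runs on its own.

```r
library(Matrix)
library(data.table)

# Toy data mirroring the question (column names Tweet and Score are assumed)
dt <- data.table(Tweet = c("Sample Tweet :)", "Different Tweet"),
                 Score = c(1, 0))

# Split on literal single spaces; no regex is involved, so ':)' stays a token
tokens <- strsplit(dt[, Tweet], " ", fixed = TRUE)
words  <- unique(unlist(tokens))

# Row index: tweet number, repeated once per token in that tweet
i <- rep(seq_along(tokens), lengths(tokens))
# Column index: position of each token in the vocabulary
j <- match(unlist(tokens), words)

# Drop duplicate (row, column) pairs so a repeated word gives 1, not a count
ij <- unique(cbind(i, j))

M <- sparseMatrix(i = ij[, 1], j = ij[, 2], x = 1,
                  dims = c(nrow(dt), length(words)),
                  dimnames = list(NULL, words))

# Append the remaining columns (here, Score), as in the answer above
M <- cbind(M, as.matrix(dt[, setdiff(names(dt), "Tweet"), with = FALSE]))
```

Building the matrix from index vectors with sparseMatrix also avoids the loop over words, which may matter when the vocabulary is large.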
Source: https://stackoverflow.com/questions/41006602/create-sparse-matrix-from-tweets