Create sparse matrix from tweets

Submitted by 痴心易碎 on 2019-12-13 07:31:46

Question


I have some tweets and other variables that I would like to convert into a sparse matrix.

This is basically what my data looks like. Right now it is saved in a data.table with one column that contains the tweet and one column that contains the score.

Tweet               Score
Sample Tweet :)        1
Different Tweet        0
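For anyone who wants to reproduce this, the sample above could be built roughly as follows. This is only a sketch: the column names are taken from the table, the numeric type of Score is an assumption, and the object is called `dt` simply to match the name used in the answer further down.

```r
library(data.table)

# Two sample rows copied from the table in the question.
# Storing Score as numeric is an assumption; the post does not state its type.
dt <- data.table(
  Tweet = c("Sample Tweet :)", "Different Tweet"),
  Score = c(1, 0)
)
```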

I would like to convert this into a matrix that looks like this:

Score Sample Tweet Different :)
    1      1     1         0  1
    0      0     1         1  0

There should be one row in the sparse matrix for each row in my data.table. Is there an easy way to do this in R?


Answer 1:


This is close to what you want:

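library(data.table)  # dt is assumed to be the data.table from the question
library(Matrix)

# Vocabulary: every distinct space-separated token across all tweets
words = unique(unlist(strsplit(dt[, Tweet], ' ')))

# Empty sparse matrix with one row per tweet and one column per word
M = Matrix(0, nrow = NROW(dt), ncol = length(words))
colnames(M) = words

# Flag the tweets in which each word occurs as a whole word
for(j in seq_along(words)){
  M[, j] = grepl(paste0('\\b', words[j], '\\b'), dt[, Tweet])
}

# Append the remaining columns (here only Score)
M = cbind(M, as.matrix(dt[, setdiff(names(dt),'Tweet'), with=FALSE]))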

#2 x 5 sparse Matrix of class "dgCMatrix"
#     Sample Tweet :) Different Score
#[1,]      1     1  .         .     1
#[2,]      .     1  .         1     .

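The only small issue is that the regex does not recognise ':)' as a word. '\\b' matches only at a boundary between a word character and a non-word character, and ':)' contains no word characters, so the pattern never matches and that column stays empty. Maybe someone who knows regex better can advise how to fix this.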
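One possible way around the regex limitation is to skip pattern matching altogether: tokenise each tweet once and build the matrix from (row, column) index pairs with `sparseMatrix()`. The sketch below is not the original answer's method. It assumes that tokens are separated by single spaces and that a 0/1 presence indicator is wanted rather than a count. The sample `dt` is rebuilt here only so that the block runs on its own.

```r
library(data.table)
library(Matrix)

# Sample data mirroring the question (the numeric Score type is an assumption)
dt <- data.table(Tweet = c("Sample Tweet :)", "Different Tweet"),
                 Score = c(1, 0))

# Split on literal single spaces, so ":)" survives as an ordinary token
tokens <- strsplit(dt$Tweet, " ", fixed = TRUE)
words  <- unique(unlist(tokens))

# Row index: the tweet each token came from
# Column index: the token's position in the vocabulary
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), words)

# Drop repeated (row, column) pairs so a word used twice in a tweet still gives 1
pairs <- unique(data.frame(i = i, j = j))

M <- sparseMatrix(i = pairs$i, j = pairs$j, x = 1,
                  dims = c(nrow(dt), length(words)),
                  dimnames = list(NULL, words))

# Append the remaining columns (here only Score), as in the answer above
others <- setdiff(names(dt), "Tweet")
M <- cbind(M, as.matrix(dt[, others, with = FALSE]))
```

With the two sample tweets, the ':)' column is now set for the first row. Because the matrix is built in a single call instead of being filled column by column, this also avoids repeated assignment into a sparse object, which tends to be slow when the vocabulary is large.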



Source: https://stackoverflow.com/questions/41006602/create-sparse-matrix-from-tweets
