I am using LDA from the topicmodels package. I ran it on about 30,000 documents, fitted 30 topics, and extracted the top 10 words for each topic; they look very good.
To see the most probable topic for each document in topicmodels, use the following (the first row of each pair in the output is the document index, the second is its assigned topic):
topics(lda)
1 2 3 4 5 6 7 8 9 10 11 12
60 41 64 19 94 93 12 64 12 33 59 28
13 14 15 16 17 18 19 20 21 22 23 24
87 19 98 69 61 18 27 18 87 96 44 65
25 26 27 28 29 30 31 32 33 34 35 36
98 77 19 56 76 51 47 38 55 38 92 96
37 38 39 40 41 42 43 44 45 46 47 48
19 19 19 38 79 21 17 21 59 24 49 2
49 50 51 52 53 54 55 56 57 58 59 60
66 65 41 36 68 19 70 50 54 37 27 77
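If you also want the probabilities behind those assignments, or the list of documents per topic, something along these lines should work. This is only a sketch: it fits a small model on the AssociatedPress data that ships with topicmodels (20 documents, 3 topics, both chosen arbitrarily for illustration), and the names doc_topics, doc_probs and docs_by_topic are my own. Substitute your own fitted model for lda.

```r
library(topicmodels)

# Small illustrative model; replace with your own fitted LDA object.
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20, ], k = 3, control = list(seed = 1))

# Most probable topic for each document (one integer per document).
doc_topics <- topics(lda)

# Full document-by-topic probability matrix; each row sums to 1.
doc_probs <- posterior(lda)$topics

# Group document indices by their most probable topic.
docs_by_topic <- split(seq_along(doc_topics), doc_topics)

# The two most probable topics per document (one column per document).
top2 <- topics(lda, 2)

print(docs_by_topic)
print(round(head(doc_probs), 3))
```

docs_by_topic is a list with one entry per topic that was assigned at least one document, so it answers "which documents belong to topic X" directly.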
To see the most probable term for each topic fitted across all the documents, use:
terms(lda)
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
"quite" "food" "lots" "come" "like"
Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
"ever" "around" "bar" "loved" "new"
I hope this answers your question!
Further reading that may help: http://www.rtexttools.com/1/post/2011/08/getting-started-with-latent-dirichlet-allocation-using-rtexttools-topicmodels.html
Rachel Shuyan Wang