Run CVB in Mahout 0.8

Posted by 放荡痞女 on 2019-12-03 01:48:23

Here is the sequence of Mahout commands I had to run in a Linux shell to do it. $MAHOUT_HOME points to my mahout/bin folder.

$MAHOUT_HOME/mahout seqdirectory \
    -i path/to/directory/with/texts \
    -o out/sequenced

$MAHOUT_HOME/mahout seq2sparse -i out/sequenced \
    -o out/sparseVectors \
    --namedVector \
    -wt tf

$MAHOUT_HOME/mahout rowid \
    -i out/sparseVectors/tf-vectors/ \
    -o out/matrix

$MAHOUT_HOME/mahout cvb0_local \
    -i out/matrix/matrix \
    -d out/sparseVectors/dictionary.file-0 \
    -a 0.5 \
    -top 4 \
    -do out/cvb/do_out \
    -to out/cvb/to_out
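
The four steps above can also be chained in one script so that a failing step stops the run. This is only a sketch: the DRY_RUN switch and the run helper are my own additions and not part of Mahout, and the paths are the same placeholders as above. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: run the four steps in order and stop at the first failure.
set -e

# Assumption: fall back to a placeholder path if MAHOUT_HOME is unset.
MAHOUT_HOME="${MAHOUT_HOME:-path/to/mahout/bin}"
# DRY_RUN and run() are hypothetical helpers, not Mahout features.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"   # only print the command line
    else
        "$@"                 # execute the command
    fi
}

run "$MAHOUT_HOME/mahout" seqdirectory -i path/to/directory/with/texts -o out/sequenced
run "$MAHOUT_HOME/mahout" seq2sparse -i out/sequenced -o out/sparseVectors --namedVector -wt tf
run "$MAHOUT_HOME/mahout" rowid -i out/sparseVectors/tf-vectors/ -o out/matrix
run "$MAHOUT_HOME/mahout" cvb0_local -i out/matrix/matrix -d out/sparseVectors/dictionary.file-0 -a 0.5 -top 4 -do out/cvb/do_out -to out/cvb/to_out
```

Printing first makes it easy to check the paths before starting a long job.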

Inspect the output by showing the top 10 words of each topic:

$MAHOUT_HOME/mahout vectordump \
    -i out/cvb/to_out \
    --dictionary out/sparseVectors/dictionary.file-0 \
    --dictionaryType sequencefile \
    --vectorSize 10 \
    -sort out/cvb/to_out

Thanks to JoKnopp for the detailed commands.

If you get: Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String

you need to add the command-line option "maxIterations" to the cvb0_local call: --maxIterations (-m) maxIterations

I used -m 20 and it worked.

See: https://issues.apache.org/jira/browse/MAHOUT-1141
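
For reference, here is the cvb0_local call from above with the extra option appended. Only -m is new; all other options and paths are unchanged. As a rough sketch, I wrapped it in a shell function (run_cvb is just a name I picked, not a Mahout command) so the iteration count is passed as an argument. 20 is simply the value that worked for me, not a tuned setting.

```shell
# Assumption: fall back to a placeholder path if MAHOUT_HOME is unset.
MAHOUT_HOME="${MAHOUT_HOME:-path/to/mahout/bin}"

# run_cvb is a hypothetical wrapper; $1 is the value passed to -m.
run_cvb() {
    "$MAHOUT_HOME/mahout" cvb0_local \
        -i out/matrix/matrix \
        -d out/sparseVectors/dictionary.file-0 \
        -a 0.5 \
        -top 4 \
        -do out/cvb/do_out \
        -to out/cvb/to_out \
        -m "$1"
}
```

Call it as run_cvb 20 to reproduce the setting mentioned above.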
