Question
I want to use OpenNLP to tokenize Thai text. I downloaded OpenNLP and the Thai tokenizer model, then ran the following:
./bin/opennlp POSTagger -lang th -model thai.tok.bin < sentence.txt > output.txt
I put the downloaded thai.tok.bin in the directory I run the command from. sentence.txt contains the text กินอะไรยังนาย. However, the only output I got was:
Usage: opennlp POSTagger model < sentences
Execution time: 0.000 seconds
I'm pretty new to OpenNLP, so please let me know if anyone knows how to get output from it.
Answer 1:
The models from your link are in an outdated format, so you first need a few manual steps to convert them.
- Download the file thai.tok.bin.gz and extract it to an empty folder. Rename the extracted file thai.tok.bin to token.model.
- In the same folder, create a file named manifest.properties with the following contents:
Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=TokenizerME
useAlphaNumericOptimization=false
- Now you can zip the files. If you are using Linux, you can use this command:
zip thai.tok.bin token.model manifest.properties
Try your model:
sh bin/opennlp TokenizerME ~/Downloads/thai-token.bin/thai.tok.bin < thai_sentence.txt
Loading Tokenizer model ... done (0,097s)
กินอะไร ยังนาย
Average: 333,3 sent/s
Total: 1 sent
Runtime: 0.003s
Execution time: 0,108 seconds
Now that you have the updated tokenizer, you can do the same with the POS Tagger model.
- Download the file thai.tag.bin.gz and extract it to an empty folder. Rename the extracted file thai.tag.bin to pos.model.
- In the same folder, create a file named manifest.properties with the following contents:
Manifest-Version=1.0
Language=th
OpenNLP-Version=1.5.0
Component-Name=POSTaggerME
- Now you can zip the files. If you are using Linux, you can use this command:
zip thai.pos.bin pos.model manifest.properties
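Both conversions follow the same pattern, so they can be sketched as one small shell helper. This is only a minimal sketch, not part of OpenNLP: write_manifest and package_model are hypothetical names, the manifest keys are the ones listed above, and it assumes the zip tool is installed and that the output archive does not exist yet.

```shell
#!/bin/sh
# Hypothetical helper: write the manifest.properties described in the steps above.
# $1 = target folder, $2 = component name (TokenizerME or POSTaggerME)
write_manifest() {
    {
        printf 'Manifest-Version=1.0\n'
        printf 'Language=th\n'
        printf 'OpenNLP-Version=1.5.0\n'
        printf 'Component-Name=%s\n' "$2"
        # Only the tokenizer manifest in the steps above carries this key.
        if [ "$2" = "TokenizerME" ]; then
            printf 'useAlphaNumericOptimization=false\n'
        fi
    } > "$1/manifest.properties"
}

# Hypothetical helper: rename the extracted file, add the manifest, zip both.
# $1 = folder with the extracted file, $2 = extracted file name,
# $3 = entry name inside the archive, $4 = component name, $5 = output archive
package_model() {
    (
        cd "$1" || exit 1
        mv "$2" "$3" || exit 1
        write_manifest . "$4" || exit 1
        # Same zip invocation as in the manual steps.
        zip "$5" "$3" manifest.properties
    )
}

# Example calls, using the folder names from the commands in this answer:
# package_model ~/Downloads/thai-token.bin thai.tok.bin token.model TokenizerME thai.tok.bin
# package_model ~/Downloads/pt-pos-maxent thai.tag.bin pos.model POSTaggerME thai.pos.bin
```

The subshell in package_model keeps the cd from leaking into the calling shell, so both calls can be run one after the other from any directory.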
Finally, we can try the two models combined:
sh bin/opennlp TokenizerME ~/Downloads/thai-token.bin/thai.tok.bin < thai_sentence.txt > thai_tokens.txt
sh bin/opennlp POSTagger ~/Downloads/pt-pos-maxent/thai.pos.bin < thai_tokens.txt
The result is:
กินอะไร_VACT ยังนาย_NCMN
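If you do not need the intermediate thai_tokens.txt file, the two commands can probably be chained with a pipe. The sketch below rests on an assumption that I have not verified for every OpenNLP version: that the launcher writes its "Loading ..." and timing messages to stderr, so only the tokens travel through the pipe. tag_thai and the OPENNLP variable are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Hypothetical helper: tokenize stdin, then POS-tag the tokens in one pipeline.
# $1 = converted tokenizer model, $2 = converted POS tagger model
# OPENNLP may point at the launcher script; it defaults to bin/opennlp as used above.
tag_thai() {
    sh "${OPENNLP:-bin/opennlp}" TokenizerME "$1" |
        sh "${OPENNLP:-bin/opennlp}" POSTagger "$2"
}

# Example call with the model paths from the commands above:
# tag_thai ~/Downloads/thai-token.bin/thai.tok.bin ~/Downloads/pt-pos-maxent/thai.pos.bin < thai_sentence.txt
```

If status messages end up mixed into the tagged output on your version, fall back to the two separate commands with the intermediate file.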
Please let me know if this is the expected result.
Source: https://stackoverflow.com/questions/43685885/opennlp-postagger-output-from-command-line