Question
I am building a part-of-speech tagger, and I handle unknown words by looking at their suffixes.
The main issue is how to decide which suffixes to use: should they be a predefined list (as in Weischedel's approach), or should I take the last few letters of each word (as in Samuelsson's approach)?
Which approach would be better?
Answer 1:
Quick googling suggests that the Weischedel approach is sufficient for English, which has only rudimentary morphological inflection. The Samuelsson approach seems to work better (which makes sense intuitively) when it comes to processing inflecting languages.
Quote from "A Resource-light Approach to Morpho-syntactic Tagging" (Google Books), p. 9:
To handle unknown words Brants (2000) uses Samuelsson's (1993) suffix analysis, which seems to work best for inflected languages.
(This is not in a direct comparison to Weischedel's approach, though.)
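To make the difference concrete, here is a rough sketch of the two ideas, not a faithful implementation of either paper. The toy training data, the suffix list, the `max_len` value, and all function names are made up for illustration, and the backoff is a plain longest-match lookup rather than the smoothing Samuelsson or Brants actually use.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (word, tag) pairs.
TRAIN = [
    ("running", "VBG"), ("jumping", "VBG"), ("singing", "VBG"),
    ("quickly", "RB"), ("slowly", "RB"),
    ("walked", "VBD"), ("talked", "VBD"),
    ("king", "NN"),
]

# Idea 1: a fixed, hand-picked suffix list.
# This list is illustrative only, not the list from Weischedel's paper.
FIXED_SUFFIXES = ["ing", "ly", "ed", "s"]

def fixed_suffix(word):
    """Return the first listed suffix the word ends with, or None."""
    for suffix in FIXED_SUFFIXES:
        if word.endswith(suffix):
            return suffix
    return None

def train_fixed(pairs):
    """Count tags per predefined suffix."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[fixed_suffix(word)][tag] += 1
    return counts

def guess_fixed(word, counts):
    """Most frequent tag for the word's predefined suffix, or None."""
    tags = counts.get(fixed_suffix(word))
    return tags.most_common(1)[0][0] if tags else None

# Idea 2: count tags for every word ending of length 0..max_len.
def train_endings(pairs, max_len=4):
    counts = defaultdict(Counter)
    for word, tag in pairs:
        for i in range(min(max_len, len(word)) + 1):
            counts[word[len(word) - i:]][tag] += 1  # i == 0 gives ""
    return counts

def guess_ending(word, counts, max_len=4):
    """Back off from the longest ending seen in training to shorter ones."""
    for i in range(min(max_len, len(word)), -1, -1):
        ending = word[len(word) - i:]
        if ending in counts:
            return counts[ending].most_common(1)[0][0]
    return None

if __name__ == "__main__":
    print(guess_fixed("painted", train_fixed(TRAIN)))
    print(guess_ending("dancing", train_endings(TRAIN)))
```

With the fixed list you have to know the useful endings in advance, which is manageable for English. With the letter-based version the endings are learned from the data, which is presumably why it suits languages with many inflectional endings; the cost is that you have to pick a maximum length and some way of combining long and short endings.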
Source: https://stackoverflow.com/questions/25310485/how-to-take-the-suffix-in-smoothing-of-part-of-speech-tagging