List of part of speech tags per sentence with POS Tagger Stanford NLP in C#

坚强是说给别人听的谎言 posted on 2019-12-24 15:08:21

Question


Using the POS Tagger of Stanford NLP .NET, I'm trying to extract a detailed list of part-of-speech tags per sentence.

E.g.: "Have a look over there. Look at the car!"

Have/VB a/DT look/NN over/IN there/RB ./. Look/VB at/IN the/DT car/NN !/.

I need:

  • POS Text: "Have"
  • POS tag: "VB"
  • Position in the original text

I managed to achieve this by accessing the private fields of the result via reflection.

I know it's ugly, inefficient and bad practice, but it's the only way I have found so far. Hence my question: is there any built-in way to access this information?

using (var streamReader = new StringReader(rawText))
{
    var tokenizedSentences = MaxentTagger.tokenizeText(streamReader).toArray();

    foreach (ArrayList tokenizedSentence in tokenizedSentences)
    {
        var taggedSentence = _posTagger.tagSentence(tokenizedSentence).toArray();

        for (int index = 0; index < taggedSentence.Length; index++)
        {
            var partOfSpeech = ((StringLabel) (taggedSentence[index]));
            var posText = partOfSpeech.value();

            var posTag = ReflectionHelper.GetInstanceField(typeof (TaggedWord), partOfSpeech, "tag") as string;
            var posBeginPosition = (int)ReflectionHelper.GetInstanceField(typeof (StringLabel), partOfSpeech, "beginPosition");
            var posEndPosition = (int)ReflectionHelper.GetInstanceField(typeof (StringLabel), partOfSpeech, "endPosition");

            // process the pos
        }
    }
}

ReflectionHelper:

// Reads a field (public or private) declared on `type` from `instance`.
// The signature takes the declaring type explicitly, matching the calls above.
public static object GetInstanceField(Type type, object instance, string fieldName)
{
    const BindingFlags bindFlags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static;

    object result = null;
    var field = type.GetField(fieldName, bindFlags);
    if (field != null)
    {
        result = field.GetValue(instance);
    }
    return result;
}

Answer 1:


The solution is pretty easy. Just cast each element (taggedSentence[index]) to a TaggedWord. You can then read the values through its public getters beginPosition(), endPosition(), tag() and value(), so no reflection is needed.
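
As a rough sketch of that cast, assuming the IKVM-compiled Stanford NLP .NET assemblies are referenced: the class name PosExtractor, the method PrintTags and the modelPath parameter are made up for illustration, and whether the offsets are actually populated may depend on the tokenizer and library version.

```csharp
using System;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.tagger.maxent;

// Hypothetical helper class, not part of the library.
public static class PosExtractor
{
    // modelPath is a placeholder: point it at the tagger model file you use.
    public static void PrintTags(string modelPath, string rawText)
    {
        var tagger = new MaxentTagger(modelPath);

        // java.io.StringReader, because tokenizeText expects a java.io.Reader.
        var reader = new java.io.StringReader(rawText);
        var sentences = MaxentTagger.tokenizeText(reader).toArray();

        foreach (java.util.List sentence in sentences)
        {
            foreach (var item in tagger.tagSentence(sentence).toArray())
            {
                // The cast exposes the public getters, so reflection is not needed.
                var word = (TaggedWord)item;
                Console.WriteLine("{0}/{1} begin={2} end={3}",
                    word.value(), word.tag(),
                    word.beginPosition(), word.endPosition());
            }
        }
    }
}
```

Calling PosExtractor.PrintTags(modelPath, "Have a look over there. Look at the car!") should print one line per token, with the word, its tag and its character offsets.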



Source: https://stackoverflow.com/questions/29728347/list-of-part-of-speech-tags-per-sentence-with-pos-tagger-stanford-npl-in-c-sharp
