spaCy tags up each of the Tokens in a Document with a part of speech (in two different formats, one stored in the pos and pos_
Just expand the lists at:
The docs have greatly improved since I first asked this question, and spaCy now documents this much better.
The pos and tag attributes are tabulated at https://spacy.io/api/annotation#pos-tagging, and the origin of those lists of values is described there. At the time of this edit (January 2020), the docs say of the pos attribute that:
spaCy maps all language-specific part-of-speech tags to a small, fixed set of word type tags following the Universal Dependencies scheme. The universal tags don’t code for any morphological features and only cover the word type. They’re available as the Token.pos and Token.pos_ attributes.
As for the tag attribute, the docs say:
The English part-of-speech tagger uses the OntoNotes 5 version of the Penn Treebank tag set. We also map the tags to the simpler Universal Dependencies v2 POS tag set.
and
The German part-of-speech tagger uses the TIGER Treebank annotation scheme. We also map the tags to the simpler Universal Dependencies v2 POS tag set.
You thus have a choice between a coarse-grained tag set that is consistent across languages (.pos) and a fine-grained tag set (.tag) that is specific to a particular treebank, and hence to a particular language.
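To make the coarse/fine distinction concrete, here is a minimal pure-Python sketch (not spaCy code). The tables PENN_TO_UD and TIGER_TO_UD and the helper coarsen are hypothetical: each table holds only a handful of hand-picked entries, whereas real spaCy pipelines ship their own complete mappings. It shows that different fine-grained tags (NN and NNS, or VBD and VBZ) collapse onto one coarse tag, dropping the morphological detail, and that an English and a German sentence with different treebank tags end up with the same Universal Dependencies sequence.

```python
# Hypothetical, hand-picked subsets of fine-grained -> coarse-grained mappings.
# They only illustrate the many-to-one relationship; they are not spaCy's tables.
PENN_TO_UD = {
    "NN": "NOUN",     # noun, singular
    "NNS": "NOUN",    # noun, plural: number is lost after mapping
    "NNP": "PROPN",   # proper noun, singular
    "JJ": "ADJ",      # adjective
    "DT": "DET",      # determiner
    "IN": "ADP",      # preposition
    "VBD": "VERB",    # verb, past tense: tense is lost after mapping
    "VBZ": "VERB",    # verb, 3rd person singular present
    ".": "PUNCT",     # sentence-final punctuation
}

TIGER_TO_UD = {
    "NN": "NOUN",     # common noun
    "NE": "PROPN",    # proper noun
    "ADJA": "ADJ",    # attributive adjective
    "ART": "DET",     # article
    "APPR": "ADP",    # preposition
    "VVFIN": "VERB",  # finite full verb
    "$.": "PUNCT",    # sentence-final punctuation
}


def coarsen(tagged, mapping):
    """Swap each (word, fine_tag) pair's tag for its coarse UD tag.

    Tags missing from the (deliberately incomplete) mapping fall back to
    "X", the UD tag for "other"; this fallback is an assumption of the sketch.
    """
    return [(word, mapping.get(tag, "X")) for word, tag in tagged]


# Toy sentences tagged by hand, not produced by a spaCy model.
english = [("The", "DT"), ("old", "JJ"), ("cat", "NN"), ("sleeps", "VBZ"), (".", ".")]
german = [("Die", "ART"), ("alte", "ADJA"), ("Katze", "NN"), ("schläft", "VVFIN"), (".", "$.")]

print(coarsen(english, PENN_TO_UD))
print(coarsen(german, TIGER_TO_UD))
```

With a real pipeline you would not build such tables yourself: each token exposes both views directly as token.tag_ (fine-grained) and token.pos_ (coarse-grained), and spacy.explain() returns a short human-readable description of a tag string.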
.pos_ tag list
The docs list the following coarse-grained tags used for the pos and pos_ attributes:
ADJ: adjective, e.g. big, old, green, incomprehensible, first
ADP: adposition, e.g. in, to, during
ADV: adverb, e.g. very, tomorrow, down, where, there
AUX: auxiliary, e.g. is, has (done), will (do), should (do)
CONJ: conjunction, e.g. and, or, but
CCONJ: coordinating conjunction, e.g. and, or, but
DET: determiner, e.g. a, an, the
INTJ: interjection, e.g. psst, ouch, bravo, hello
NOUN: noun, e.g. girl, cat, tree, air, beauty
NUM: numeral, e.g. 1, 2017, one, seventy-seven, IV, MMXIV
PART: particle, e.g. ’s, not,
PRON: pronoun, e.g. I, you, he, she, myself, themselves, somebody
PROPN: proper noun, e.g. Mary, John, London, NATO, HBO
PUNCT: punctuation, e.g. ., (, ), ?
SCONJ: subordinating conjunction, e.g. if, while, that
SYM: symbol, e.g. $, %, §, ©, +, −, ×, ÷, =, :),