Question
In the original WordNet search, selecting "Display options: Show Lexical File Info" shows an extremely useful classification of words called the lexical file. For example, for "filling" we have:
<noun.substance>S: (n) filling, fill (any material that fills a space or container)
<noun.process>S: (n) filling (flow into something (as a container))
<noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
<noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
<noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
<noun.act>S: (n) filling (the act of filling something)
The first item in angle brackets is the "lexical file". Unfortunately, I have not been able to find a SPARQL endpoint that provides this info.
The latest RDF translation of WordNet 3.0 points to two things:
Talis SPARQL endpoint. For example, this query shows that there is no such info:
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>
W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic. But it is not the same as the lexical file, and it is quite incomplete. For example, "chair" has nothing, while one of the senses of "completion" is in the topic "American Football":
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1>
-> <j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>
The question: is there a public WordNet query API, or a database, that provides the lexical file information?
Answer 1:
It can be done with MIT JWI (MIT Java Wordnet Interface), a Java API for querying WordNet. There's a topic in this link showing how to implement a Java class to access lexicographic
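Answer 2:
I don't think you can find it in the RDF/OWL Representation of WordNet. It is in the WordNet distribution, though, in dict/lexnames. Each line gives the lexical file number, the lexical file name, and a part-of-speech code (1 = noun, 2 = verb, 3 = adjective, 4 = adverb). Here is the content of the file as of WordNet 3.0: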
00 adj.all 3
01 adj.pert 3
02 adv.all 4
03 noun.Tops 1
04 noun.act 1
05 noun.animal 1
06 noun.artifact 1
07 noun.attribute 1
08 noun.body 1
09 noun.cognition 1
10 noun.communication 1
11 noun.event 1
12 noun.feeling 1
13 noun.food 1
14 noun.group 1
15 noun.location 1
16 noun.motive 1
17 noun.object 1
18 noun.person 1
19 noun.phenomenon 1
20 noun.plant 1
21 noun.possession 1
22 noun.process 1
23 noun.quantity 1
24 noun.relation 1
25 noun.shape 1
26 noun.state 1
27 noun.substance 1
28 noun.time 1
29 verb.body 2
30 verb.change 2
31 verb.cognition 2
32 verb.communication 2
33 verb.competition 2
34 verb.consumption 2
35 verb.contact 2
36 verb.creation 2
37 verb.emotion 2
38 verb.motion 2
39 verb.perception 2
40 verb.possession 2
41 verb.social 2
42 verb.stative 2
43 verb.weather 2
44 adj.ppl 3
In each entry of the dict/data.* files, the second field is the lexical file number. For example, this entry for "filling" has 13 in that field, which corresponds to noun.food:
07883031 13 n 01 filling 0 002 @ 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.
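The lookup described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `parse_lexnames` and `lex_filenum` are made up for this example, and the lexnames text is a short excerpt of the listing above rather than the whole file.

```python
# Excerpt of dict/lexnames: file number, lexical file name, part-of-speech code.
LEXNAMES_EXCERPT = """\
04 noun.act 1
13 noun.food 1
22 noun.process 1
27 noun.substance 1
"""

# The data.noun entry quoted above.
DATA_LINE = (
    "07883031 13 n 01 filling 0 002 @ 07882497 n 0000 ~ 07883156 n 0000 "
    "| a food mixture used to fill pastry or sandwiches etc."
)


def parse_lexnames(text):
    """Hypothetical helper: map lexical file number (int) to its name."""
    table = {}
    for line in text.splitlines():
        fields = line.split()  # works for tab- or space-separated columns
        if len(fields) >= 2:
            table[int(fields[0])] = fields[1]
    return table


def lex_filenum(data_line):
    """Hypothetical helper: the second whitespace-separated field of an entry."""
    return int(data_line.split()[1])


lexnames = parse_lexnames(LEXNAMES_EXCERPT)
print(lexnames[lex_filenum(DATA_LINE)])  # prints noun.food
```

With a real installation you would read the full dict/lexnames file instead of the excerpt and skip the license header lines at the top of each data file before parsing entries.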
Answer 3:
Using the Python NLTK interface:
from nltk.corpus import wordnet as wn
for synset in wn.synsets('can'):
    # lexname() is a method in NLTK 3; older releases exposed it as an attribute
    print(synset.lexname())
Answer 4:
This is what worked for me:
Synset[] synsets = database.getSynsets(wordStr);
// i is the index of the sense of interest within synsets
ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];
int lexicalCode = referenceSynset.getLexicalFileNumber();
Then use the table above to look up the lexical file name ("lexname") for that number, e.g. noun.time.
Answer 5:
If you're on Windows and use NLTK, chances are the file is under your AppData directory. Open the file browser, type %appdata% into the address bar at the top, and press Enter; this normally opens the Roaming folder directly. In there, find the nltk_data directory, which contains the corpora directory. The full path is something like C:\Users\yourname\AppData\Roaming\nltk_data\corpora
and lexnames is located under C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.
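If you would rather locate the file from a script than click through the file browser, something like the following might work. It is a rough sketch using only the standard library; the function names `find_lexnames` and `default_roots` are invented for this example, and the list of candidate directories is an assumption about where nltk_data usually lives, so adjust it for your setup.

```python
import os
from pathlib import Path


def find_lexnames(roots):
    """Return the first existing <root>/corpora/wordnet/lexnames, or None."""
    for root in roots:
        candidate = Path(root) / "corpora" / "wordnet" / "lexnames"
        if candidate.is_file():
            return candidate
    return None


def default_roots():
    """Assumed nltk_data locations; not an exhaustive list."""
    roots = []
    # NLTK_DATA may hold several directories separated by os.pathsep.
    for entry in os.environ.get("NLTK_DATA", "").split(os.pathsep):
        if entry:
            roots.append(Path(entry))
    # On Windows, %APPDATA% normally points at ...\AppData\Roaming.
    if os.environ.get("APPDATA"):
        roots.append(Path(os.environ["APPDATA"]) / "nltk_data")
    roots.append(Path.home() / "nltk_data")
    return roots


if __name__ == "__main__":
    print(find_lexnames(default_roots()))
```

If this prints None, the WordNet corpus may not be downloaded yet, or it may be stored in a directory that is not in the assumed list.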
Source: https://stackoverflow.com/questions/6681348/how-to-find-the-lexical-file-in-wordnet