How to find the “lexical file” in Wordnet?

扶醉桌前 提交于 2019-12-22 07:01:35

问题


If you look at the original Wordnet search and select "Display options: Show Lexical File Info", you'll see an extremely useful classification of words called lexical file. Eg for "filling" we have:

   <noun.substance>S: (n) filling, fill (any material that fills a space or container)
   <noun.process>S: (n) filling (flow into something (as a container))
   <noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
   <noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
   <noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
   <noun.act>S: (n) filling (the act of filling something) 

The first thing in brackets is the "lexical file". Unfortunately I have not been able to find a SPARQL endpoint that provides this info

  • The latest RDF translation of Wordnet 3.0 points to two things:

  • Talis SPARQL endpoint. Use eg this query to check there's no such info:

    DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>

  • W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic. But it's not the same as lexical file, and is quite incomplete. Eg "chair" has nothing, while one of the senses of "completion" is in the topic "American Football"

    DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1> ->

    <j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>

The question: is there a public Wordnet query API, or a database, that provides the lexical file information?


回答1:


It can be done through MIT JWI (MIT Java Wordnet Interface) a Java API to query Wordnet. There's a topic in this link showing how to implement a java class to access lexicographic




回答2:


I don't think you can find it in the RDF/OWL Representation of WordNet. It's in the WordNet distribution though: dict/lexnames. Here is the content of the file as of WordNet 3.0:

00      adj.all 3
01      adj.pert        3 
02      adv.all 4
03      noun.Tops       1  
04      noun.act        1
05      noun.animal     1
06      noun.artifact   1
07      noun.attribute  1
08      noun.body       1
09      noun.cognition  1
10      noun.communication      1
11      noun.event      1
12      noun.feeling    1
13      noun.food       1
14      noun.group      1
15      noun.location   1
16      noun.motive     1
17      noun.object     1
18      noun.person     1
19      noun.phenomenon 1
20      noun.plant      1
21      noun.possession 1
22      noun.process    1
23      noun.quantity   1
24      noun.relation   1
25      noun.shape      1
26      noun.state      1
27      noun.substance  1
28      noun.time       1
29      verb.body       2
30      verb.change     2
31      verb.cognition  2
32      verb.communication      2
33      verb.competition        2
34      verb.consumption        2
35      verb.contact    2
36      verb.creation   2
37      verb.emotion    2
38      verb.motion     2
39      verb.perception 2
40      verb.possession 2
41      verb.social     2
42      verb.stative    2
43      verb.weather    2
44      adj.ppl 3

For each entry of dict/data.*, the second number is the lexical file info. For example, this filling entry contains the number 13, which is noun.food.

07883031 13 n 01 filling 0 002 @ 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.



回答3:


Using the Python NLTK interface:

from nltk.corpus import wordnet as wn

for synset in wn.synsets('can'):
    print  synset.lexname



回答4:


This is what worked for me,

Synset[] synsets = database.getSynsets(wordStr);

ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];

int lexicalCode =referenceSynset.getLexicalFileNumber();

Then use above table to deduce "lexnames" e.g. noun.time




回答5:


If you're on Windows, chances are it is in your appdata, in the local directory. To get there, you will want to open your file browser, go to the top, and type in %appdata%

Next click on roaming, and then find the nltk_data directory. In there, you will have your corpora file. The full path is something like: C:\Users\yourname\AppData\Roaming\nltk_data\corpora

and lexnames will present under C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.



来源:https://stackoverflow.com/questions/6681348/how-to-find-the-lexical-file-in-wordnet

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!