nlp

Pointwise mutual information on text

痴心易碎 submitted on 2019-12-20 08:40:48
Question: I was wondering how one would calculate pointwise mutual information for text classification. To be more exact, I want to classify tweets into categories. I have a dataset of annotated tweets, and for each category I have a dictionary of words that belong to it. Given this information, how can I calculate the PMI of each category per tweet, in order to classify a tweet into one of these categories? Answer 1: PMI is a measure of association between a feature (in your case a
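A minimal sketch of tweet-level PMI under one common set of assumptions (counting at the level of whole tweets, and a tweet's annotated category standing in for the class variable); the toy tweets and category names below are made up for illustration:

```python
import math

def pmi(word, category, tweets):
    """PMI(word, category) = log2( P(word, category) / (P(word) * P(category)) ).

    `tweets` is a list of (tokens, category) pairs; all probabilities are
    estimated by counting over tweets.
    """
    n = len(tweets)
    n_word = sum(1 for toks, _ in tweets if word in toks)
    n_cat = sum(1 for _, c in tweets if c == category)
    n_joint = sum(1 for toks, c in tweets if word in toks and c == category)
    if n_joint == 0:
        return float("-inf")  # the word never co-occurs with the category
    return math.log2((n_joint / n) / ((n_word / n) * (n_cat / n)))

# Toy annotated dataset (invented for illustration)
tweets = [
    (["great", "goal"], "sports"),
    (["great", "match"], "sports"),
    (["election", "vote"], "politics"),
    (["vote", "now"], "politics"),
]
```

To score a tweet against a category, one could sum the PMI of each of its words with that category and pick the arg-max.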

Python module with access to english dictionaries including definitions of words [closed]

随声附和 submitted on 2019-12-20 08:33:26
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 5 years ago. I am looking for a Python module that helps me get the definition(s) of a word from an English dictionary. There is of course enchant, which lets me check whether a word exists in the English language, but it does not provide definitions (at least I don't see anything like that in the docs). There is also
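A common answer to this question is NLTK's WordNet interface, whose lookup looks like `wordnet.synsets("swim")[0].definition()` (it requires downloading the WordNet corpus first). The self-contained stand-in below uses a tiny hand-made glossary purely to show the lookup shape; the words and definitions are invented for illustration:

```python
# Real-world option (requires nltk and the wordnet corpus):
#     from nltk.corpus import wordnet
#     wordnet.synsets("swim")[0].definition()
# Stand-in with a local glossary so the example is self-contained:
glossary = {
    "parrot": ["a bird of the order Psittaciformes, often able to mimic speech"],
    "swim": ["to move through water by moving the body or limbs"],
}

def define(word):
    """Return the list of known definitions for a word (empty if unknown)."""
    return glossary.get(word.lower(), [])
```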

Is it possible to train Stanford NER system to recognize more named entities types?

流过昼夜 submitted on 2019-12-20 08:18:44
Question: I'm using some NLP libraries now (Stanford and NLTK). I saw the Stanford demo, but I want to ask whether it is possible to use it to identify more entity types. Currently the Stanford NER system (as the demo shows) can recognize entities such as person (name), organization, or location, but the organizations it recognizes are limited to universities and a few big organizations. I'm wondering whether I can use its API to write a program for more entity types, e.g. so that if my input is "Apple" or "Square" it can
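Stanford's CRFClassifier can be retrained on a tab-separated file (one token per line, `token<TAB>label`, with `O` for non-entities), and it picks up whatever label set appears in the training data. A small sketch of producing that format; the `COMPANY` label and the sentence are our own invented example, not something Stanford ships:

```python
def to_stanford_tsv(tagged_tokens):
    """Serialize (token, label) pairs into the tab-separated, one-token-per-line
    format that Stanford's CRFClassifier trains from."""
    return "\n".join(f"{tok}\t{label}" for tok, label in tagged_tokens)

# Hypothetical training sentence with a custom COMPANY label
sentence = [("Apple", "COMPANY"), ("announced", "O"), ("earnings", "O"), (".", "O")]
tsv = to_stanford_tsv(sentence)
```

The resulting file is then referenced from a properties file (`trainFile=...`) when invoking the trainer.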

Training n-gram NER with Stanford NLP

﹥>﹥吖頭↗ submitted on 2019-12-20 08:01:24
Question: Recently I have been trying to train n-gram entities with Stanford CoreNLP. I have followed this tutorial: http://nlp.stanford.edu/software/crf-faq.shtml#b With this, I am able to specify only unigram tokens and the class each belongs to. Can anyone guide me through extending it to n-grams? I am trying to extract known entities, like movie names, from a chat data set. Please guide me in case I have misinterpreted the Stanford tutorials and the same can be used for
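A common workaround (the CRF itself still classifies token by token) is to label every token of a multi-token entity with the same class in the training file, so that contiguous runs of the same label can be merged back into one entity at extraction time. A sketch, with the `MOVIE` label and the sample sentence invented for illustration:

```python
def label_ngrams(tokens, spans, label="MOVIE"):
    """Assign `label` to every token covered by a span (start, end), end
    exclusive; all other tokens get 'O'. Returns (token, tag) pairs."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        for i in range(start, end):
            tags[i] = label
    return list(zip(tokens, tags))

tokens = "I watched The Dark Knight yesterday".split()
labeled = label_ngrams(tokens, [(2, 5)])  # "The Dark Knight" spans tokens 2..4
```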

Am I passing the string correctly to the python library?

眉间皱痕 submitted on 2019-12-20 07:58:28
Question: I'm using a Python library called Guess Language: http://pypi.python.org/pypi/guess-language/0.1 "justwords" is a string with Unicode text. I pass it to the package, but it always returns English, even though the web page is in Japanese. Does anyone know why? Am I not encoding correctly? Here is a sample of the garbled text: §ç©ºéå ¶ä»æ¡å°±æ²æéç¨®å¾ é¤ï¼æä»¥ä¾é裡ç¶ç éäºï¼åæ­¤ç°å¢æ°£æ°¹³åèµ·ä¾åªè½ç®âå¾å¥½âé常好âåå ¶æ¯è¦é»é¤ï¼é¨ä¾¿é»çé»ã飲æãä¸ææ²»ç­åä¸å 便å®ï¼æ¯æ´è¥ç äºï¼æ³æ³é裡以å°é»ãæ¯è§ä¾èªªä¹è©²æpremiumï¼åªæ±é¤é»å¥½åå°
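The sample text above is classic mojibake: UTF-8 bytes that were decoded as Latin-1, which is why a language guesser sees Latin-script noise instead of CJK characters. If that is indeed what happened, the repair is to reverse the wrong decode before guessing; a sketch demonstrating the round trip on a made-up CJK string:

```python
original = "空間環境"                                # sample CJK text (invented)

# Simulate the bug: UTF-8 bytes wrongly decoded as Latin-1 produce mojibake.
mojibake = original.encode("utf-8").decode("latin-1")

# The fix: re-encode as Latin-1 (recovering the raw bytes), then decode as UTF-8.
repaired = mojibake.encode("latin-1").decode("utf-8")
```

With the text repaired this way, a language detector has real Japanese/Chinese characters to work with.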

How to build a JSON array dynamically in JavaScript

风格不统一 submitted on 2019-12-20 06:28:32
Question: I receive a JSON object with some number of quick-reply elements from wit.ai, like this: "msg": "So glad to have you back. What do you want me to do?", "action_id": "6fd7f2bd-db67-46d2-8742-ec160d9261c1", "confidence": 0.08098269709064443, "quickreplies": ["News?", "Subscribe?", "Contribute?", "Organize?"], "type": "msg" I then need to convert them to a slightly different format as they are passed to Facebook Messenger, as described in the code below. Wit only exposes 'msg' and 'quickreplies.'
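The reshaping itself is language-agnostic; a Python sketch of the same transformation (the JavaScript version is analogous). The Messenger-side field names follow its Send API quick-reply objects (`content_type`, `title`, `payload`); reusing the reply text as the `payload` is our own simplification:

```python
def to_messenger_quick_replies(wit_msg):
    """Reshape a wit.ai response into a Messenger Send API payload: each
    quickreply string becomes a quick-reply object."""
    return {
        "text": wit_msg["msg"],
        "quick_replies": [
            {"content_type": "text", "title": qr, "payload": qr}
            for qr in wit_msg.get("quickreplies", [])
        ],
    }

wit_msg = {
    "msg": "So glad to have you back. What do you want me to do?",
    "quickreplies": ["News?", "Subscribe?", "Contribute?", "Organize?"],
}
payload = to_messenger_quick_replies(wit_msg)
```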

Check perplexity of a Language Model

家住魔仙堡 submitted on 2019-12-20 06:17:02
Question: I created a language model with a Keras LSTM, and now I want to assess whether it's good, so I want to calculate perplexity. What is the best way to calculate the perplexity of a model in Python? Answer 1: I've come up with two versions and attached their corresponding source; please feel free to check the links out. def perplexity_raw(y_true, y_pred): """ The perplexity metric. Why isn't this part of Keras yet?! https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow https
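Independent of any framework, perplexity is the exponential of the average negative log-likelihood the model assigns to the true tokens. A plain-Python sketch of that definition (not the Keras-metric version the answer excerpt shows, which operates on tensors):

```python
import math

def perplexity(probs):
    """Perplexity of a sequence, given the probability the model assigned to
    each true token: exp of the mean negative log-likelihood."""
    n = len(probs)
    nll = -sum(math.log(p) for p in probs) / n
    return math.exp(nll)

# Sanity check: a model that assigns uniform probability 1/4 to every token
# has perplexity exactly 4 (it is "as confused as" a 4-way coin flip).
```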

define CRF++ template file

淺唱寂寞╮ submitted on 2019-12-20 06:04:10
Question: This is my issue, but it doesn't say HOW to define the template file correctly. My training file looks like this:
上 B-NR
海 L-NR
浦 B-NR
东 L-NR
开 B-NN
发 L-NN
与 U-CC
法 B-NN
制 L-NN
建 B-NN
...
Answer 1: CRF++ is extremely easy to use. The instructions on the website explain it clearly: http://crfpp.googlecode.com/svn/trunk/doc/index.html Suppose we extract features for the line 东 L-NR:
Unigram
U02:%x[0,0] # means column 0 of the current line
U03:%x[1,0] # means column 0 of the next line
So the underlying
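The `%x[row,col]` macros in a CRF++ template address the training file relative to the current token: `row` is a line offset, `col` a whitespace-separated column index. A sketch of that expansion, using the first lines of the training file above (the current line is chosen arbitrarily for the demonstration):

```python
def expand_macro(lines, current, row, col):
    """Expand a CRF++ %x[row,col] macro: column `col` of the line `row`
    positions away from the current line."""
    return lines[current + row].split()[col]

lines = ["上 B-NR", "海 L-NR", "浦 B-NR", "东 L-NR"]

# With the current line being "海 L-NR" (index 1):
u02 = expand_macro(lines, 1, 0, 0)  # %x[0,0] -> column 0 of the current line
u03 = expand_macro(lines, 1, 1, 0)  # %x[1,0] -> column 0 of the next line
```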

gensim doc2vec “intersect_word2vec_format” command

大城市里の小女人 submitted on 2019-12-20 03:55:16
Question: I was just reading through the doc2vec commands on the gensim page, and I am curious about the command "intersect_word2vec_format". My understanding of this command is that it lets me inject vector values from a pretrained word2vec model into my doc2vec model, and then train my doc2vec model using the pretrained word2vec values rather than generating the word vector values from my document corpus. The result is that I get a more accurate doc2vec model, because I am using pretrained w2v values, which was
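The core idea of the method, as the questioner describes it, is an intersection: only words already in the model's vocabulary get their vectors overwritten by the pretrained ones; no new words are added. A toy dict-of-lists stand-in for that behaviour (this is not the gensim API, which operates on a KeyedVectors file on disk):

```python
def intersect_pretrained(model_vectors, pretrained):
    """Mimic the idea behind gensim's intersect_word2vec_format: for every
    word present in both vocabularies, replace the model's current vector
    with the pretrained one; other words are left untouched and no new
    vocabulary entries are created."""
    updated = dict(model_vectors)
    for word, vec in pretrained.items():
        if word in updated:
            updated[word] = list(vec)
    return updated

# Invented toy vocabularies for illustration
model = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
pretrained = {"cat": [0.9, 0.9], "bird": [0.5, 0.5]}
merged = intersect_pretrained(model, pretrained)
```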

How can we extract the main verb from a sentence?

回眸只為那壹抹淺笑 submitted on 2019-12-20 03:09:57
Question: For example, "parrots do not swim." Here the main verb is "swim". How can we extract that by language processing? Are there any known algorithms for this purpose? Answer 1: You can run a dependency parsing algorithm on the sentence and then find the dependent of the root relation. For example, running the sentence "Parrots do not swim" through the Stanford Parser online demo, I get the following dependencies:
nsubj(swim-4, Parrots-1)
aux(swim-4, do-2)
neg(swim-4, not-3)
root(ROOT-0, swim-4)
Each of
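Given dependency triples in the textual form shown above, picking out the main verb is just a matter of finding the dependent of the `root` relation. A sketch that parses those strings directly (a real pipeline would read the parser's structured output instead):

```python
import re

def main_verb(dependencies):
    """Return the dependent of the root relation from Stanford-style
    dependency strings like 'root(ROOT-0, swim-4)', or None if absent."""
    for dep in dependencies:
        m = re.match(r"root\(ROOT-0, (\w+)-\d+\)", dep)
        if m:
            return m.group(1)
    return None

deps = [
    "nsubj(swim-4, Parrots-1)",
    "aux(swim-4, do-2)",
    "neg(swim-4, not-3)",
    "root(ROOT-0, swim-4)",
]
```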