nlp

Convert NL string to vector or some numeric equivalent

好久不见. Submitted on 2020-01-16 12:09:08
Question: I'm trying to convert a string to a numeric equivalent so I can train a neural network to classify the strings. I tried the sum of the ASCII values, but that just distinguishes larger numbers from smaller ones. For example, a short string in German can be put into the English class because the English words the network was trained on are short and therefore numerically small. I was looking into Google's word2vec, which seems like it should work. But I want to do this on the client
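A fixed-length representation that carries more signal than a length-sensitive ASCII sum can be sketched as a normalized character-frequency vector (a minimal illustration of the idea, not the word2vec approach the question mentions; the alphabet below is an assumption):

```python
from collections import Counter

# Assumed character set; extend to cover the languages you classify.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"

def char_freq_vector(text):
    """Map a string to a fixed-length vector of relative character
    frequencies, so two strings of different length are comparable."""
    counts = Counter(c for c in text.lower() if c in ALPHABET)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[c] / total for c in ALPHABET]
```

Because the counts are normalized, `char_freq_vector("aa")` and `char_freq_vector("aaaa")` are identical, which removes the length bias the question describes.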

I'm getting the following ImportError when importing the sutime module - what does it mean?

社会主义新天地. Submitted on 2020-01-16 11:59:11
Question: I'm getting this error: ImportError: cannot import name 'SUTime' from partially initialized module 'sutime' (most likely due to a circular import) when importing the sutime module as from sutime import SUTime, as suggested in the sutime GitHub example: https://github.com/FraBle/python-sutime. Context: sutime is a Python library for parsing dates and times from natural-language input, developed by the amazing team behind Stanford CoreNLP. Note: I've already run the prerequisite installs as well: >> pip
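A "partially initialized module" ImportError often means a local file or folder named sutime shadows the installed package, so the module ends up importing itself (an assumption about this setup, not a confirmed diagnosis). A quick way to check which file Python actually resolves for that name:

```python
import importlib.util

# Locate the file Python would import under the name "sutime".
# If the reported path points inside your own project (e.g. a sutime.py
# next to your script) rather than site-packages, rename that file and
# delete any stale __pycache__ entries, then retry the import.
spec = importlib.util.find_spec("sutime")
print(spec.origin if spec else "sutime is not installed")
```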

Recognize sentence structure in Prolog

廉价感情. Submitted on 2020-01-16 01:12:09
Question: I have a situation where I'm reading in three different types of sentences. They will be of one of the following forms: "_ is a _", "A _ is a _", or "Is _ a _?". I need to be able to recognize which type of sentence was entered and then add to or query my knowledge base. For example, the user may input: Fido is a dog. I would then add that fact to my knowledge base. The user could then enter: Is Fido a dog? And the program would answer yes. So far my only idea for recognizing the facts is splitting the
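Independent of the Prolog details, the shape matching itself can be sketched as template classification (shown in Python here purely to illustrate the logic; the three templates are the ones from the question):

```python
import re

# The three sentence templates from the question, tried in order.
PATTERNS = [
    ("query",        re.compile(r"^Is (\w+) a (\w+)\?$", re.IGNORECASE)),
    ("fact_article", re.compile(r"^A (\w+) is a (\w+)\.?$")),
    ("fact",         re.compile(r"^(\w+) is a (\w+)\.?$")),
]

def classify(sentence):
    """Return (kind, subject, category) for a recognized sentence, else None."""
    for kind, pattern in PATTERNS:
        m = pattern.match(sentence.strip())
        if m:
            return kind, m.group(1), m.group(2)
    return None
```

A "fact" result would be asserted into the knowledge base, while a "query" result would be looked up; in Prolog the analogous move is splitting the input into a word list and matching it against three clause heads.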

TypeError: expected string or bytes-like object HashingVectorizer

独自空忆成欢. Submitted on 2020-01-15 23:54:19
Question: I have been facing this issue while fitting the dataset. Everything seems fine, and I don't know where the problem is. Since I'm a beginner, could anyone please tell me what I am doing wrong or whether I am missing something? The problem seems to be in the data-preprocessing part. The error trace and the dataframe's head are attached as images below. train = pd.read_csv('train.txt', sep='\t', dtype=str, header=None) test = pd.read_csv('test.txt', sep='\t', dtype=str, header=None) X_train = train.iloc[:,1:] y
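HashingVectorizer, like the other scikit-learn text vectorizers, expects a 1-D iterable of raw strings, so passing a 2-D slice such as train.iloc[:, 1:] hands it something other than text - a plausible cause of this TypeError, though the truncated trace can't confirm it. A minimal sketch of the expected input shape (the sample documents are made up):

```python
from sklearn.feature_extraction.text import HashingVectorizer

# The vectorizer wants one string per document, e.g. a single column as a
# Series (train.iloc[:, 1]), not a 2-D frame (train.iloc[:, 1:]).
docs = ["first training sentence", "second one", "and a third"]

vectorizer = HashingVectorizer(n_features=2**10)
X = vectorizer.transform(docs)  # sparse matrix, one row per document
print(X.shape)                  # (3, 1024)
```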

How to understand this formula in Lingpipe language model?

五迷三道. Submitted on 2020-01-15 11:54:23
Question: This is from the Lingpipe manual on building a language model, but I only partly understand the theory behind it. In particular, I don't understand the base probability: how is the base p(d) obtained? Suppose the following is a portion of the tokens and their frequencies in the unigram file: ab 20, aba 3, abd 2, abef 2, abkk 3. Under these conditions, what are lambda(), 1-lambda(), extCount, numExtensions, and the base P(ab)? This is one question, but they are chained. Thanks a lot. Source: https://stackoverflow.com/questions/10797878/how-to-understand
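As I read Lingpipe's Witten-Bell-style interpolation for its process character language model (a sketch based on its documentation, so treat the details as assumptions to verify against the manual), the quantities chain together like this:

```latex
\hat{p}(c \mid d) \;=\; \lambda(d)\, p_{\mathrm{ML}}(c \mid d) \;+\; \bigl(1 - \lambda(d)\bigr)\, \hat{p}(c \mid d'),
\qquad
\lambda(d) \;=\; \frac{\mathrm{extCount}(d)}{\mathrm{extCount}(d) + L \cdot \mathrm{numExtensions}(d)}
```

where d' is the context d with its first character dropped, L is the interpolation hyperparameter, and the recursion bottoms out at the base (context-free) distribution, uniform over the alphabet: p(c) = 1/|Σ|. With the counts above, extCount("ab") = 3 + 2 + 2 + 3 = 10 (observed continuations of "ab"), numExtensions("ab") = 4 (distinct next characters a, d, e, k), so λ("ab") = 10 / (10 + 4L).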

How to extract tag attributes using Spacy

浪子不回头ぞ. Submitted on 2020-01-15 07:20:38
Question: I tried to get the morphological attributes of a verb using spaCy, like below: import spacy from spacy.lang.it.examples import sentences nlp = spacy.load('it_core_news_sm') doc = nlp('Ti è piaciuto il film?') token = doc[2] nlp.vocab.morphology.tag_map[token.tag_] The output was: {'pos': 'VERB'} But I want to extract V__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin": {POS: VERB} Is it possible to extract the mood, tense, number, and person information as specified in the tag map? https://github
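In models whose fine-grained tag encodes features, as the it_core_news_sm tags in the question appear to, the token.tag_ string itself can be split into a feature dict. This is a sketch that parses such a string, assuming the 'POS__Feat=Val|...' layout shown above (newer spaCy versions expose the same information directly via token.morph):

```python
def parse_tag(tag):
    """Split a fine-grained tag like 'V__Mood=Cnd|Number=Plur'
    into its coarse POS and a {feature: value} dict."""
    pos, _, feats = tag.partition("__")
    features = dict(f.split("=", 1) for f in feats.split("|") if "=" in f)
    return pos, features

pos, feats = parse_tag("V__Mood=Cnd|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin")
# pos -> 'V'; feats -> {'Mood': 'Cnd', 'Number': 'Plur', ...}
```

In a spaCy pipeline this would be called as `parse_tag(token.tag_)`; a tag with no `__` separator simply yields an empty feature dict.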

Google Natural Language Sentiment Analysis Aggregate Scores

 ̄綄美尐妖づ. Submitted on 2020-01-15 06:27:09
Question: In this part of the documentation of the Google Cloud Platform Natural Language API, it is stated that: The overall score and magnitude values for an entity are an aggregate of the specific score and magnitude values for each mention of the entity. I can't figure out how this aggregation works. In the example provided in the documentation, Marvin Gaye has two mentions. One of the mentions has a sentiment score of 0.4 and a magnitude of 0.4; the other mention has a score of -0.2 and a magnitude 0
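The documentation quoted above doesn't spell out the aggregation formula. One plausible reading - an assumption, not Google's documented behavior - is that magnitudes accumulate while the overall score is the magnitude-weighted balance of the mention scores. A sketch with illustrative numbers (the second mention's magnitude is truncated in the question, so 0.2 here is made up):

```python
def aggregate(mentions):
    """Combine (score, magnitude) pairs: magnitudes add up, and the
    overall score is the magnitude-weighted mean of mention scores.
    This is a guess at the behavior, not the documented formula."""
    total_magnitude = sum(m for _, m in mentions)
    if total_magnitude == 0:
        return 0.0, 0.0
    score = sum(s * m for s, m in mentions) / total_magnitude
    return score, total_magnitude

score, magnitude = aggregate([(0.4, 0.4), (-0.2, 0.2)])
```

Comparing such a local recomputation against the entity-level values the API returns would be one way to reverse-engineer the actual rule.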

Calculate perplexity of word2vec model

牧云@^-^@. Submitted on 2020-01-15 04:51:53
Question: I trained a Gensim word2vec model on 500K sentences (around 60K words) and I want to calculate the perplexity. What would be the best way to do so? And for 60K words, how can I check what a proper amount of data would be? Thanks. Answer 1: If you want to calculate the perplexity, you first have to retrieve the loss. In the gensim.models.word2vec.Word2Vec constructor, pass the compute_loss=True parameter; this way, gensim will store the loss for you while training. Once trained, you can call the get_latest
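The answer is cut off above. Gensim's running loss (via compute_loss=True and get_latest_training_loss(), which do exist in the library) can be turned into a perplexity figure roughly as follows; the conversion assumes the reported loss behaves like a summed log-loss over the training words, which is a simplification:

```python
import math

def perplexity_from_loss(total_loss, n_words):
    """Perplexity = exp(average negative log-likelihood per word)."""
    return math.exp(total_loss / n_words)

# Sketch of the gensim side (requires gensim; shown for context only):
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences, compute_loss=True)
#   loss = model.get_latest_training_loss()
#   print(perplexity_from_loss(loss, model.corpus_total_words))
```

As a sanity check, a total loss of `N * ln(100)` over `N` words gives a perplexity of exactly 100.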