nlp

Understanding DictVectorizer in scikit-learn?

浪子不回头ぞ submitted on 2019-12-12 09:09:59
Question: I'm exploring the different feature extraction classes that scikit-learn provides. After reading the documentation, I did not understand very well what DictVectorizer can be used for. Other questions come to mind, for example: how can DictVectorizer be used for text classification, i.e. how could this class help handle labeled textual data? Could anybody provide a short example apart from the one I already read on the documentation web page? Answer 1: say your feature space is
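A minimal sketch of the kind of usage the question asks about, assuming each document is first turned into a dict of features (here, token counts). The sample sentences and feature names such as `first` and `n_tokens` are made up for illustration; only `DictVectorizer`, `fit_transform` and the `feature_names_` attribute come from scikit-learn.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy corpus; in text classification each sample would also carry a label.
samples = ["good movie", "bad movie", "good good plot"]

# One dict per document: token -> count.
count_features = [dict(Counter(text.split())) for text in samples]

count_vectorizer = DictVectorizer(sparse=False)
X_counts = count_vectorizer.fit_transform(count_features)

# Columns follow the sorted feature names.
print(count_vectorizer.feature_names_)
print(X_counts)

# String-valued features are one-hot encoded as "name=value";
# numeric features are passed through unchanged.
mixed_features = [
    {"first": "good", "n_tokens": 2},
    {"first": "bad", "n_tokens": 3},
]
mixed_vectorizer = DictVectorizer(sparse=False)
X_mixed = mixed_vectorizer.fit_transform(mixed_features)
print(mixed_vectorizer.feature_names_)
print(X_mixed)
```

The resulting matrices can be passed to any scikit-learn classifier together with the labels, which is one way the class can help with labeled textual data.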

Are there any C# libraries for Named Entity Recognition? [closed]

半城伤御伤魂 submitted on 2019-12-12 08:48:57
Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 6 years ago. I am looking for any free libraries for Named Entity Recognition in C# or any other .NET language. Answer 1: SharpNLP, a port of the Java

Stanford Core NLP how to get the probability & margin of error

夙愿已清 submitted on 2019-12-12 08:36:38
Question: When using the parser or, for that matter, any of the annotations in Core NLP, is there a way to access the probability or the margin of error? To put my question into context, I am trying to understand whether there is a way to detect a case of ambiguity programmatically. For instance, in the sentence below the verb "desire" is detected as a noun. I would like to know what kind of measure I can access or calculate from the Core NLP API to tell me there could be an ambiguity. (NP (NP (NNP

Can the ANEW dictionary be used for sentiment analysis in quanteda?

醉酒当歌 submitted on 2019-12-12 08:16:24
Question: I am trying to find a way to implement the Affective Norms for English Words (in Dutch) for a longitudinal sentiment analysis with quanteda. What I ultimately want is a "mean sentiment" per year in order to show any longitudinal trends. In the data set, all words are scored on a 7-point Likert scale by 64 coders on four categories, which provides a mean for each word. What I want to do is take one of the dimensions and use it to analyse changes in emotions over time. I realise that

How to retrieve all kinds of dates and temporal values from text

浪尽此生 submitted on 2019-12-12 08:13:01
Question: I want to retrieve dates and other temporal entities from a set of strings. Can this be done in Java without parsing the string for dates, as most parsers deal with a limited scope of input patterns? The input here is entered manually and is hence ambiguous. Inputs can be like:

12th Sep | mid-March | 12.September.2013
Sep 12th | 12th September | 2013
Sept 13 | 12th, September | 12th,Feb,2013

I've gone through many answers on finding dates in Java, but most of them don't deal with such a huge scope

How to use Stanford parser

一个人想着一个人 submitted on 2019-12-12 07:57:08
Question: I downloaded the Stanford parser 2.0.5 and used the Demo2.java source code that is in the package, but after I compile and run the program it produces many errors. Part of my program is:

public class testStanfordParser {
    /** Usage: ParserDemo2 [[grammar] textFile] */
    public static void main(String[] args) throws IOException {
        String grammar = args.length > 0 ? args[0] : "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz";
        String[] options = { "-maxLength", "80", "-retainTmpSubcategories" };

How to automatically label a cluster of words using semantics?

ⅰ亾dé卋堺 submitted on 2019-12-12 07:49:36
Question: The context is: I already have clusters of words (phrases, actually) resulting from k-means applied to internet search queries, using common URLs in the search engine results as a distance (co-occurrence of URLs rather than words, to simplify a lot). I would like to label the clusters automatically using semantics; in other words, I'd like to extract the main concept surrounding a group of phrases considered together. For example (sorry for the subject of my example), if I have the

How can I get only heading names from the text file

爷,独闯天下 submitted on 2019-12-12 07:02:29
Question: I have a text file as below:

Education: askdjbnakjfbuisbrkjsbvxcnbvfiuregifuksbkvjb.iasgiufdsegiyvskjdfbsldfgd
Technical skills : java,j2ee etc.,
work done: oaugafiuadgkfjwgeuyrfvskjdfviysdvfhsdf,aviysdvwuyevfahjvshgcsvdfs,bvisdhvfhjsvjdfvshjdvhfjvxjhfvhjsdbvfkjsbdkfg

I would like to extract only the heading names, such as Education, Technical skills, etc. The code is:

with open("aks.txt") as infile, open("fffm",'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Technical
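One possible approach, sketched under the assumption that every heading starts a line and is followed by a colon (as in the sample file above). The function name `extract_headings` and the regular expression are illustrative choices, not taken from the question or its answers.

```python
import re

# A heading is assumed to be a run of letters and spaces at the start of a line,
# optionally followed by whitespace, then a colon.
HEADING_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:")


def extract_headings(lines):
    """Return the heading names found at the start of the given lines."""
    headings = []
    for line in lines:
        match = HEADING_PATTERN.match(line)
        if match:
            headings.append(match.group(1))
    return headings


# Sample lines shaped like the file in the question (body text shortened).
sample_lines = [
    "Education: askdjbnakjfbuisbrkjsbvxcnbvfiuregifuksbkvjb.iasgiufd",
    "Technical skills : java,j2ee etc.,",
    "work done: oaugafiuadgkfjwgeuyrfvskjdfviysdvfhsdf,aviysdvwuyev",
    "a continuation line without any heading",
]

headings = extract_headings(sample_lines)
print(headings)
```

To run this on the real file, the list can be replaced by an open file handle, for example `with open("aks.txt") as infile: extract_headings(infile)`. Body text that itself contains a colon near the start of a line would be reported as a heading, so the pattern may need tightening for real data.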

tensorflow: Variable RNNLM/RNNLM/embedding/Adam_2/ does not exist

送分小仙女□ submitted on 2019-12-12 06:13:33
Question: My problem is quite similar to "tensorflow embeddings don't exist after first RNN example", but I don't think I got an answer from it. I posted my entire file at https://paste.ubuntu.com/24253170/, but I believe the following code is what really matters. I get this error message: ValueError: Variable RNNLM/RNNLM/embedding/Adam_2/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope? The last line of this code is where the error occurred: def test_RNNLM(): config =

How to fuzzily search for a dictionary word?

此生再无相见时 submitted on 2019-12-12 05:48:00
Question: I have read a lot of threads here discussing edit-distance-based fuzzy searches, which tools like Elasticsearch/Lucene provide out of the box, but my problem is a bit different. Suppose I have a dictionary of words, {'cat', 'cot', 'catalyst'}, and a character similarity relation f(x, y):

f(x, y) = 1, if characters x and y are similar
        = 0, otherwise

(These "similarities" can be specified by the programmer), such that, say, f('t', 'l') = 1, f('a', 'o') = 1, f('f', 't') = 1, but f('a', 'z') = 0, etc.
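Since the excerpt stops before the actual matching rule is stated, the following is only one plausible reading, sketched under explicit assumptions: a query matches a dictionary word when both have the same length and every pair of aligned characters is either identical or similar under f. The assumptions that f is symmetric and that f(x, x) = 1, as well as the names `SIMILAR_PAIRS` and `fuzzy_matches`, are mine and not from the question.

```python
# Similar character pairs taken from the question's examples.
# Assumption: the relation is symmetric, so each pair is stored as a frozenset.
SIMILAR_PAIRS = {frozenset("tl"), frozenset("ao"), frozenset("ft")}


def f(x, y):
    """Return 1 if characters x and y are identical or listed as similar, else 0."""
    if x == y:
        return 1  # assumption: every character is similar to itself
    return 1 if frozenset((x, y)) in SIMILAR_PAIRS else 0


def fuzzy_matches(query, dictionary):
    """Return the sorted dictionary words whose characters all align with the query under f."""
    return sorted(
        word
        for word in dictionary
        if len(word) == len(query)
        and all(f(a, b) for a, b in zip(query, word))
    )


dictionary = {"cat", "cot", "catalyst"}
print(fuzzy_matches("col", dictionary))  # 'o'~'a' and 'l'~'t' are similar
print(fuzzy_matches("caz", dictionary))  # 'z' is not similar to 't'
```

This scans the whole dictionary for every query, which is fine for small word lists; a larger dictionary would probably need an index, for example a trie that branches on similar characters.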