document-classification

Get WordNet's domain name for the specified word

穿精又带淫゛_ posted on 2019-11-30 07:41:04
Question: I know WordNet has a Domains Hierarchy, e.g. sport->football. 1) Is it possible to list all words related to, for example, the 'sport->football' sub-domain? Expected response: goalkeeper, forward, penalty, ball, field, stadium, referee and so on. 2) Is it possible to get the domain name for a given word, e.g. 'goalkeeper'? I need something like [sport->football; sport->hockey] or [football; hockey] or just 'football'. It is for a document classification task. Answer 1: WordNet has a hypernym / hyponym hierarchy but that is not

How do you initialize a gensim corpus variable with a csr_matrix?

大城市里の小女人 posted on 2019-11-30 07:27:42
I have X as a csr_matrix that I obtained using scikit-learn's TF-IDF vectorizer, and y, which is an array. My plan is to create features using LDA. However, I could not find how to initialize a gensim corpus variable with X as a csr_matrix. In other words, I don't want to download a corpus as shown in gensim's documentation, nor convert X to a dense matrix, since that would consume a lot of memory and the computer could hang. In short, my questions are the following: How do you initialize a gensim corpus given that I have a csr_matrix (sparse) representing the whole corpus? How do you use LDA to extract
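A minimal sketch of the conversion the question asks about, assuming rows of X are documents and columns are terms. gensim represents a corpus as an iterable of documents, each a list of (term_id, weight) pairs, so a CSR matrix can be streamed row by row without densifying it. The function name csr_rows_as_bow and the toy arrays are hypothetical; the three inputs correspond to the data, indices and indptr attributes of a scipy csr_matrix. gensim also ships gensim.matutils.Sparse2Corpus, which wraps a scipy sparse matrix directly (pass documents_columns=False when rows are documents), so in practice that helper is probably the simpler route.

```python
from typing import Iterator, List, Sequence, Tuple


def csr_rows_as_bow(
    data: Sequence[float],
    indices: Sequence[int],
    indptr: Sequence[int],
) -> Iterator[List[Tuple[int, float]]]:
    """Yield each CSR row as a list of (term_id, weight) pairs.

    Only the stored (non-zero) entries are visited, so memory use
    stays proportional to one row at a time.
    """
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        yield [(int(indices[k]), float(data[k])) for k in range(start, end)]


# Toy 2-document, 3-term matrix (hypothetical values):
#   doc 0: term 0 -> 0.5, term 2 -> 0.5
#   doc 1: term 1 -> 1.0
data = [0.5, 0.5, 1.0]
indices = [0, 2, 1]
indptr = [0, 2, 3]

corpus = list(csr_rows_as_bow(data, indices, indptr))
```

With a real matrix the call would be csr_rows_as_bow(X.data, X.indices, X.indptr). A generator can only be consumed once, so for a model that makes several passes over the corpus it would need to be materialized or wrapped in a re-iterable object.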

How do you initialize a gensim corpus variable with a csr_matrix?

时间秒杀一切 posted on 2019-11-29 09:40:26
Question: I have X as a csr_matrix that I obtained using scikit-learn's TF-IDF vectorizer, and y, which is an array. My plan is to create features using LDA. However, I could not find how to initialize a gensim corpus variable with X as a csr_matrix. In other words, I don't want to download a corpus as shown in gensim's documentation, nor convert X to a dense matrix, since that would consume a lot of memory and the computer could hang. In short, my questions are the following: How do you initialize a gensim

Get WordNet's domain name for the specified word

╄→гoц情女王★ posted on 2019-11-29 05:20:51
I know WordNet has a Domains Hierarchy, e.g. sport->football. 1) Is it possible to list all words related to, for example, the 'sport->football' sub-domain? Expected response: goalkeeper, forward, penalty, ball, field, stadium, referee and so on. 2) Is it possible to get the domain name for a given word, e.g. 'goalkeeper'? I need something like [sport->football; sport->hockey] or [football; hockey] or just 'football'. It is for a document classification task. WordNet has a hypernym / hyponym hierarchy but that is not what you want here, as you can see when you look up goalkeeper: from nltk.corpus import wordnet s = wordnet

Different results between the Bernoulli Naive Bayes in NLTK and in scikit-learn

这一生的挚爱 posted on 2019-11-28 10:28:29
Question: I am getting quite different results when classifying text (into only two categories) with the Bernoulli Naive Bayes algorithm in NLTK and the one in the scikit-learn module. Although the overall accuracy is comparable between the two (though far from identical), the difference in Type I and Type II errors is significant. In particular, the NLTK Naive Bayes classifier gives more Type I than Type II errors, while the scikit-learn one does the opposite. This 'anomaly' seems to be consistent across
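To make the comparison in this question concrete, here is a small sketch of how Type I errors (false positives) and Type II errors (false negatives) could be counted for two classifiers on the same test labels. The function name error_counts, the label strings and the prediction lists are all hypothetical; they are not output from NLTK or scikit-learn.

```python
def error_counts(y_true, y_pred, positive):
    """Count Type I (false positive) and Type II (false negative) errors."""
    type_i = sum(
        1 for t, p in zip(y_true, y_pred) if p == positive and t != positive
    )
    type_ii = sum(
        1 for t, p in zip(y_true, y_pred) if p != positive and t == positive
    )
    return {"type_I": type_i, "type_II": type_ii}


# Hypothetical gold labels and predictions from two classifiers.
y_true = ["pos", "neg", "neg", "pos", "neg"]
pred_a = ["pos", "pos", "pos", "pos", "neg"]  # over-predicts "pos"
pred_b = ["neg", "neg", "neg", "neg", "pos"]  # under-predicts "pos"

counts_a = error_counts(y_true, pred_a, positive="pos")
counts_b = error_counts(y_true, pred_b, positive="pos")
```

Which class is treated as positive decides which mistakes count as Type I, so both classifiers have to be scored with the same positive label before their error profiles can be compared.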

What tried and true algorithms for suggesting related articles are out there?

爱⌒轻易说出口 posted on 2019-11-28 03:04:48
Pretty common situation, I'd wager. You have a blog or news site with plenty of articles or blags or whatever you call them, and at the bottom of each you want to suggest others that seem to be related. Let's assume very little metadata about each item. That is, no tags or categories. Treat each item as one big blob of text, including the title and author name. How do you go about finding the possibly related documents? I'm rather interested in the actual algorithm, not ready solutions, although I'd be ok with taking a look at something implemented in ruby or python, or relying on mysql or pgsql
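One widely used baseline for this kind of problem is to weight words by TF-IDF and rank the other documents by cosine similarity. The sketch below is an illustration of that baseline only, not the method from any answer in the thread. The tokenizer, the smoothed IDF formula, the function names and the sample texts are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    # Smoothed IDF (an assumption): terms present in every document
    # still get a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(docs, index):
    """Rank every other document by similarity to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (i, cosine(vectors[index], v)) for i, v in enumerate(vectors) if i != index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical article texts.
docs = [
    "Python tips for parsing text files",
    "Parsing text files with Python scripts",
    "Best hiking trails in the Alps",
]
ranked = related(docs, 0)
```

Comparing every pair of documents is quadratic in the number of articles, so a larger site would probably precompute the vectors and use an inverted index or an approximate nearest-neighbour search instead.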

What tried and true algorithms for suggesting related articles are out there?

和自甴很熟 posted on 2019-11-26 23:56:01
Question: Pretty common situation, I'd wager. You have a blog or news site with plenty of articles or blags or whatever you call them, and at the bottom of each you want to suggest others that seem to be related. Let's assume very little metadata about each item. That is, no tags or categories. Treat each item as one big blob of text, including the title and author name. How do you go about finding the possibly related documents? I'm rather interested in the actual algorithm, not ready solutions,