classification

Why does Mallet text classification output the same value 1.0 for all test files?

Submitted by 旧巷老猫 on 2019-12-13 03:38:23
Question: I am learning the Mallet text classification command lines. The estimated values for the different classes are all the same, 1.0. I do not know where I am going wrong. Can you help? Mallet version: E:\Mallet\mallet-2.0.8RC3

//there is a txt file about cat breeds (catmaterial.txt) in the cat dir.
//command 1
C:\Users\toshiba>mallet import-dir --input E:\Mallet\testmaterial\cat --output E:\Mallet\testmaterial\cat.mallet --remove-stopwords
//command 1 output
Labels = E:\Mallet\testmaterial\cat /
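In Mallet, each directory passed to --input becomes one class label; the command-1 output above (Labels = E:\Mallet\testmaterial\cat) shows a single label, and a one-label classifier trivially scores every document 1.0. A hedged Python sketch of a two-label layout import-dir can consume (the "dog" class and its file are made up for illustration):

```python
# Each class needs its own directory of training files; Mallet then uses the
# directory names as labels. "dog" is a hypothetical second class.
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp()) / "testmaterial"
for label in ("cat", "dog"):                    # one directory per class
    (root / label).mkdir(parents=True)
(root / "cat" / "catmaterial.txt").write_text("notes about cat breeds")
(root / "dog" / "dogmaterial.txt").write_text("notes about dog breeds")

labels = sorted(p.name for p in root.iterdir())
print(labels)   # ['cat', 'dog']
```

With that layout, a command along the lines of `mallet import-dir --input testmaterial\cat testmaterial\dog --output cat-dog.mallet --remove-stopwords` gives the trainer two labels to discriminate between instead of one.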

AWS SageMaker RandomCutForest (RCF) vs scikit-learn RandomForest (RF)?

Submitted by 丶灬走出姿态 on 2019-12-13 03:10:10
Question: Is there a difference between the two, or are they different names for the same algorithm?

Answer 1: RandomCutForest (RCF) is an unsupervised method primarily used for anomaly detection, while RandomForest (RF) is a supervised method that can be used for regression or classification. For RCF, see the documentation (here) and a notebook example (here).

Source: https://stackoverflow.com/questions/56728230/aws-sagemaker-randomcutforest-rcf-vs-scikit-lean-randomforest-rf
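The supervised/unsupervised distinction can be sketched in scikit-learn terms. Note that scikit-learn does not ship a RandomCutForest; IsolationForest is used below as a roughly analogous tree-based anomaly detector, so treat this as an illustration of the two API shapes, not an RCF implementation:

```python
# Supervised vs. unsupervised tree ensembles on synthetic data.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)          # labels exist only for the supervised case

# Supervised: RandomForest needs labels y at fit time.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Unsupervised: the anomaly detector is fit on X alone.
iso = IsolationForest(random_state=0).fit(X)
scores = iso.score_samples(X)          # lower score = more anomalous

print(rf.predict(X[:2]), scores[:2])
```

The practical consequence: RF answers "which class is this?", while RCF-style models answer "how unusual is this point?" without ever seeing a label.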

R - factor examcard has new levels

Submitted by 只谈情不闲聊 on 2019-12-13 02:15:33
Question: I built a classification model in R using C5.0, given below:

library(C50)
library(caret)
a = read.csv("All_SRN.csv")
set.seed(123)
inTrain <- createDataPartition(a$anatomy, p = .70, list = FALSE)
training <- a[ inTrain,]
test <- a[-inTrain,]
Tree <- C5.0(anatomy ~ ., data = training,
             trControl = trainControl(method = "repeatedcv", repeats = 10,
                                      classProb = TRUE))
TreePred <- predict(Tree, test)

The training set has features like examcard, coil_used, anatomy_region, bodypart_anatomy and
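The "new levels" error typically means the test partition contains examcard values the model never saw in training. A hedged scikit-learn analogue of the same failure mode and one common workaround (the coil names are made up):

```python
# Unseen categories at predict time: with handle_unknown="ignore", a level
# absent from training encodes as an all-zeros vector instead of raising.
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit([["head_coil"], ["knee_coil"]])          # training levels only

# "spine_coil" was never seen during fit.
encoded = enc.transform([["spine_coil"]]).toarray()
print(encoded)   # [[0. 0.]]
```

In R the usual equivalents are setting the factor levels on the full dataset before partitioning, or dropping test rows whose levels are absent from training.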

Prefer one class in libsvm (Python)

Submitted by 拜拜、爱过 on 2019-12-12 22:21:43
Question: I just started playing a bit with libsvm in Python and got some simple classification to work. The problem is that I'm constructing a face detection system, and I want a very low false rejection rate. The SVM, on the other hand, seems to optimize for equal false rejection and false acceptance. What options do I have here? And as I said earlier, I'm very new to libsvm, so be kind. ;)

Answer 1: SVMs are not usually thought of as a probabilistic model, but as a maximally-discriminant model. Thus I have a
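One standard lever here is asymmetric class penalties: libsvm exposes per-class error weights (the -wN options), and scikit-learn's libsvm wrapper exposes the same idea as class_weight. A minimal sketch on synthetic 2-D data; the weight value 10.0 is an arbitrary illustration, to be tuned against the acceptable false-acceptance rate:

```python
# Weighting the face class more heavily penalizes rejecting a true face,
# trading higher false acceptance for lower false rejection.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1.5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)      # 1 = face

plain = SVC(kernel="linear").fit(X, y)
biased = SVC(kernel="linear", class_weight={1: 10.0}).fit(X, y)

faces = X[y == 1]
# The biased model should accept at least as many true faces as the plain one.
print((plain.predict(faces) == 1).sum(), (biased.predict(faces) == 1).sum())
```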

OSError: cannot identify image file

Submitted by 流过昼夜 on 2019-12-12 22:09:03
Question: I am trying to implement code in PyTorch but I get the error below. My Python version is 3.6 and my OS is Linux Ubuntu 16.04 LTS. I installed my Linux alongside macOS. We will use the torchvision and torch.utils.data packages for loading the data. There are 75 validation images for each class.

OSError                      Traceback (most recent call last)
<ipython-input-4-e0e3a841f698> in <module>()
     62
     63 # Get a batch of training data
---> 64 inputs, classes = next(iter(dset_loaders['train']))
     65
     66 # Make a grid from
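"cannot identify image file" usually means some file in the dataset folder is not a valid image: a hidden file, a truncated download, or a stray text file that torchvision's ImageFolder still tries to load. A small sketch that scans a directory tree the same way and reports unreadable files (the file names are illustrative):

```python
# Walk a dataset root and collect every file PIL cannot open as an image.
import os
import tempfile
from PIL import Image

def find_bad_images(root):
    """Return paths under root that PIL cannot open as images."""
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with Image.open(path) as img:
                    img.verify()       # cheap integrity check, no full decode
            except OSError:            # UnidentifiedImageError is an OSError
                bad.append(path)
    return bad

# Demo: a text file hiding in the image folder gets flagged.
root = tempfile.mkdtemp()
with open(os.path.join(root, "notes.txt"), "w") as f:
    f.write("not an image")
bad = find_bad_images(root)
print(bad)
```

Deleting or moving the reported files (and any empty class directories) typically resolves the DataLoader crash.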

Weka machine learning: how to interpret a Naive Bayes classifier?

Submitted by *爱你&永不变心* on 2019-12-12 18:27:50
Question: I am using the Explorer feature for classification. My .arff data file has 10 features with numeric and binary values (only the ID of the instances is nominal). I have about 16 instances. The class to predict is Yes/No. I have used Naive Bayes but I cannot interpret the results. Does anyone know how to interpret results from Naive Bayes classification?

Answer 1: Naive Bayes doesn't select any important features. As you mentioned, the result of training a Naive Bayes classifier is the mean and

SGDClassifier giving different accuracy each time for text classification

Submitted by 我的未来我决定 on 2019-12-12 18:17:02
Question: I'm using the SVM classifier for classifying text as good text or gibberish. I'm using Python's scikit-learn and doing it as follows:

'''
Created on May 5, 2017
'''
import re
import random
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn import metrics

# Prepare data
def prepare_data(data):
    """
    data is expected to be a list of tuples of category and texts.
    Returns a tuple of a list of labels and a list
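SGDClassifier shuffles the training data each epoch, so two fits on identical data can end at different weights (and different accuracies) unless the seed is pinned. A minimal sketch: fixing random_state, together with any upstream shuffling such as random.shuffle or train_test_split, makes runs reproducible.

```python
# Two SGD fits with the same seed converge to identical weights.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

a = SGDClassifier(loss="hinge", random_state=42).fit(X, y)
b = SGDClassifier(loss="hinge", random_state=42).fit(X, y)

same = np.allclose(a.coef_, b.coef_)
print(same)   # True: identical seeds give identical models
```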

Bayes Network for classification in Matlab (BNT)

Submitted by 只谈情不闲聊 on 2019-12-12 17:00:36
Question: Here is the deal. I have created a BN following the instructions from the BNT manual; it is the sprinkler one, but I have added a node Class for winter and summer, like this:

      Cloudy ----------
      /    \           \
Sprinkler   Rain ------ Class
      \    /
       Wet

where Class depends only on whether it is cloudy or raining, with the same specification as http://bnt.googlecode.com/svn/trunk/docs/usage.html#basics. The Class node is also binary, and the table is:

C  R  Class  prob
-----------------
1  1  1      0
2  1  1      0.4
1  2  1      0.4
2  2  1      0.9
etc.

So
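As a sanity check outside Matlab, the same conditional table can be queried in plain NumPy. Below, index 0 corresponds to BNT state 1 and index 1 to state 2, and the priors P(Cloudy) = 0.5 and P(Rain = true | Cloudy) = (0.2, 0.8) are assumed from the standard sprinkler example linked above:

```python
import numpy as np

# cpt[c, r] = P(Class = 1 | Cloudy = c, Rain = r), from the question's table
# (index 0 = BNT state 1, index 1 = BNT state 2).
cpt = np.array([[0.0, 0.4],
                [0.4, 0.9]])

p_cloudy = np.array([0.5, 0.5])                  # P(Cloudy)
p_rain_given_c = np.array([[0.8, 0.2],           # P(Rain | Cloudy = state 1)
                           [0.2, 0.8]])          # P(Rain | Cloudy = state 2)

# Marginal P(Class = 1) = sum over c, r of P(c) P(r | c) P(Class = 1 | c, r)
p_class1 = sum(p_cloudy[c] * p_rain_given_c[c, r] * cpt[c, r]
               for c in range(2) for r in range(2))
print(p_class1)   # approximately 0.44
```

Working it out by hand: 0.5·0.8·0 + 0.5·0.2·0.4 + 0.5·0.2·0.4 + 0.5·0.8·0.9 = 0.44, which is what BNT's inference engines should return for the same query.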

Problem with CountVectorizer from scikit-learn package

Submitted by 人走茶凉 on 2019-12-12 14:11:41
Question: I have a dataset of movie reviews. It has two columns: 'class' and 'reviews'. I have done most of the routine preprocessing, such as lowercasing the characters, removing stop words, and removing punctuation marks. At the end of preprocessing, each original review looks like words separated by a space delimiter. I want to use CountVectorizer and then TF-IDF to create features from my dataset so I can do classification/text recognition with Random Forest. I looked into websites and I
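A minimal sketch of the intended flow, with toy reviews standing in for the dataset: CountVectorizer, then TfidfTransformer, then RandomForestClassifier, chained in a Pipeline so the vocabulary learned on the training reviews is reused at predict time.

```python
# Counts -> TF-IDF weights -> Random Forest, as one fitted estimator.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.ensemble import RandomForestClassifier

reviews = ["great movie loved it", "terrible movie hated it",
           "loved the acting great film", "hated the plot terrible film"]
labels = ["pos", "neg", "pos", "neg"]

clf = Pipeline([
    ("counts", CountVectorizer()),       # reviews are already preprocessed
    ("tfidf", TfidfTransformer()),       # reweight counts by TF-IDF
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
]).fit(reviews, labels)

pred = clf.predict(["loved this great film"])
print(pred)
```

TfidfVectorizer is equivalent to the first two steps combined, which shortens the pipeline by one stage.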

sklearn random forest: .oob_score_ too low?

Submitted by 百般思念 on 2019-12-12 11:29:35
Question: I was searching for applications of random forests, and I found the following knowledge competition on Kaggle: https://www.kaggle.com/c/forest-cover-type-prediction. Following the advice at https://www.kaggle.com/c/forest-cover-type-prediction/forums/t/8182/first-try-with-random-forests-scikit-learn, I used sklearn to build a random forest with 500 trees. The .oob_score_ was ~2%, but the score on the holdout set was ~75%. There are only seven classes to classify, so 2% is really low. I also
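A sanity-check sketch on synthetic data: on i.i.d. data, .oob_score_ should track holdout accuracy closely, so a 2% OOB score against a 75% holdout score signals a setup problem (for example, labels misaligned with features, or scoring against the wrong target) rather than a property of random forests.

```python
# OOB accuracy vs. holdout accuracy on a 7-class synthetic problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            random_state=0).fit(X_tr, y_tr)
gap = abs(rf.oob_score_ - rf.score(X_te, y_te))
print(rf.oob_score_, rf.score(X_te, y_te), gap)
```

When the two scores diverge this dramatically in real use, comparing rf.oob_decision_function_ row by row against the training labels is a quick way to spot a misalignment.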