classification

Scikit-learn: changing the threshold to create multiple confusion matrices

旧街凉风 submitted on 2019-12-12 07:52:53
Question: I'm building a classifier that goes through Lending Club data and selects the best X loans. I've trained a Random Forest and created the usual ROC curves, confusion matrices, etc. The confusion matrix takes as an argument the predictions of the classifier (the majority prediction of the trees in the forest). However, I want to print multiple confusion matrices at different thresholds, to see what happens if I choose the 10% best loans, the 20% best loans, etc. I know from reading other…
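A minimal sketch of the "top X%" idea, assuming you already have one score per loan (for example the positive-class column `predict_proba(X)[:, 1]` of a fitted forest) and 0/1 labels where 1 means a good loan (that label convention is an assumption). Instead of thresholding at a fixed probability, it ranks the loans by score, treats the top `fraction` as selected, and counts outcomes in the same (tn, fp, fn, tp) order that scikit-learn's `confusion_matrix(...).ravel()` uses for binary labels. The function name is hypothetical.

```python
def confusion_at_top_fraction(scores, labels, fraction):
    """Select the top `fraction` of items by score and return (tn, fp, fn, tp).

    `labels` are 0/1, with 1 assumed to mean "good loan".
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n = len(scores)
    k = int(round(fraction * n))  # number of loans we would pick
    # Indices sorted by descending score; the first k are "predicted good".
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    selected = set(order[:k])
    tn = fp = fn = tp = 0
    for i, label in enumerate(labels):
        picked = i in selected
        if picked and label == 1:
            tp += 1
        elif picked:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical scores (e.g. predict_proba(X)[:, 1]) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
for fraction in (0.2, 0.5):
    print(fraction, confusion_at_top_fraction(scores, labels, fraction))
# -> 0.2 (6, 0, 2, 2)
# -> 0.5 (4, 2, 1, 3)
```

Calling it with fractions 0.1, 0.2, and so on gives one confusion matrix per cut-off; loans with equal scores keep their original order when the top k are taken.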

How can I get the highest-frequency terms out of TF-IDF vectors for each file in scikit-learn?

房东的猫 submitted on 2019-12-12 07:39:31
Question: I am trying to get the highest-frequency terms out of vectors in scikit-learn. In the example, this is done for each category using the code below, but I want it for each file inside the categories. https://github.com/scikit-learn/scikit-learn/blob/master/examples/document_classification_20newsgroups.py
if opts.print_top10:
    print("top 10 keywords per class:")
    for i, category in enumerate(categories):
        top10 = np.argsort(clf.coef_[i])[-10:]
        print(trim("%s: %s" % (
            category, " ".join(feature_names[top10]))))
I…
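For per-document (rather than per-class) top terms, the idea is to rank the TF-IDF weights inside each document's row instead of the classifier's `coef_`. With scikit-learn that means sorting one row of the matrix returned by `TfidfVectorizer.fit_transform`; the standard-library sketch below does the same ranking by hand so the arithmetic is visible. It assumes whitespace tokenisation (scikit-learn's default tokenizer differs) and uses the smoothed IDF that `TfidfVectorizer` applies by default, ln((1 + N) / (1 + df)) + 1. The per-row L2 normalisation is skipped because it does not change the ranking within a row. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def top_terms_per_document(docs, n=10):
    """Return the `n` highest TF-IDF terms of each document."""
    tokenized = [doc.lower().split() for doc in docs]
    num_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        scores = {
            term: count * (math.log((1 + num_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:n])
    return result


docs = ["apple apple banana", "banana cherry", "cherry cherry cherry durian"]
print(top_terms_per_document(docs, n=2))
# -> [['apple', 'banana'], ['banana', 'cherry'], ['cherry', 'durian']]
```

With scikit-learn, the equivalent is to take row i of the fitted matrix (e.g. `X[i].toarray().ravel()`), `argsort` it, and index the vectorizer's feature names (`get_feature_names_out()` in recent versions) with the last positions.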

C# Encog SVM classification with my own dataset

只愿长相守 submitted on 2019-12-12 06:38:26
Question: I would like to build a multiclass classification application in C#, and I decided to use Encog to do so. Now I am stuck at one point. I found an XOR example, which I understand, but when I use my own dataset, the app computes with only one feature from one example. Here is my code:
namespace ConsoleApplication1
{
    public static class Load
    {
        public static double[][] FromFile(string path)
        {
            var rows = new List<double[]>();
            foreach (var line in File.ReadAllLines(path))
            {
                rows.Add(line.Split…
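The symptom described (only one feature from one example reaches the model) may come from how the file is parsed into rows and columns. The sketch below is in Python rather than C#, and it assumes a hypothetical comma-separated layout with all feature values first and an integer class label in the last column. It keeps every column of every row and rejects ragged lines, producing one input array and one label per example, analogous to the input and ideal arrays in Encog's XOR example.

```python
def load_dataset(lines):
    """Parse 'f1,f2,...,fn,label' lines into (inputs, labels).

    Hypothetical layout: all features first, integer class label last.
    """
    inputs, labels = [], []
    width = None
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        values = [float(v) for v in line.split(",")]
        if width is None:
            width = len(values)
        elif len(values) != width:
            raise ValueError(f"line {line_no}: expected {width} columns, got {len(values)}")
        inputs.append(values[:-1])      # every feature of this example
        labels.append(int(values[-1]))  # class label
    return inputs, labels


inputs, labels = load_dataset(["5.1,3.5,1.4,0", "6.2,2.9,4.3,1", ""])
print(inputs, labels)
# -> [[5.1, 3.5, 1.4], [6.2, 2.9, 4.3]] [0, 1]
```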

SVM does not predict correctly although the support vectors are valid

限于喜欢 submitted on 2019-12-12 05:48:56
Question: I have the following unlabeled training set (fig 1) in which I am trying to detect outliers. I have come up with a procedure to label the data (0: normal data, 1: outlier) and want to train an SVM on it. I followed these instructions to train the SVM model, but when I try to predict the labels of the same data the SVM was trained on, it does not predict any outliers (fig 2)!
[Fig 1: the support vectors after training]
[Fig 2: the SVM model's predictions on the same data it was trained on]
The…
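When outliers are rare, a model that predicts only the majority class can still look accurate, so it helps to compare true and predicted counts per class. The Python sketch below is a generic diagnostic, not a confirmed cause of the problem in the question; the function name and the 95/5 split in the example are hypothetical.

```python
from collections import Counter


def prediction_report(y_true, y_pred):
    """Summarise how predictions are distributed across classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Recall per true class: fraction of that class's items predicted correctly.
    recall = {}
    for cls in sorted(true_counts):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recall[cls] = hits / true_counts[cls]
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "true_counts": dict(true_counts),
        "pred_counts": dict(pred_counts),
        "recall": recall,
        "accuracy": accuracy,
    }


# Hypothetical example: 95 normal points (0), 5 outliers (1), model predicts only 0.
report = prediction_report([0] * 95 + [1] * 5, [0] * 100)
print(report["accuracy"], report["recall"], report["pred_counts"])
# -> 0.95 {0: 1.0, 1: 0.0} {0: 100}
```

A recall of 0.0 for class 1 alongside high accuracy is the pattern to look for; class weighting or resampling are common next steps in that case.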

How to do text classification with label probabilities?

試著忘記壹切 submitted on 2019-12-12 04:48:03
Question: I'm trying to solve a text classification problem for academic purposes. I need to classify tweets into labels such as "cloud", "cold", "dry", "hot", "humid", "hurricane", "ice", "rain", "snow", "storms", "wind" and "other". Each tweet in the training data has a probability for every label. For example, the message "Can already tell it's going to be a tough scoring day. It's as windy right now as it was yesterday afternoon." has a 21% chance of being hot and a 79% chance of being wind. I have worked on the…
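Two common ways to train an ordinary classifier on probabilistic labels: keep only the most probable label, or repeat each tweet once per label and use the probability as a sample weight. The Python sketch below shows both on the tweet from the question; the function names are hypothetical, and the 0.0 probability for "rain" is added only to show that zero-probability labels are dropped.

```python
def expand_soft_labels(samples, min_prob=0.0):
    """Turn (text, {label: probability}) pairs into weighted (text, label, weight) rows.

    Each tweet appears once per label whose probability exceeds `min_prob`,
    with that probability as the sample weight.
    """
    rows = []
    for text, probs in samples:
        for label, prob in sorted(probs.items()):
            if prob > min_prob:
                rows.append((text, label, prob))
    return rows


def hard_label(probs):
    """Simplest alternative: keep only the most probable label."""
    return max(probs, key=probs.get)


tweet = ("Can already tell it's going to be a tough scoring day. "
         "It's as windy right now as it was yesterday afternoon.")
samples = [(tweet, {"hot": 0.21, "wind": 0.79, "rain": 0.0})]
for text, label, weight in expand_soft_labels(samples):
    print(label, weight)
print(hard_label(samples[0][1]))
# -> hot 0.21
# -> wind 0.79
# -> wind
```

After vectorising the text, the weighted rows can be passed to any estimator whose `fit` accepts `sample_weight` (in scikit-learn, `MultinomialNB` and `LogisticRegression` both do).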

Plotting the hyperplane of LDA (ClassificationDiscriminant)

大城市里の小女人 submitted on 2019-12-12 04:48:02
Question: I am trying to compare various classifiers on my data, such as LDA and SVM, by visually inspecting the separating hyperplane. Currently I am using ClassificationDiscriminant as the LDA classifier. Unlike with SVM, where I can draw the hyperplane on the graph, I could not find a way to plot the hyperplane of the LDA classifier. The following script shows how I generate sample data and classify it using ClassificationDiscriminant:
%% Data & Label
X = [randn(100,2); randn(150,2) + 1.5];
Y = …
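For a two-class linear discriminant, the decision boundary is the set of points where K + L1*x1 + L2*x2 = 0, so in 2-D it can be drawn as the line x2 = -(K + L1*x1) / L2. In MATLAB, K and L for the boundary between classes 1 and 2 should be available as `Mdl.Coeffs(1,2).Const` and `Mdl.Coeffs(1,2).Linear` (this mirrors the boundary-plotting example in the ClassificationDiscriminant documentation; verify against your release). The Python sketch below only shows the arithmetic, and the coefficient values are made up.

```python
def lda_boundary_x2(const, linear, x1_values):
    """Solve const + linear[0]*x1 + linear[1]*x2 = 0 for x2 at each x1."""
    l1, l2 = linear
    if l2 == 0:
        # The boundary is the vertical line x1 = -const / l1.
        raise ValueError("vertical boundary; plot x1 = -const / l1 instead")
    return [-(const + l1 * x1) / l2 for x1 in x1_values]


# Made-up coefficients standing in for Coeffs(1,2).Const and Coeffs(1,2).Linear.
print(lda_boundary_x2(-1.5, (1.0, 1.0), [0.0, 1.0, 3.0]))
# -> [1.5, 0.5, -1.5]
```

Plotting these (x1, x2) pairs over the scatter of X gives the LDA line to compare against the SVM hyperplane.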

Modelling a card game for machine learning [closed]

被刻印的时光 ゝ submitted on 2019-12-12 04:10:15
Question (closed as needing more focus): I'm looking for some help modelling this machine learning problem. A hand consists of three rows (containing 3, 5, and 5 cards respectively). Your goal is to build a hand that scores the most points. You receive the cards in intervals called streets, five cards in the first…
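One straightforward way to turn such a hand into model features, offered as an assumption rather than the asker's design, is a 52-slot indicator vector per row, concatenated into 156 features; partially filled rows (between streets) simply have fewer ones. The Python sketch below uses a hypothetical two-character card notation such as "Ah" (ace of hearts) or "Tc" (ten of clubs).

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"
ROW_SIZES = (3, 5, 5)  # top, middle, bottom rows
DECK_SIZE = len(RANKS) * len(SUITS)  # 52


def card_index(card):
    """Map a two-character card such as 'Ah' or 'Tc' to 0..51."""
    rank, suit = card[0], card[1]
    return RANKS.index(rank) * len(SUITS) + SUITS.index(suit)


def encode_hand(rows):
    """One 52-slot indicator vector per row, concatenated (156 features).

    Rows may be partially filled, as they are between streets.
    """
    if len(rows) != len(ROW_SIZES):
        raise ValueError("expected three rows")
    features = []
    for row, size in zip(rows, ROW_SIZES):
        if len(row) > size:
            raise ValueError(f"a row of size {size} got {len(row)} cards")
        vector = [0] * DECK_SIZE
        for card in row:
            vector[card_index(card)] = 1
        features.extend(vector)
    return features


features = encode_hand([["Ah"], ["2c", "Kd"], []])
print(len(features), sum(features))
# -> 156 3
```

Keeping the rows separate lets a model learn that the same card is worth different amounts depending on where it is placed.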

Why am I getting a ROC area of 1.000 even though I don't have 100% accuracy?

一笑奈何 submitted on 2019-12-12 03:49:18
Question: I am using Weka as a classifier, and it has worked great for me so far. However, in my last test I got a ROC area of 1.000 (which, if I remember correctly, represents a perfect classification) without having 100% accuracy, as can be seen in the confusion matrix in the figure. My question is: am I interpreting the results incorrectly, or am I getting wrong results (maybe the classifier I am using is badly programmed, although I don't think that's likely)?
[Figure: Classification output]
Thank you!
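ROC area and accuracy measure different things: the ROC area depends only on how the predicted probabilities rank the instances, while accuracy depends on the cut-off used to turn those probabilities into class labels. An area of 1.000 means every positive instance was scored above every negative one, yet some positives can still fall below the cut-off. The small Python sketch below, with made-up scores and hypothetical function names, computes AUC as the fraction of positive/negative pairs ranked correctly (ties count one half) and reproduces exactly that situation.

```python
def roc_auc(scores, labels):
    """AUC = probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy_at(scores, labels, threshold=0.5):
    """Accuracy after predicting 1 for every score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


# Made-up scores: every positive outranks every negative, but 0.45 < 0.5.
scores = [0.9, 0.8, 0.45, 0.3, 0.2]
labels = [1, 1, 1, 0, 0]
print(roc_auc(scores, labels), accuracy_at(scores, labels))
# -> 1.0 0.8
```

Lowering the threshold to, say, 0.4 would make this toy example 100% accurate without changing its AUC.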

Weka: filtering out classes in a MultiClassClassifier

人盡茶涼 submitted on 2019-12-12 03:29:59
Question: I have trained a MultiClassClassifier (tested, working) and saved it somewhere on my hard drive. Now I want to make predictions for a new sample. When I start my application, the classifier loads automatically with it. Outside the classification process, I have already narrowed the search down to five possible classes for this sample. This means I know k classes that can easily be excluded from the classification. Is it possible to filter a MultiClassClassifier (filter out all unwanted classes)…
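Instead of changing the trained model, one option (an assumption, not a built-in Weka feature) is to mask its output: Weka's `Classifier.distributionForInstance(Instance)` returns one probability per class value, so you can keep only the allowed classes, renormalise them, and pick the largest. The Python sketch below shows that logic; the function name, class names, and probabilities are hypothetical.

```python
def predict_among(distribution, class_names, allowed):
    """Return the most probable class restricted to `allowed`.

    `distribution` is one probability per class, in the order of `class_names`.
    The kept probabilities are renormalised so they sum to 1.
    """
    kept = {name: p for name, p in zip(class_names, distribution) if name in allowed}
    if not kept:
        raise ValueError("none of the allowed classes is known to the model")
    total = sum(kept.values())
    if total > 0:
        kept = {name: p / total for name, p in kept.items()}
    else:
        kept = {name: 1 / len(kept) for name in kept}  # no information: uniform
    best = max(kept, key=kept.get)
    return best, kept


# Hypothetical 4-class model; only B and C are still possible for this sample.
best, probs = predict_among([0.5, 0.3, 0.15, 0.05], ["A", "B", "C", "D"], {"B", "C"})
print(best, round(probs["B"], 3), round(probs["C"], 3))
# -> B 0.667 0.333
```

Because only the output is filtered, the same saved classifier can serve samples with different sets of allowed classes.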

Predicting the “no class” / unrecognised class in Weka Machine Learning

白昼怎懂夜的黑 submitted on 2019-12-12 03:27:34
Question: I am using Weka 3.7 to classify text documents based on their content. I have a set of text files in folders, and each belongs to a certain category:
Category A: 100 txt files
Category B: 100 txt files
...
Category X: 100 txt files
I want to predict whether a document falls into one of the categories A-X, OR whether it falls into the category UNRECOGNISED (for all other documents). I am getting the total set of Instances programmatically like this:
private Instances getTotalSet() {
    ArrayList<Attribute>…
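One common way to get an "unrecognised" answer, not something the question itself describes, is a reject option: train only on categories A-X and report UNRECOGNISED whenever the highest class probability (for Weka, the largest entry returned by `distributionForInstance`) is below a threshold. The sketch below is in Python, the function name is hypothetical, and the 0.6 threshold is an arbitrary placeholder. Adding an explicit "other" category with its own training documents is an alternative that may work better when such documents are available.

```python
UNRECOGNISED = "UNRECOGNISED"


def predict_with_reject(distribution, class_names, min_confidence=0.6):
    """Return the most probable class, or UNRECOGNISED if it is not confident enough.

    `min_confidence` is an arbitrary placeholder; tune it on held-out documents,
    including some that belong to none of the categories.
    """
    best = max(range(len(distribution)), key=lambda i: distribution[i])
    if distribution[best] < min_confidence:
        return UNRECOGNISED
    return class_names[best]


names = ["A", "B", "X"]
print(predict_with_reject([0.7, 0.2, 0.1], names))    # -> A
print(predict_with_reject([0.4, 0.35, 0.25], names))  # -> UNRECOGNISED
```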