classification

(Python Scipy) How to flatten a csr_matrix and append it to another csr_matrix?

Submitted by 半城伤御伤魂 on 2019-12-06 05:39:43
I am representing each XML document as a feature matrix in csr_matrix format. Now that I have around 3000 XML documents, I have a list of csr_matrices. I want to flatten each of these matrices into a feature vector, then combine all of the feature vectors into one csr_matrix representing all the XML documents, where each row is a document and each column is a feature. One way to achieve this is X = csr_matrix([a.toarray().ravel().tolist() for a in ls]), where ls is the list of csr_matrices. However, this is highly inefficient, as with 3000 documents,
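A more memory-friendly sketch (an assumption, not from the question): since SciPy 1.1, sparse matrices support reshape, so each matrix can be flattened to a single sparse row and the rows stacked with vstack, without ever materializing dense arrays:

```python
# Sketch: flatten each csr_matrix to one sparse row, then stack the rows.
# `ls` here is a tiny stand-in for the question's list of csr_matrices.
from scipy.sparse import csr_matrix, vstack

ls = [csr_matrix([[1, 0], [0, 2]]), csr_matrix([[0, 3], [4, 0]])]
X = vstack([m.reshape(1, -1) for m in ls], format="csr")
print(X.shape)  # (2, 4): one row per document, one column per feature
```

This assumes every matrix in ls has the same shape, so that each flattened row ends up with the same number of columns.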

Training images using SVM on OpenCV

Submitted by 夙愿已清 on 2019-12-06 05:03:30
Question: I am trying to do classification with images (as a next step I'll classify based on features, but for now I just want to check whether I am doing it right or not). Here is my code:

    #include <opencv2/core/core.hpp>
    #include <opencv2/highgui/highgui.hpp>
    #include <opencv2/ml/ml.hpp>

    using namespace cv;
    using namespace std;

    int main() {
        Mat image[2];
        image[0] = imread("image.jpg", 0);
        image[1] = imread("wrongimage.jpg", 0);
        Mat rotated = imread("image.jpg", 0);
        image[0] = image[0].reshape(0, 1); // SINGLE LINE image
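As a rough sanity check of the flatten-and-stack idea, here is the same flow sketched in Python, with scikit-learn's SVC standing in for OpenCV's SVM and random arrays standing in for the imread results:

```python
# Sketch: each image becomes one row of the training matrix, then an SVM
# is trained on the rows. Toy data, not the question's actual images.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8)) for _ in range(4)]
labels = [0, 0, 1, 1]

X = np.vstack([img.reshape(1, -1) for img in images]).astype(np.float32)
print(X.shape)  # (4, 64): 4 images, 64 pixels each
clf = SVC(kernel="linear").fit(X, labels)
```

The reshape-to-one-row step mirrors the question's image[0].reshape(0, 1): every sample must be a single row before it can be stacked into the training matrix.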

Classifying data with naive bayes using LingPipe

Submitted by ℡╲_俬逩灬. on 2019-12-06 04:47:23
Question: I want to classify certain data into different classes based on its content. I did this using a naive Bayes classifier, and I get the best category the item belongs to as output. But now I want to classify news items outside the training set into an "others" class. I can't manually add every such item to a specific class, since there is a vast number of other categories. So is there any way to classify the other data?

    private static File TRAINING_DIR = new File(
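One common approach, sketched here with scikit-learn rather than LingPipe: keep the best category only when the classifier is confident enough, and route everything else to "others". The toy data and the 0.7 threshold are assumptions; the threshold would need tuning on held-out data.

```python
# Sketch: reject-option classification with a probability threshold.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["stock market falls", "team wins match", "election vote results"]
train_labels = ["business", "sports", "politics"]

vec = CountVectorizer()
X = vec.fit_transform(train_texts)
clf = MultinomialNB().fit(X, train_labels)

def classify(text, threshold=0.7):
    probs = clf.predict_proba(vec.transform([text]))[0]
    best = probs.argmax()
    return clf.classes_[best] if probs[best] >= threshold else "others"

print(classify("stock market falls"))     # a training phrase: "business"
print(classify("unrelated unseen words"))  # no known words: "others"
```

LingPipe exposes per-category scores as well, so the same thresholding idea can be applied to its best-category confidence.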

How to understand the functional margin in SVM?

Submitted by 大兔子大兔子 on 2019-12-06 04:28:14
Question: I'm reading Andrew Ng's machine learning notes, but the definition of the functional margin confused me. I can understand that the geometric margin is the distance from x to the hyperplane, but how should I understand the functional margin, and why is its formula defined that way? Answer 1: Think of it like this: w^T x_i + b is the model's prediction for the i-th data point, and y_i is its label. If the prediction and the ground truth have the same sign, then gamma_i will be positive. The further "inside" the class boundary
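In symbols, the functional margin of example i and its relation to the geometric margin (following Ng's notation) are:

```latex
\hat{\gamma}_i = y_i\,\bigl(w^{\top} x_i + b\bigr),
\qquad
\gamma_i = y_i\!\left(\frac{w^{\top} x_i + b}{\lVert w \rVert}\right)
         = \frac{\hat{\gamma}_i}{\lVert w \rVert}.
```

The functional margin is positive exactly when the sign of the prediction matches the label, and it grows as the point sits further from the hyperplane. But it also scales if (w, b) is rescaled, which changes nothing about the classifier itself; dividing by the norm of w removes that scale dependence and yields the geometric margin, the actual signed distance.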

batch size does not work for caffe with deploy.prototxt

Submitted by 做~自己de王妃 on 2019-12-06 04:14:08
I'm trying to make my classification process a bit faster. I thought of increasing the first input_dim in my deploy.prototxt, but that does not seem to work; it is even a little slower than classifying each image one by one.

deploy.prototxt:

    input: "data"
    input_dim: 128
    input_dim: 1
    input_dim: 120
    input_dim: 160
    ... net description ...

Python net initialization:

    net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
    net.blobs['data'].reshape(128, 1, 120, 160)
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    # transformer settings

Python classification: images=
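A frequent cause of this exact symptom, illustrated below with plain NumPy standing in for the Caffe net (the question's classification code is truncated, so this is an assumption): enlarging the blob only helps if all 128 images are copied into net.blobs['data'].data and forward() is called once. Calling forward() once per image still pushes a full 128-image blob through the net each time, which is slower than a batch size of 1.

```python
# Stand-in for the net: a single matrix multiply as the "forward pass".
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(120 * 160, 10))   # stand-in weights

def forward(batch):                     # stand-in for net.forward()
    return batch.reshape(len(batch), -1) @ W

images = rng.normal(size=(128, 1, 120, 160))

# Slow pattern: one call per image, so the batch dimension is wasted.
per_image = np.vstack([forward(images[i:i + 1]) for i in range(128)])
# Intended pattern: fill the whole blob once, one call for all 128 images.
batched = forward(images)

print(np.allclose(per_image, batched))  # True: same results, far fewer calls
```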

How to deal with low frequency examples in classification?

Submitted by 最后都变了- on 2019-12-06 03:50:43
I'm facing a text classification problem where I need to classify examples into 34 groups. The problem is that the amount of training data across the 34 groups is not balanced: for some groups I have 2000+ examples, while for others I only have 100+. For some of the small groups the classification accuracy is quite high; I guess those groups have specific keywords that make them easy to recognize and classify. For others the accuracy is low, and the prediction always goes to the large groups. I want to know how to deal with this "low-frequency example" problem. Would simply copying and duplicating the small-group data work? Or
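Besides duplicating (oversampling) the small groups, many libraries can reweight classes so that mistakes on rare groups cost more during training. A sketch with scikit-learn's class_weight option, on toy Gaussian data rather than the question's 34-group text corpus:

```python
# Sketch: 200 examples of class 0 vs. only 10 of class 1; "balanced"
# weights upweight the rare class inversely to its frequency.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(2, 1, (10, 5))])
y = np.array([0] * 200 + [1] * 10)

clf = LogisticRegression(class_weight="balanced").fit(X, y)
# A point at the rare class's mean should now be predicted as class 1.
print(clf.predict(np.full((1, 5), 2.0)))
```

Reweighting is usually preferable to plain duplication because it changes the loss without inflating the dataset, though both push the decision boundary in the same direction.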

How to use a cross validation test with MATLAB?

Submitted by 纵然是瞬间 on 2019-12-06 03:49:57
I would like to use 10-fold cross-validation to evaluate a discretization in MATLAB. I should first consider the attributes and the class column. The Statistics Toolbox has a CROSSVAL function, which performs 10-fold cross-validation by default; check it out. Another function, CROSSVALIND, exists in the Bioinformatics Toolbox. There is also an open-source generic CV tool: http://www.cs.technion.ac.il/~ronbeg/gcv/ If you would rather write your own xval wrapper than use the built-in functions, I often use randperm() to generate random orderings of my data, which you can then partition using a
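The randperm-based wrapper mentioned in the answer looks roughly like this when translated to Python/NumPy (a sketch of the idea, not Statistics Toolbox code): shuffle the indices once, slice them into k folds, and hold out one fold per round.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(n)  # randperm analogue
    folds = np.array_split(order, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

for train, test in kfold_indices(n=20, k=10):
    pass  # fit the model on `train`, evaluate the discretization on `test`
```

Each index appears in exactly one test fold, so every example is used for evaluation exactly once across the k rounds.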

NLTK MEGAM Max Ent algorithms on Windows

Submitted by China☆狼群 on 2019-12-06 03:32:10
Question: I have been playing with NLTK on Python, but I am unable to use the MEGAM max-ent algorithm due to the lack of a Windows 64-bit executable for any version of the MEGAM library at or above 0.3 (NLTK needs the -nobias option, which was introduced in v0.3). http://www.cs.utah.edu/~hal/megam/ The author recommends compiling your own executable, although getting OCaml to work on Win64 is just another nightmare. Does anyone out there have a Windows-compiled version of the

How to apply classifier in Weka's Explorer?

Submitted by 你说的曾经没有我的故事 on 2019-12-06 03:13:58
Question: Let's say I've built a model (e.g. a J4.8 tree) and evaluated it with cross-validation. How can I use this model to classify a new dataset? I know I can supply a file with the data to classify via the "Supplied test set" option, mark "Output predictions" in the "More options" window, and run the classification again. That produces nearly what I need, but it seems a very strange workflow. It also re-creates the whole model, which can take unnecessary time. Is there a more straightforward way to do

Creating training data for a Maxent classifier in Java

Submitted by 会有一股神秘感。 on 2019-12-06 02:41:20
Question: I am trying to create a Java implementation of a maxent classifier. I need to classify sentences into n different classes. I had a look at ColumnDataClassifier in the Stanford maxent classifier, but I am not able to understand how to create the training data. I need training data that includes POS tags for the words of each sentence, so that the classifier can use features like the previous word, the next word, etc. I am looking for training data which has sentences with POS
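For orientation, a minimal sketch of what a ColumnDataClassifier training file can look like. This layout — plain text, one example per line, TAB-separated, with the gold class in the first column — is an assumption based on the classifier's typical column-based setup, and the class names and sentences are made up:

```
SPORTS	The team won the championship match yesterday
POLITICS	The parliament passed the new budget bill
```

A companion .properties file then tells the classifier how to turn each column into features (e.g. n-grams or word splits over the text column). POS tags could be carried in an additional tab-separated column (for instance, word_TAG tokens), so that neighboring-word and tag features can be derived from that column.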