classification

How to use a cross validation test with MATLAB?

99封情书 submitted on 2019-12-22 10:27:04
Question: I would like to use 10-fold cross-validation to evaluate a discretization in MATLAB. I should first consider the attributes and the class column. Answer 1: The Statistics Toolbox has the CROSSVAL function, which performs 10-fold cross-validation by default. Check it out. Another function, CROSSVALIND, exists in the Bioinformatics Toolbox. There is also an open-source Generic-CV tool: http://www.cs.technion.ac.il/~ronbeg/gcv/ Answer 2: If you would rather write your own cross-validation wrapper than use the built-in…
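
For comparison outside MATLAB, here is a minimal sketch of the same idea in Python/scikit-learn; the dataset and classifier are placeholders standing in for your attributes and class column.

```python
# Hedged illustration: 10-fold cross-validation with scikit-learn,
# analogous to MATLAB's CROSSVAL. The data and model are placeholders.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)   # stand-in for the attributes and class column
model = GaussianNB()

# Each of the 10 folds is held out once for testing.
scores = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```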

Naive bayesian classifier - multiple decisions

只愿长相守 submitted on 2019-12-22 10:26:51
Question: I need to know whether the naive Bayesian classifier can be used to generate multiple decisions. I couldn't find any examples that support multiple decisions, and I'm new to this area, so I'm a bit confused. I actually need to develop character recognition software, where I need to identify what a given character is. It seems the Bayesian classifier can be used to decide whether a given character is a particular character or not, but it cannot give any other…
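
Naive Bayes is in fact a multi-class method: it scores every class and returns the one with the highest posterior probability. A minimal scikit-learn sketch with toy feature vectors (real character features are not shown here):

```python
# Illustrative only: multi-class naive Bayes with scikit-learn.
# Feature extraction for real character images is out of scope.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy feature vectors for three character classes 'A', 'B', 'C'
X_train = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                    [0.2, 0.8], [0.5, 0.5], [0.6, 0.4]])
y_train = np.array(['A', 'A', 'B', 'B', 'C', 'C'])

clf = GaussianNB().fit(X_train, y_train)

x_new = np.array([[0.15, 0.85]])
print(clf.predict(x_new))        # single best class, e.g. ['B']
print(clf.predict_proba(x_new))  # posterior probability for every class
```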

Clustering problem

蹲街弑〆低调 submitted on 2019-12-22 08:13:32
Question: I've been tasked with finding the N clusters containing the most points in a data set, given that the clusters are bounded by a certain size. Currently I am attempting to do this by loading my data into a kd-tree, iterating over the points to find each one's nearest neighbor, and merging the points if the cluster they would form does not exceed the limit. I'm not sure this approach will give me a global solution, so I'm looking for ways to tweak it. If you can tell me what type of problem this…
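
As a rough illustration of the approach the question describes, here is a greedy nearest-neighbour merging sketch in Python using SciPy's cKDTree. It assumes "size" means number of points per cluster (it could also mean spatial extent), and it is a heuristic, not a globally optimal solution.

```python
# Heuristic sketch: greedy nearest-neighbour merging with a size bound.
import numpy as np
from scipy.spatial import cKDTree

def greedy_clusters(points, max_size):
    tree = cKDTree(points)
    cluster_of = list(range(len(points)))           # each point starts alone
    members = {i: {i} for i in range(len(points))}

    for i, p in enumerate(points):
        _, idx = tree.query(p, k=2)                 # nearest neighbour other than itself
        j = int(idx[1])
        ci, cj = cluster_of[i], cluster_of[j]
        if ci != cj and len(members[ci]) + len(members[cj]) <= max_size:
            members[ci] |= members[cj]              # merge the two clusters
            for m in members[cj]:
                cluster_of[m] = ci
            del members[cj]
    return members

pts = np.random.rand(100, 2)
clusters = greedy_clusters(pts, max_size=10)
print(len(clusters), "clusters")
```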

Value of k in k nearest neighbor algorithm

﹥>﹥吖頭↗ submitted on 2019-12-22 07:27:12
Question: I have 7 classes that need to be classified and I have 10 features. Is there an optimal value of k that I should use in this case, or do I have to run KNN for values of k between 1 and 10 (around 10) and determine the best value with the help of the algorithm itself? Answer 1: In addition to the article I posted in the comments, there is this one as well, which suggests: the choice of k is very critical – a small value of k means that noise will have a higher influence on the result; a large value…
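
In practice there is no universal best k; it is usually chosen by cross-validation. A minimal scikit-learn sketch using synthetic data as a stand-in for the 7-class, 10-feature problem:

```python
# Illustrative sketch: pick k for k-NN by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 7-class, 10-feature problem in the question.
X, y = make_classification(n_samples=700, n_features=10, n_informative=8,
                           n_classes=7, random_state=0)

best_k, best_score = None, -1.0
for k in range(1, 11):                        # try k = 1 .. 10
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print("best k:", best_k, "accuracy:", round(best_score, 3))
```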

Working with decision trees

不羁岁月 submitted on 2019-12-22 04:36:17
Question: tl;dr: I'll try to explain my problem without bothering you with tons of messy code. I'm working on a school assignment. We have pictures of Smurfs and we have to find them with foreground/background analysis. I have a decision tree in Java that starts with all the data (HSV histograms) in one single node. It then tries to find the best attribute (from the histogram data) to split the tree on, executes the split, and creates a left and a right subtree with the data divided over both nodes…
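
To make the "find the best attribute to split on" step concrete, here is a small Python sketch of the usual information-gain criterion; it uses toy histogram-bin features and is not the asker's Java code.

```python
# Sketch of choosing the best attribute/threshold by information gain.
# Features would be HSV histogram bins; toy data is used instead.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    best = (None, None, -1.0)                 # (feature index, threshold, gain)
    base = entropy(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (f, t, gain)
    return best

X = np.random.rand(40, 8)                     # 40 samples, 8 histogram bins
y = (X[:, 2] > 0.5).astype(int)               # toy foreground/background labels
print("best split:", best_split(X, y))
```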

List of possible classifiers and types in dependencies

℡╲_俬逩灬. submitted on 2019-12-22 04:14:36
Question: I have searched the net for all the possible values that you can put in the scope tag inside a dependency tag, but I haven't found any such list for the classifier and the type. Does anybody know what I can and cannot put inside these tags? Just to be clear, I am not asking what the classifier tag or the type tag does; I just want a list of the values these tags accept, or where I can find it. Thanks! Answer 1: From the Maven Reference: Update: Oops, I misunderstood the question.…

Difference between predict(model) and predict(model$finalModel) using caret for classification in R

人走茶凉 submitted on 2019-12-22 04:12:33
Question: What's the difference between predict(rf, newdata=testSet) and predict(rf$finalModel, newdata=testSet)? I train the model with preProcess=c("center", "scale"): tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T); rf <- train(y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale")), and I receive 0 true positives when I run it on a centered and scaled testSet: testSetCS <- testSet; xTrans <- preProcess(testSetCS); testSetCS <- predict(xTrans,…
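
The same distinction exists in other stacks, which may make it easier to see; the following is a scikit-learn analogy (not the caret API): calling the wrapper object applies the stored centering/scaling automatically, while calling the bare final model expects data that has already been preprocessed.

```python
# Illustrative analogy in scikit-learn, not caret: the pipeline bundles the
# centering/scaling with the model; the bare final estimator does not.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0)).fit(X, y)

# Like predict(rf, testSet): raw data goes in, preprocessing is applied internally.
print(pipe.predict(X[:3]))

# Like predict(rf$finalModel, testSet): the bare model expects already-scaled
# input, so feeding it raw (or differently scaled) data gives misleading results.
print(pipe.named_steps["randomforestclassifier"].predict(X[:3]))
```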

keras image preprocessing unbalanced data

喜夏-厌秋 submitted on 2019-12-22 04:04:22
Question: All, I'm trying to use Keras to do image classification with two classes. For one class I have a very limited number of images, say 500. For the other class I have an almost unlimited number of images. So if I want to use Keras image preprocessing, how should I do that? Ideally, I need something like this: for class one, I feed 500 images and use ImageDataGenerator to get more images; for class two, each time I extract 500 images in sequence from the 1,000,000-image dataset, probably with no data augmentation…
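
One common way to handle this kind of imbalance in Keras is on-the-fly augmentation combined with class weights. The sketch below is hedged: the directory layout, image size, and the weight ratio are all placeholders, not a prescribed setup.

```python
# Sketch, not a full training script: augment images on the fly and
# compensate for class imbalance with class_weight. Paths are placeholders.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,        # augmentation mainly benefits the 500-image class
    horizontal_flip=True,
    zoom_range=0.2,
)

train_flow = train_gen.flow_from_directory(
    "data/train",             # hypothetical directory with one subfolder per class
    target_size=(128, 128),
    batch_size=32,
    class_mode="binary",
)

# Weight the rare class more heavily so batches drawn mostly from the large
# class do not drown out the 500-image class. The ratio is illustrative only.
class_weight = {0: 1.0, 1: 2000.0}

# model.fit(train_flow, epochs=10, class_weight=class_weight)  # model defined elsewhere
```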

McNemar's test in Python and comparison of classification machine learning models [closed]

倾然丶 夕夏残阳落幕 submitted on 2019-12-22 03:46:06
Question: Is there a good implementation of McNemar's test in Python? I don't see it anywhere in scipy.stats or scikit-learn, and I may have overlooked other good packages; please recommend one. McNemar's test is almost THE test for comparing two classification algorithms/models given a holdout test set (not through k-fold or…
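
statsmodels does ship an implementation, statsmodels.stats.contingency_tables.mcnemar. A minimal sketch comparing two classifiers evaluated on the same holdout set (the counts below are made up):

```python
# McNemar's test on the 2x2 agreement table of two classifiers
# evaluated on the same holdout set. Counts are illustrative only.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# rows: classifier A correct / wrong, columns: classifier B correct / wrong
table = np.array([[580, 15],
                  [ 25, 80]])

result = mcnemar(table, exact=False, correction=True)  # chi-square version
print("statistic:", result.statistic, "p-value:", result.pvalue)
```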

loss, val_loss, acc and val_acc do not update at all over epochs

有些话、适合烂在心里 submitted on 2019-12-22 03:22:20
Question: I created an LSTM network for sequence classification (binary) where each sample has 25 timesteps and 4 features. In my Keras network topology, the activation layer after the Dense layer uses the softmax function. I used binary_crossentropy as the loss function and Adam as the optimizer to compile the Keras model. I trained the model with batch_size=256, shuffle=True, and validation_split=0.05. The following is the training log: Train on 618196 samples, validate on 32537 samples 2017…
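
For reference, here is a minimal Keras sketch of the setup described (25 timesteps, 4 features, binary target); the hidden-layer size is illustrative. Note that with binary_crossentropy the final layer is conventionally a single unit with sigmoid: if the Dense layer has one unit, softmax over it always outputs 1, so the loss and accuracy cannot move.

```python
# Minimal sketch of the described LSTM classifier; only the input shape
# comes from the question, the rest is illustrative. With binary_crossentropy
# use a single sigmoid output: softmax over one unit is constant 1.
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

model = Sequential([
    LSTM(64, input_shape=(25, 4)),      # 25 timesteps, 4 features per step
    Dense(1, activation="sigmoid"),     # binary output
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# model.fit(X, y, batch_size=256, shuffle=True, validation_split=0.05, epochs=10)
```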