classification

Value of k in k nearest neighbor algorithm

被刻印的时光 ゝ submitted on 2019-12-05 11:35:55
I have 7 classes that need to be classified and 10 features. Is there an optimal value of k for this case, or do I have to run KNN for values of k between 1 and 10 and let the results determine the best value? In addition to the article I posted in the comments, there is this one as well, which suggests: Choice of k is very critical – A small value of k means that noise will have a higher influence on the result. A large value makes it computationally expensive and kinda defeats the basic philosophy behind KNN (that points that are
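One common way to compare candidate values of k is to score each one by cross-validation and keep the best. The following is a minimal sketch of that idea using leave-one-out validation in plain Python. The toy data, the helper names `knn_predict` and `loo_accuracy`, and the candidate list are all assumptions made for illustration, not part of the original question.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN for a given k."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out point i
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# Hypothetical toy data: two well-separated 2-D clusters of four points each
data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.3, 0.3), "a"),
        ((1.0, 1.0), "b"), ((1.1, 0.9), "b"), ((0.9, 1.2), "b"), ((1.2, 1.1), "b")]

# Score each candidate k and keep the first one with the highest accuracy
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
```

On this toy set the small values of k classify every held-out point correctly, while k = 7 forces each vote to include more points from the other cluster than from the point's own, so accuracy collapses. Real data will rarely be this clean, so the candidate range is worth widening.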

SPMD vs. Parfor

£可爱£侵袭症+ submitted on 2019-12-05 08:22:47
I'm new to parallel computing in MATLAB. I have a function that creates a classifier (SVM) and I'd like to test it with several datasets. I've got a 2-core workstation, so I'd like to run the tests in parallel. Can someone explain the difference between: dataset_array={dataset1, dataset2} matlabpool open 2 spmd my_function(dataset_array{labindex}); end and dataset_array={dataset1, dataset2} matlabpool open 2 parfor i=1:2 my_function(dataset_array{i}); end spmd is a parallel region, while parfor is a parallel for loop. The difference is that in an spmd region you have much greater flexibility when it comes

How to implement a hold-out validation in R

旧城冷巷雨未停 submitted on 2019-12-05 07:47:52
Question: Let's say I'm using the Sonar data and I'd like to do hold-out validation in R. I partitioned the data using createFolds from the caret package: folds <- createFolds(mydata$Class, k=5). I would then like to use exactly the fold mydata[i] as test data and train a classifier using mydata[-i] as training data. My first thought was to use the train function, but I couldn't find any support for hold-out validation. Am I missing something here? Also, I'd like to be able to use exactly the pre
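The loop the question describes, holding out one predefined fold as test data and training on the rest, can be sketched independently of any library. The version below is in plain Python rather than R, and `make_folds` and `train_test_splits` are hypothetical helper names. Note that this simple splitter ignores class balance, so it is only an illustration of the index bookkeeping.

```python
import random

def make_folds(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(folds):
    """Yield (train_idx, test_idx), holding out one predefined fold at a time."""
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

# Hypothetical sizes: 10 rows split into 5 folds
folds = make_folds(n_rows=10, k=5)
splits = list(train_test_splits(folds))
```

Because the folds are built once and then reused, every model sees exactly the same partitions, which is the property the question asks for.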

How to set proper arguments to build keras Convolution2D NN model [Text Classification]?

喜欢而已 submitted on 2019-12-05 07:42:04
Question: I am trying to use a 2D CNN for text classification of Chinese articles and am having trouble setting the arguments of keras Convolution2D. I know the basic flow of Convolution2D for images, but I am stuck applying it to my dataset with keras. Input data: My data is 9800 Chinese articles; the max sentence length is 6810, with a word2vec size of 200. So the input shape is `(9800, 1, 6810, 200)`. Code for building the model: MAX_FEATURES = 6810 # I just randomly pick one filter, seems this is the problem? nb_filter = 128

Simple example/use-case for a BNT gaussian_CPD?

做~自己de王妃 submitted on 2019-12-05 07:20:51
Question: I am attempting to implement a Naive Bayes classifier using BNT and MATLAB. So far I have been sticking with simple tabular_CPD variables and "guesstimating" probabilities for the variables. My prototype net so far consists of the following: DAG = false(5); DAG(1, 2:5) = true; bnet = mk_bnet(DAG, [2 3 4 3 3]); bnet.CPD{1} = tabular_CPD(bnet, 1, [.5 .5]); bnet.CPD{2} = tabular_CPD(bnet, 2, [.1 .345 .45 .355 .45 .3]); bnet.CPD{3} = tabular_CPD(bnet, 3, [.2 .02 .59 .2 .2 .39 .01 .39]); bnet.CPD

How to calculate KNN Variable Importance in R

佐手、 submitted on 2019-12-05 07:17:48
Question: I implemented an authorship attribution project in which I trained a KNN model on articles from two authors. I then classify the author of a new article as either author A or author B. I use the knn() function to generate the model. The output of the model is the table below.

     Word1  Word2  Word3  Author
11       1     48      8       A
2        2      0      0       B
29       1     45      9       A
1        2      0      0       B
4        0      0      0       B
28       3      1      1       B

As seen from the model, it is obvious that Word2 and Word3 are the most significant variables that

Support Vector Machine works on Training-set but not on Test-set in R (using e1071)

ぃ、小莉子 submitted on 2019-12-05 07:00:13
Question: I'm using a support vector machine for my document classification task. It classifies all the articles in my training set, but fails to classify the ones in my test set. trainDTM is the document-term matrix of my training set; testDTM is the one for the test set. Here's my (not so beautiful) code: # create data.frame with labelled sentences labeled <- as.data.frame(read.xlsx("C:\\Users\\LABELED.xlsx", 1, header=T)) # create training set and test set traindata <- as.data.frame(labeled[1:700,c(

How to plot a ROC curve using ROCR package in R, *with only a classification contingency table*

守給你的承諾、 submitted on 2019-12-05 06:27:18
How can I plot a ROC curve using the ROCR package in R with only a classification contingency table? I have a contingency table from which the true positive rate, false positive rate, etc. can all be computed. I have 500 replications, and therefore 500 tables. However, I cannot generate prediction data giving the estimated probability and the truth for each individual case. How can I get a curve without the individual data? Below is the package instruction used. ## computing a simple ROC curve (x-axis: fpr, y-axis: tpr) library(ROCR) data(ROCR.simple) pred <- prediction( ROCR.simple$predictions, ROCR.simple
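A single 2x2 contingency table corresponds to one decision threshold, so on its own it gives one (FPR, TPR) point in ROC space rather than a full curve. As a rough sketch of what can be recovered from tables alone, the plain-Python snippet below (not ROCR) turns each table into such a point. The counts and the helper name `roc_point` are hypothetical.

```python
def roc_point(tp, fp, fn, tn):
    """Return (fpr, tpr) for one 2x2 contingency table."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # 1 - specificity
    return fpr, tpr

# Hypothetical counts (tp, fp, fn, tn) standing in for the replications
tables = [(40, 10, 10, 40), (45, 5, 5, 45), (30, 20, 20, 30)]

# One ROC-space point per table, ordered by false positive rate
points = sorted(roc_point(*t) for t in tables)
```

If the tables come from different thresholds, the sorted points can be joined to approximate a curve. If they all come from the same threshold, they only show how one operating point varies across replications.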

classification: PCA and logistic regression using sklearn

六眼飞鱼酱① submitted on 2019-12-05 05:39:01
Step 0: Problem description I have a classification problem, i.e. I want to predict a binary target from a collection of numerical features, using logistic regression after running a Principal Components Analysis (PCA). I have 2 datasets, df_train and df_valid (training set and validation set respectively), as pandas DataFrames containing the features and the target. As a first step, I used the pandas get_dummies function to transform all the categorical variables into booleans. For example, I would have: n_train = 10 np.random.seed(0) df_train = pd.DataFrame({"f1":np.random.random(n

Creating an ARFF file from python output

£可爱£侵袭症+ submitted on 2019-12-05 04:07:44
gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1, 'proportionate': 1, 'instructions': 1, 'warned': 2, 'commanders': 1, 'michael': 2, 'exploit': 1, 'culminating'
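A nested dictionary of per-document word counts like the one above maps naturally onto the sparse ARFF layout, where each data row lists only the non-zero `index value` pairs. The sketch below shows one way this might be done in plain Python. The function name `to_sparse_arff`, the relation name, and the two document keys are hypothetical, and it assumes every word is a plain token that needs no quoting as an attribute name.

```python
def to_sparse_arff(doc_counts, relation="documents"):
    """Render {doc: {word: count}} as a sparse ARFF string."""
    # One numeric attribute per distinct word, in sorted order
    vocab = sorted({w for counts in doc_counts.values() for w in counts})
    index = {w: i for i, w in enumerate(vocab)}

    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {w} NUMERIC" for w in vocab]
    lines += ["", "@DATA"]
    for counts in doc_counts.values():
        # Sparse rows list 0-based attribute indices in increasing order
        cells = sorted((index[w], n) for w, n in counts.items())
        lines.append("{" + ", ".join(f"{i} {n}" for i, n in cells) + "}")
    return "\n".join(lines) + "\n"

# Hypothetical two-document input with a few word counts
docs = {"a.html": {"march": 6, "policing": 3},
        "b.html": {"march": 1, "trade": 2}}
arff = to_sparse_arff(docs)
```

Words absent from a document are simply omitted from its row and are read as zero, which keeps the file small when the vocabulary is large. A class attribute would still have to be added before the file is used for classification.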