classification

randomForest in R: "object not found" error

江枫思渺然 submitted on 2019-12-12 03:24:27
Question: 

```r
# init
libs <- c("tm", "plyr", "class", "RTextTools", "randomForest")
lapply(libs, require, character.only = TRUE)

# set options
options(stringsAsFactors = FALSE)

# set parameters
labels <- read.table("labels.txt")
path <- paste(getwd(), "/data", sep = "")

# clean text
cleanCorpus <- function(corpus) {
  corpus.tmp <- tm_map(corpus, removePunctuation)
  corpus.tmp <- tm_map(corpus.tmp, removeNumbers)
  corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
  corpus.tmp <- tm_map(corpus.tmp, content…
```
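A rough Python analogue of the pipeline being built above, sketched with scikit-learn (TF-IDF features feeding a random forest); the toy documents and labels are invented for illustration:

```python
# A rough scikit-learn analogue of the R pipeline above: TF-IDF features
# feeding a random forest. The documents and labels are toy stand-ins.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

docs = ["cheap meds now", "meeting at noon", "win a free prize", "project update"]
labels = ["spam", "ham", "spam", "ham"]

clf = make_pipeline(
    TfidfVectorizer(lowercase=True),                      # tokenize + weight terms
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(docs, labels)
print(clf.predict(["free meds prize"]))                   # expected: ['spam']
```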

Computer Vision - Is it necessary to have multiple viewpoint-specific classifiers for object detection?

左心房为你撑大大i submitted on 2019-12-12 03:22:44
Question: Say I want to train a HOG descriptor + linear SVM for car detection. Is it necessary to make, say, three classifiers for the back view, front view and side view of the car, or can I just train a single classifier for all viewpoints of the car?

Answer 1: It's not necessary, but it is recommended. You can make a single classifier which handles multiple cases, but it won't perform very well overall. The issue here isn't so much the variability of descriptor responses between the different…
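A minimal sketch of the per-viewpoint setup the answer recommends, assuming HOG features from scikit-image and one LinearSVC per view; the 64x64 window size and the random stand-in data are assumptions for illustration:

```python
# One LinearSVC per viewpoint on HOG features, as the answer recommends.
# The 64x64 windows and the random stand-in data are placeholders.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(gray_window):
    # gray_window: 2-D array, e.g. a 64x64 grayscale detection window
    return hog(gray_window, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

rng = np.random.default_rng(0)
views_data = {v: (rng.random((20, 64, 64)), rng.integers(0, 2, 20))
              for v in ("front", "back", "side")}         # stand-in dataset

detectors = {}
for view, (windows, labels) in views_data.items():
    X = np.array([hog_features(w) for w in windows])
    detectors[view] = LinearSVC().fit(X, labels)

def car_score(window):
    # run every per-view SVM and keep the strongest response
    f = hog_features(window).reshape(1, -1)
    return max(d.decision_function(f)[0] for d in detectors.values())

print(car_score(rng.random((64, 64))))
```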

Python file format for email classification with svm-light

房东的猫 submitted on 2019-12-12 02:53:40
Question: I am working with email subjects, so I have 20 emails I want to classify, and a file with 20 lines - one line per email subject. I have been working on it, but I am unable to figure out what the features refer to and the format of the input file for svm-light. Any tips on how to proceed will be helpful. Thanks in advance!

Edit: I have taken the tf-idf of the first 500 subject lines as a trial. However, according to the svm-light format, we need:

```
<line> .=. <target> <feature>:<value> <feature>:<value> …
```
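One practical route (an option, not necessarily the thread's answer): scikit-learn can write a sparse tf-idf matrix straight into svm-light's `<target> <feature>:<value>` format. The subjects and labels below are invented; `zero_based=False` gives the 1-based feature ids svm-light expects:

```python
# Write tf-idf features of the subject lines in svm-light's sparse format:
# each row becomes "<target> <feature>:<value> ...". Subjects/labels are toys.
from sklearn.datasets import dump_svmlight_file
from sklearn.feature_extraction.text import TfidfVectorizer

subjects = ["win a free cruise", "meeting moved to 3pm", "cheap pills online"]
labels = [1, -1, 1]                     # svm-light binary targets are +1 / -1

X = TfidfVectorizer().fit_transform(subjects)   # sparse tf-idf matrix
dump_svmlight_file(X, labels, "train.dat", zero_based=False)
```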

xgboost: which parameters are used in the linear booster gblinear?

余生长醉 submitted on 2019-12-12 02:12:39
Question: Looking on the web, I am still confused about what the linear booster gblinear precisely is, and I am not alone. Following the documentation, it only has 3 parameters - lambda, lambda_bias and alpha - maybe it should say "additional parameters". If I understand this correctly, the linear booster does (rather standard) linear boosting (with regularization). In this context I can only make sense of the 3 parameters above and eta (the boosting rate). That's also how it is described on GitHub…
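For concreteness, a hedged sketch of where those parameters plug in, using the xgboost training API; lambda_bias existed in releases contemporary with the question (newer versions dropped it), and the toy data is invented:

```python
# Sketch of the gblinear parameters the question lists, passed via xgb.train.
# lambda_bias is specific to gblinear and to older xgboost releases.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((100, 5)), rng.integers(0, 2, 100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "gblinear",   # linear model instead of trees
    "lambda": 1.0,           # L2 regularization on the weights
    "alpha": 0.0,            # L1 regularization on the weights
    "lambda_bias": 0.0,      # L2 regularization on the bias term (gblinear only)
    "eta": 0.5,              # step size applied to each boosting round
    "objective": "binary:logistic",
}
model = xgb.train(params, dtrain, num_boost_round=10)
```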

Python decision tree classification of complex objects

久未见 submitted on 2019-12-12 02:06:06
Question: I have a collection of clothing / accessory products (represented by a Python object) with various attributes. These products are generated by a combination of querying an external API and scraping the merchant websites to obtain various attributes. My goal is to develop a classifier that uses these attributes to correctly categorise the products (i.e. into categories such as trousers, t-shirts, dresses etc.). I have both a training and a test data set which are a subset of the entire data…
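A minimal sketch of one way to do this, assuming each product's attributes can be flattened into a dict of strings and numbers: one-hot encode with DictVectorizer, then fit a decision tree. The attribute names and categories are invented:

```python
# Flatten product attributes to dicts, one-hot encode, fit a decision tree.
# Attribute names, values, and categories are toy stand-ins.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

products = [
    {"colour": "blue", "material": "denim", "has_legs": 1},
    {"colour": "white", "material": "cotton", "has_legs": 0},
    {"colour": "red", "material": "silk", "has_legs": 0},
]
categories = ["trousers", "t-shirt", "dress"]

clf = make_pipeline(DictVectorizer(sparse=False), DecisionTreeClassifier())
clf.fit(products, categories)
print(clf.predict([{"colour": "black", "material": "denim", "has_legs": 1}]))
```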

Why does k=1 in KNN give the best accuracy?

痞子三分冷 submitted on 2019-12-12 01:36:26
Question: I am using Weka IBk for text classification. Each document is basically a short sentence. The training dataset contains 15,000 documents. While testing, I can see that k=1 gives the best accuracy. How can this be explained?

Answer 1: If you are querying your learner with the same dataset you have trained on, then with k=1 the output values should be perfect, barring data points with the same parameters but different outcome values. Do some reading on overfitting as it applies to KNN learners. In…
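A small demonstration of the answer's point, sketched with scikit-learn rather than Weka: with k=1, scoring on the training data itself is perfect, while a held-out split shows the generalization picture:

```python
# With k=1, every training point's nearest neighbour is itself, so training
# accuracy is 1.0; only a held-out split reveals real performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
# k=1 scores 1.0 on the training column; compare the test column before
# trusting it.
```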

Wrong output of prediction function in tensorflow

落爺英雄遲暮 submitted on 2019-12-12 01:21:27
Question: I am going to perform pixel-based classification on an image. Here is the code I used for training the NN:

```python
net = input_data(shape=[None, 1, 4])
net = tflearn.lstm(net, 128, return_seq=True)
net = tflearn.lstm(net, 128)
net = tflearn.fully_connected(net, 1, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')
model = tflearn.DNN(net, tensorboard_verbose=2, checkpoint_path='model.tfl.ckpt')
X_train = np.expand_dims(X_train, axis=1)
model.fit(X…
```
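One likely culprit, offered as an observation rather than the thread's answer: softmax normalizes across output units, so a fully connected layer with a single softmax unit always outputs 1.0. A corrected sketch, assuming a multi-class setup (n_classes = 3 is a placeholder):

```python
import tflearn
from tflearn.data_utils import to_categorical

# Softmax normalizes across units, so the 1-unit softmax layer above always
# outputs 1.0. Usual fix: one output unit per class, with one-hot targets.
# n_classes = 3 is an assumption for illustration.
n_classes = 3
net = tflearn.input_data(shape=[None, 1, 4])
net = tflearn.lstm(net, 128, return_seq=True)
net = tflearn.lstm(net, 128)
net = tflearn.fully_connected(net, n_classes, activation='softmax')
net = tflearn.regression(net, optimizer='adam', loss='categorical_crossentropy')
# labels then need to be one-hot before fitting:
# Y_train = to_categorical(Y_train, n_classes)
```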

Executing Weka Classification in C# in Parallel

廉价感情. submitted on 2019-12-12 00:16:20
Question: I have asked a few broad questions about the operations of Weka and C# as well as WekaSharp, so I thought I would try to ask a more focused question to try to progress further on my own. Taking as an example the code from the weka site on executing weka from C# that I was using, I would like to run part of the calculation using parallel operations, but I am not sure how to code it. Here is the raw code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using weka.classifiers…
```
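The underlying fan-out pattern (independent classifier evaluations dispatched to worker processes) sketched in Python for illustration; the Weka/C# specifics, such as wrapping the calls in Parallel.For, are outside this sketch:

```python
# Illustration of the fan-out pattern the question is after: independent
# classifier evaluations run in parallel. Sketched with scikit-learn; the
# same shape applies to Weka calls behind Parallel.For in C#.
from concurrent.futures import ProcessPoolExecutor
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def evaluate(clf):
    X, y = load_iris(return_X_y=True)
    return type(clf).__name__, cross_val_score(clf, X, y, cv=10).mean()

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for name, acc in pool.map(evaluate, [GaussianNB(), DecisionTreeClassifier()]):
            print(name, acc)
```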

R ggplot colour labelling time series based on class

不想你离开。 submitted on 2019-12-11 23:47:32
Question: I have two time series as below:

```r
y1 <- mvrnorm(50, c(3,1), matrix(c(0.5,0.3,0.3,0.3),2,2))  # 2-D bivariate normal
y2 <- mvrnorm(50, c(1,0), matrix(c(2,.1,.1,1),2,2))         # another 2-D bivariate normal
y  <- rbind(y1,y2)  # append the second to the end of the first
```

I plot these with ggplot:

```r
yd <- as.data.frame(y)
g <- ggplot(data=yd) +
  geom_line(aes(x=1:nrow(yd), y=yd$V1, colour="TS1")) +
  geom_line(aes(x=1:nrow(yd), y=yd$V2, colour="TS2")) +
  scale_colour_manual(name="Levels", values=c("TS1"="black"…
```
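For comparison, a rough Python analogue of the long-format approach that ggplot favours (reshape so the series label drives the colour mapping); the random data stands in for y1/y2 above:

```python
# Reshape two wide columns to long format, then let the series label drive
# the legend/colour, mirroring the usual ggplot fix. Random stand-in data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
yd = pd.DataFrame({"TS1": rng.normal(3, 0.5, 100), "TS2": rng.normal(1, 1.0, 100)})
long = yd.reset_index().melt(id_vars="index", var_name="Levels", value_name="value")

for level, grp in long.groupby("Levels"):       # one labelled line per series
    plt.plot(grp["index"], grp["value"], label=level)
plt.legend(title="Levels")
plt.show()
```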

predict continuous values using sklearn bagging classifier

戏子无情 submitted on 2019-12-11 23:25:17
Question: Can I use sklearn's BaggingClassifier to produce continuous predictions? Is there a similar package? My understanding is that the bagging classifier predicts several classifications with different models, then reports the majority answer. It seems like this algorithm could be used to generate probability functions for each classification and then report the mean value.

```python
trees = BaggingClassifier(ExtraTreesClassifier())
trees.fit(X_train, Y_train)
Y_pred = trees.predict(X_test)
```

Answer 1: If you're…
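Two hedged options, since the answer is cut off above: averaged class probabilities via predict_proba, or a genuinely continuous target via BaggingRegressor. The toy data is invented:

```python
# Option 1: predict_proba averages the per-model class probabilities.
# Option 2: BaggingRegressor handles a continuous target directly.
import numpy as np
from sklearn.ensemble import (BaggingClassifier, BaggingRegressor,
                              ExtraTreesClassifier, ExtraTreesRegressor)

rng = np.random.default_rng(0)
X = rng.random((100, 4))
y_class, y_cont = rng.integers(0, 2, 100), rng.random(100)

clf = BaggingClassifier(ExtraTreesClassifier()).fit(X, y_class)
print(clf.predict_proba(X[:3]))         # averaged class probabilities in [0, 1]

reg = BaggingRegressor(ExtraTreesRegressor()).fit(X, y_cont)
print(reg.predict(X[:3]))               # continuous predictions
```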