feature-selection

Are MFCC features required for speech recognition?

浪子不回头ぞ submitted on 2019-12-11 06:07:08
Question: I'm currently developing a speech recognition project and I'm trying to select the most meaningful features. Most of the relevant papers suggest using zero crossing rate, F0, and MFCC features, so I'm using those. My question is: a training sample with a duration of 00:03 has 268 features. Considering that I'm doing a multi-class classification project with 50+ training samples per class, could including all MFCC features expose the project to the curse of dimensionality or 'reduce the importance' …
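A minimal R sketch (not from the original question) of one standard answer to this worry: project the 268-dimensional feature vectors onto a smaller set of principal components before training, keeping only enough components to explain most of the variance. The matrix name `features` is a placeholder.

    # features: numeric matrix, n_samples x 268 (ZCR, F0 and MFCC features)
    pca <- prcomp(features, center = TRUE, scale. = TRUE)
    # keep the smallest number of components explaining ~95% of the variance
    var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
    k <- which(var_explained >= 0.95)[1]
    reduced <- pca$x[, 1:k]   # use `reduced` as the training matrix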

Error: protect(): protection stack overflow during feature extraction

别说谁变了你拦得住时间么 submitted on 2019-12-11 03:59:30
Question: I have a dataframe with 4755 rows and 27199 columns. It's actually a document-term matrix, and I'm trying to perform feature selection with the "FSelector" package. Here is some of the code: library(FSelector) weights <- information.gain(Flag~., dtmmatdf) Each time I run this I get the error Error: protect(): protection stack overflow. I have 24GB of RAM and the dataframe is about 500MB in size, so I don't know what the problem is or how to fix it. Source: https://stackoverflow.com
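A hedged workaround sketch (not from the original post): expanding the formula Flag ~ . over ~27,000 columns is what exhausts R's pointer protection stack, not the data size. Either start R with a larger stack (R --max-ppsize=500000 on the command line) or score the attributes in chunks, roughly like this:

    library(FSelector)

    chunk_size <- 1000
    term_cols  <- setdiff(names(dtmmatdf), "Flag")
    chunks     <- split(term_cols, ceiling(seq_along(term_cols) / chunk_size))

    # information gain is computed per attribute, so chunking the columns
    # and stacking the per-chunk results gives the same scores
    weights <- do.call(rbind, lapply(chunks, function(cols) {
      information.gain(Flag ~ ., dtmmatdf[, c("Flag", cols)])
    }))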

How do I use AdaBoost for feature selection?

吃可爱长大的小学妹 submitted on 2019-12-11 03:26:25
Question: I want to use AdaBoost to choose a good set of features from a large number (~100k). AdaBoost works by iterating through the feature set and adding features based on how well they perform; it chooses features that perform well on samples that were misclassified by the existing feature set. I'm currently using OpenCV's CvBoost. I got an example working, but from the documentation it is not clear how to pull out the feature indices that it has used. Using either CvBoost, a 3rd party …
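Not an answer about CvBoost itself, but a hedged R sketch of the same idea using the adabag package: boost depth-1 trees (stumps), so each round picks a single feature, then read off which variables the ensemble actually used. The data frame train and its factor column label are placeholders.

    library(adabag)
    library(rpart)

    # stumps: each boosting round splits on exactly one feature
    fit <- boosting(label ~ ., data = train, mfinal = 100,
                    control = rpart.control(maxdepth = 1))

    # variables with non-zero relative importance were selected by the ensemble
    selected <- names(fit$importance[fit$importance > 0])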

Access all models produced by rfe in caret

﹥>﹥吖頭↗ submitted on 2019-12-10 17:52:12
Question: I'm using the rfe function in the caret package to do feature selection for a logistic regression model. I'm looking at sizes of 5, 10, 15, 20, and 25, selecting the best model using Rsquared (my dependent variable is 0/1). Is there a way to access the other models produced by the rfe function beyond the final selected model? Answer 1: There is no automatic way. The best thing you can …
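A hedged sketch (not the answer's code) of one way to capture every model rfe fits: wrap the fit function of caret's helper object so each fitted model is stashed in a list before being returned. Here x and y stand for the predictor matrix and the outcome vector.

    library(caret)

    all_models <- list()
    myFuncs <- lrFuncs
    myFuncs$fit <- function(x, y, first, last, ...) {
      model <- lrFuncs$fit(x, y, first, last, ...)
      all_models[[length(all_models) + 1]] <<- model   # stash every fit
      model
    }

    ctrl   <- rfeControl(functions = myFuncs, method = "cv", number = 10)
    result <- rfe(x, y, sizes = c(5, 10, 15, 20, 25), rfeControl = ctrl)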

PCL Point Feature Histograms - binning

雨燕双飞 submitted on 2019-12-10 15:27:53
Question: The binning process, which is part of point feature histogram estimation, results in b^3 bins if only the three angular features (alpha, phi, theta) are used, where b is the number of bins per feature. Why is it b^3 and not b * 3? Let's say we consider alpha. The feature value range is subdivided into b intervals. You iterate over all neighbors of the query point and count the number of alpha values that fall in each interval, so you have b bins for alpha. When you repeat this for the other two …
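A hedged illustration (not PCL code) of the point the question circles around: the three features are binned jointly rather than separately, so each (alpha, phi, theta) triple falls into exactly one cell of a b x b x b histogram, which gives b^3 bins instead of 3*b.

    b <- 5
    # 0-based interval indices for one neighbor's (alpha, phi, theta) values
    joint_bin <- function(i_alpha, i_phi, i_theta, b) {
      i_alpha + b * i_phi + b^2 * i_theta   # flattened index in [0, b^3)
    }
    joint_bin(4, 4, 4, b)   # 124 -- the last of b^3 = 125 joint bins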

How to do feature selection using linear SVM weights [closed]

柔情痞子 submitted on 2019-12-10 14:03:42
Question: I have built a linear SVM model for two classes (1 and 0) using the following code: class1.svm.model <- svm(Class ~ ., data = training, cost = 1, cross = 10, metric = "ROC", type = "C-classification", kernel = "linear", na.action = na.omit, probability = TRUE) and I have extracted the …
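A hedged sketch (assuming the e1071 package, which the svm() call suggests): for a linear kernel the primal weight vector can be recovered from the dual coefficients and support vectors, and the absolute weights give a simple per-feature ranking.

    library(e1071)

    # w is 1 x n_features: dual coefficients times support vectors
    w <- t(class1.svm.model$coefs) %*% class1.svm.model$SV
    ranking <- order(abs(w), decreasing = TRUE)
    colnames(class1.svm.model$SV)[ranking]   # features sorted by |weight|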

Part of Speech (POS) tag Feature Selection for Text Classification

我怕爱的太早我们不能终老 submitted on 2019-12-09 07:01:42
Question: I have POS-tagged sentences obtained using the Stanford POS tagger, e.g.: The/DT island/NN was/VBD very/RB beautiful/JJ ./. I/PRP love/VBP it/PRP ./. (XML format is also available.) Can anyone explain how to perform feature selection on these POS-tagged sentences and convert them into feature vectors for text classification with a machine learning method? Answer 1: A simple way to start would be something like the following (assuming word order is not important for your classification algorithm). First you …
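A hedged R sketch of the bag-of-POS-tags idea the answer begins to describe: strip the words, count how often each tag occurs, and use the counts as the sentence's feature vector.

    sentence <- "The/DT island/NN was/VBD very/RB beautiful/JJ ./."

    tokens <- strsplit(sentence, "\\s+")[[1]]
    tags   <- sub(".*/", "", tokens)   # keep only the tag after the last '/'
    table(tags)                        # the counts form the feature vector
    # tags
    #   .  DT  JJ  NN  RB VBD
    #   1   1   1   1   1   1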

How does SelectKBest (chi2) calculate the score?

跟風遠走 submitted on 2019-12-08 17:31:34
I am trying to find the most valuable features by applying feature selection methods to my dataset, and I'm using the SelectKBest function for now. I can generate the score values and sort them as I want, but I don't understand exactly how each score is calculated. I know that, in theory, a higher score means a more valuable feature, but I need a mathematical formula or an example in order to understand this deeply. bestfeatures = SelectKBest(score_func=chi2, k=10) fit = bestfeatures.fit(dataValues, dataTargetEncoded) feat_importances = pd.Series(fit.scores_, index=dataValues.columns)
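For reference (summarizing scikit-learn's chi2 scorer, not text from the thread): for each feature j, the score is the classic chi-squared statistic comparing the observed per-class sums of that feature against the sums expected if the feature were independent of the class:

    \chi^2_j = \sum_{c=1}^{C} \frac{(O_{cj} - E_{cj})^2}{E_{cj}},
    \qquad
    O_{cj} = \sum_{i:\, y_i = c} x_{ij},
    \qquad
    E_{cj} = \frac{n_c}{n} \sum_{i=1}^{n} x_{ij}

where n_c is the number of samples in class c and n is the total sample count. A large \chi^2_j means the feature's mass is spread very unevenly across classes, i.e. the feature is more informative about the label.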

Why does normalizing labels in MxNet make accuracy close to 100%?

泄露秘密 submitted on 2019-12-08 11:37:50
Question: I am training a model using multi-label logistic regression on MxNet (gluon API) as described here: multi-label logit in gluon. My custom dataset has 13 features and one label of shape [,6]. My features are normalized from their original values to [0,1]. I use a simple dense neural net with 2 hidden layers. I noticed that when I don't normalize the labels (which take the discrete values 1,2,3,4,5,6 and are purely my choice for mapping categorical values to numbers), my training process slowly converges to some …

How to use wrapper feature selection algorithms in R?

吃可爱长大的小学妹 submitted on 2019-12-08 08:19:27
Question: I have several algorithms: rpart, kNN, logistic regression, randomForest, naive Bayes, and SVM. I'd like to use forward/backward and genetic-algorithm selection to find the best subset of features for each particular algorithm. How can I implement wrapper-type forward/backward and genetic feature selection in R? Answer 1: I'm testing wrappers at the moment, so I'll give you a few package names in R. What is a wrapper? A wrapper evaluates candidate feature subsets by repeatedly training the model itself, rather than scoring features independently of it. Now to the methods: MASS package: choose a model by AIC in a stepwise algorithm, as in the sketch below.
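A hedged sketch of two wrapper approaches named above: forward/backward selection by AIC with MASS::stepAIC (shown here on a logistic regression) and genetic-algorithm selection with caret::gafs. The data frame training and its outcome column Class are placeholders.

    library(MASS)
    library(caret)

    # stepwise (forward/backward) search by AIC, starting from the full model
    full       <- glm(Class ~ ., data = training, family = binomial)
    step_model <- stepAIC(full, direction = "both", trace = FALSE)

    # genetic-algorithm wrapper around a random forest
    ga_ctrl   <- gafsControl(functions = rfGA, method = "cv", number = 5)
    ga_result <- gafs(x = training[, setdiff(names(training), "Class")],
                      y = training$Class, iters = 10, gafsControl = ga_ctrl)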