classification

How to get different Variable Importance for each class in a binary h2o GBM in R?

﹥>﹥吖頭↗ submitted on 2019-12-22 01:25:48
Question: I'm exploring the use of a GBM with h2o for a classification problem, as a replacement for a logistic regression (GLM). The non-linearity and interactions in my data make me think a GBM is more suitable. I've run a baseline GBM (see below) and compared its AUC against the AUC of the logistic regression. The GBM performs much better. In a classic linear logistic regression, one can see the direction and effect of each predictor (x) on the outcome variable (y). Now, I would like

machine learning to overcome typo errors [closed]

余生长醉 submitted on 2019-12-21 23:13:46
Question: Closed 5 years ago because it needs to be more focused. I have a long list of medicine names (say crocin, seroflo, oxitab, etc.). Suppose I need to find whether a particular medicine is present in the list, but the query may contain typos. Say I intended to find crocin in the list, but I instead
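One common way to approach this kind of typo-tolerant lookup is approximate string matching. The following is a minimal Python sketch, not an answer from the original thread: it uses the standard library's difflib.get_close_matches, and the function name find_medicine, the sample list, and the 0.8 similarity cutoff are all illustrative assumptions.

```python
from difflib import get_close_matches

# Sample names from the question; a real list would be much longer.
MEDICINES = ["crocin", "seroflo", "oxitab"]


def find_medicine(query, names, cutoff=0.8):
    """Return the closest name to `query`, or None if nothing is similar enough.

    `cutoff` is an assumed similarity threshold in [0, 1]; tune it on real data.
    """
    # Compare case-insensitively, but return the name as stored in the list.
    lookup = {name.lower(): name for name in names}
    matches = get_close_matches(query.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(find_medicine("crocine", MEDICINES))  # a misspelling of "crocin"
print(find_medicine("aspirin", MEDICINES))  # not in the list
```

This scans the whole list on every query, which may be too slow for very large lists; in that case an index-based approach would probably be a better fit.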

Working with text classification and big sparse matrices in R

不问归期 submitted on 2019-12-21 22:22:49
Question: I'm working on a multi-class text classification project and need to build the document-term matrices, then train and test, in R. My datasets already exceed the dimension limits of R's base matrix class, so I need to build big sparse matrices to be able to classify, for example, 100k tweets. I am using the quanteda package, as it has so far been more useful and reliable than the tm package, where creating a DocumentTermMatrix with a dictionary makes

Image Classification - Detecting Floor Plans

北城以北 submitted on 2019-12-21 19:28:12
Question: I am working on a real estate website and I would like to write a program that can classify whether an image is a floor plan or a company logo. Since I am writing in PHP I would prefer a PHP solution, but any C++ or OpenCV solution will be fine as well. Floor plan samples: http://www.rentingtime.com/uploads/listing/l0050/0000050930/68614.jpg http://www.rentingtime.com/uploads/listing/l0031/0000031701/44199.jpg Logo sample: http://www.rentingtime.com/uploads

How to plot a ROC curve using ROCR package in r, *with only a classification contingency table*

徘徊边缘 submitted on 2019-12-21 13:07:42
Question: How can I plot a ROC curve using the ROCR package in R with only a classification contingency table? I have a contingency table from which the true positive rate, false positive rate, and the other rates can be computed. I have 500 replications, and therefore 500 tables. But I cannot generate prediction data giving the estimated probability and the true label for each individual case. How can I get a curve without the individual data? Below is the package instruction used. ## computing a simple ROC curve (x-axis: fpr, y
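As background to the question above: a single 2x2 contingency table fixes one operating point, with TPR = TP / (TP + FN) and FPR = FP / (FP + TN), so each table contributes one point in ROC space rather than a full curve. The sketch below is in Python rather than R and does not use ROCR; the function names and the counts are hypothetical, and joining the points into a curve assumes the tables correspond to different decision thresholds.

```python
def roc_point(tp, fp, fn, tn):
    """Return (fpr, tpr) for one 2x2 contingency table."""
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return fpr, tpr


def empirical_roc(tables):
    """Sort the per-table points by FPR and anchor the curve at (0, 0) and (1, 1)."""
    points = sorted(roc_point(*t) for t in tables)
    return [(0.0, 0.0)] + points + [(1.0, 1.0)]


def trapezoid_auc(curve):
    """Area under a piecewise-linear curve given as sorted (fpr, tpr) pairs."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Hypothetical counts (tp, fp, fn, tn), made up for illustration.
tables = [(40, 10, 10, 40), (30, 5, 20, 45)]
curve = empirical_roc(tables)
print(curve)
print(trapezoid_auc(curve))
```

If all 500 tables come from the same threshold, the points will cluster around a single location, and a scatter of points is likely more honest than a connected curve.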

Negative decision_function values

跟風遠走 submitted on 2019-12-21 06:15:21
Question: I am using the support vector classifier from sklearn on the Iris dataset. When I call decision_function it returns negative values, yet every sample in the test dataset is assigned the right class. I thought decision_function should return a positive value when the sample is an inlier and a negative value when it is an outlier. Where am I wrong? from sklearn import datasets from sklearn.svm import SVC from sklearn.model_selection import train_test_split iris = datasets.load_iris() X =
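As general background (not taken from the thread's answers): for a two-class SVM the decision value is a signed score that says which side of the separating hyperplane a sample lies on, so a negative value denotes the other class rather than an outlier. The inlier/outlier reading applies to one-class models such as scikit-learn's OneClassSVM. The plain-Python sketch below illustrates the sign convention for a linear binary classifier; the weights, bias, and class labels are hypothetical, and it does not model the multi-class Iris case, where scikit-learn returns several scores per sample.

```python
def decision_value(weights, bias, x):
    """Signed score w.x + b of a linear binary classifier."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x, classes=("A", "B")):
    """Positive score -> second class, otherwise -> first class."""
    return classes[1] if decision_value(weights, bias, x) > 0 else classes[0]


# Hypothetical weights and bias, chosen only for illustration.
w, b = [1.0, -1.0], 0.5
for sample in ([2.0, 0.0], [0.0, 2.0]):
    print(sample, decision_value(w, b, sample), predict(w, b, sample))
```

Both samples receive a valid class label; the sign of the score only determines which one.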

Merging bag-of-words scikits classifier with arbitrary numeric fields

☆樱花仙子☆ submitted on 2019-12-21 06:10:07
Question: How would you merge a scikits-learn classifier that operates over a bag-of-words with one that operates on arbitrary numeric fields? I know that these are basically the same thing behind the scenes, but I'm having trouble figuring out how to do this via the existing library methods. For example, my bag-of-words classifier uses the pipeline: classifier = Pipeline([ ('vectorizer', HashingVectorizer(ngram_range=(1,4), non_negative=True)), ('tfidf', TfidfTransformer()), ('clf',

Extract decision boundary with scikit-learn linear SVM

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-21 05:17:20
Question: I have a very simple 1D classification problem: a list of values [0, 0.5, 2] and their associated classes [0, 1, 2]. I would like to get the classification boundaries between those classes. Adapting the iris example (for visualization purposes) and dropping the non-linear models: X = np.array([[x, 1] for x in [0, 0.5, 2]]) Y = np.array([1, 0, 2]) C = 1.0 # SVM regularization parameter svc = svm.SVC(kernel='linear', C=C).fit(X, Y) lin_svc = svm.LinearSVC(C=C).fit(X, Y) Gives the following

How to customize a model in CARET to perform a PLS-[Classifier] two-step classification model?

你离开我真会死。 submitted on 2019-12-21 05:10:21
Question: This question is a continuation of the same thread here. Below is a minimal working example taken from the book Wehrens R. Chemometrics with R: multivariate data analysis in the natural sciences and life sciences. 1st edition. Heidelberg; New York: Springer. 2011 (page 250) and its package ChemometricsWithR. It highlights some pitfalls when modeling using cross-validation techniques. The Aim: A cross-validated methodology using the same set of

Change maven dependency for artifact using classifier

五迷三道 submitted on 2019-12-21 04:15:53
Question: With the Maven JAR plugin I build two JARs: bar-1.0.0.jar and bar-1.0.0-client.jar. Currently my POM has the following dependency: <dependency> <groupId>de.app.test</groupId> <artifactId>foo</artifactId> <version>1.0.0</version> </dependency> This artifact also exists in two versions, foo-1.0.0.jar and foo-1.0.0-client.jar. I want to make bar-1.0.0-client.jar depend on foo-1.0.0-client.jar and bar-1.0.0.jar depend on foo-1.0.0.jar. ================ ->First (wrong) solution: define the