classification

Why does a single-layer perceptron converge so slowly without normalization, even when the margin is large?

左心房为你撑大大i posted on 2019-12-21 02:53:07
Question: This question was completely rewritten after I confirmed my results (the Python notebook can be found here) with a piece of code written by someone else (can be found here). Here is that code, instrumented by me to work with my data and to count epochs until convergence: import numpy as np from matplotlib import pyplot as plt class
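The gap the asker describes is easy to reproduce with a bare-bones perceptron. Below is a minimal sketch (not the asker's instrumented notebook; the data, the scaling factors, and the epoch cap are invented for illustration) that counts epochs until the classic update rule makes a full error-free pass, once on raw features and once on standardized ones:

```python
import numpy as np

def perceptron_epochs(X, y, max_epochs=10_000):
    """Count epochs until the classic perceptron rule stops updating.

    X: (n_samples, n_features) array, y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        updated = False
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # misclassified (or on the boundary)
                w += yi * xi
                b += yi
                updated = True
        if not updated:                     # converged: a full pass with no mistakes
            return epoch
    return max_epochs

# Toy linearly separable data with very different feature scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * [1.0, 1000.0]
y = np.where(X[:, 0] + X[:, 1] / 1000.0 > 0, 1, -1)

print("raw features:        ", perceptron_epochs(X, y), "epochs")
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print("standardized features:", perceptron_epochs(X_norm, y), "epochs")
```

The usual explanation is the classic mistake bound of (R/γ)²: rescaling the features can shrink the data radius R relative to the margin γ, so the same update rule needs far fewer corrections.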

What's inside a Haar cascade classifier in OpenCV computer vision?

人盡茶涼 posted on 2019-12-21 02:47:12
Question: I need to translate an .xml OpenCV Haar cascade to a txt file. (OpenCV has a Haar feature-based cascade classifier for object detection.) So I need to understand the XML. I'm wondering what the "stages" and the "trees" are. Does a tree stand for a weak classifier? Are the trees in the same stage combined into a strong classifier? Are the stages cascaded? In a tree from haarcascade_frontalface_alt.xml, it says: <!-- tree 0 --> <_> <!-- root node --> <feature> <rects> <_>3 7 14 4 -1.</_> <
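A quick way to see the stage/tree layout is to walk the XML directly. The sketch below assumes the legacy Haar cascade format (each <_> child of <stages> is a stage holding a <trees> list of weak classifiers plus a <stage_threshold>); element names may differ in newer cascade files:

```python
import xml.etree.ElementTree as ET

# Assumes the legacy OpenCV Haar cascade layout: <stages> holds one <_> element
# per stage, each stage holds a <trees> element whose <_> children are the weak
# classifiers (decision trees/stumps), plus a <stage_threshold>.
tree = ET.parse("haarcascade_frontalface_alt.xml")
root = tree.getroot()

for cascade in root:                      # e.g. <haarcascade_frontalface_alt ...>
    stages = cascade.find("stages")
    if stages is None:
        continue
    for i, stage in enumerate(stages.findall("_")):
        trees = stage.find("trees")
        thr = stage.findtext("stage_threshold")
        n_trees = len(trees.findall("_")) if trees is not None else 0
        print(f"stage {i}: {n_trees} weak classifiers, stage_threshold={thr}")
```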

Domain name classification API

拈花ヽ惹草 posted on 2019-12-21 00:26:07
Question: I need to categorize domains into different categories that describe the best use of a domain name, like categorizing 'gamez.com' as a gaming portal. Is there any service that offers classification of domain names the way Sedo does? Answer 1: All the systems that I am aware of manage a list, somewhat by hand. Taking web-filtering proxies (e.g. WebSense) as inspiration, you could scan for keywords contained in the domain name, or in web content/meta tags at the specified location. However, there are
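Answer 1 points toward keyword scanning rather than a ready-made API. A minimal sketch of that idea (the category keyword lists below are invented; a real system would curate much larger lists or buy a vendor database):

```python
# Hypothetical category keyword lists, purely for illustration.
CATEGORIES = {
    "gaming":   ["game", "gamez", "play", "arcade"],
    "finance":  ["bank", "loan", "invest", "forex"],
    "shopping": ["shop", "store", "buy", "deal"],
}

def classify_domain(domain: str) -> str:
    name = domain.lower().split(".")[0]   # drop the TLD, keep the label
    for category, keywords in CATEGORIES.items():
        if any(kw in name for kw in keywords):
            return category
    return "unknown"

print(classify_domain("gamez.com"))       # -> gaming
```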

Classification report with Nested Cross Validation in SKlearn (Average/Individual values)

孤人 posted on 2019-12-20 20:41:58
Question: Is it possible to get a classification report from cross_val_score through some workaround? I'm using nested cross-validation and I can get various scores for a model here; however, I would like to see the classification report of the outer loop. Any recommendations? # Choose cross-validation techniques for the inner and outer loops, # independently of the dataset. # E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc. inner_cv = KFold(n_splits=4, shuffle=True, random_state=i) outer_cv =
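One workaround (not from the original post) is to keep the nested setup by wrapping the inner-CV tuning in GridSearchCV and letting cross_val_predict drive the outer loop, so every sample gets exactly one held-out prediction to feed into classification_report. A sketch with a stand-in dataset and parameter grid:

```python
from sklearn.datasets import load_iris          # stand-in dataset
from sklearn.model_selection import KFold, GridSearchCV, cross_val_predict
from sklearn.metrics import classification_report
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# Inner loop: hyperparameter tuning; outer loop: held-out predictions.
clf = GridSearchCV(SVC(), {"C": [1, 10], "gamma": [0.01, 0.1]}, cv=inner_cv)
y_pred = cross_val_predict(clf, X, y, cv=outer_cv)
print(classification_report(y, y_pred))
```

Note that pooling the outer-fold predictions like this gives slightly different numbers than averaging a per-fold report, so it is worth stating which convention is used.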

Predicting how long a scikit-learn classification will take to run

扶醉桌前 posted on 2019-12-20 17:39:52
Question: Is there a way to predict how long it will take to run a classifier from scikit-learn based on the parameters and dataset? I know, pretty meta, right? Some classifier/parameter combinations are quite fast, and some take so long that I eventually just kill the process. I'd like a way to estimate in advance how long it will take. Alternatively, I'd accept some pointers on how to set common parameters to reduce the run time. Answer 1: There are very specific classes of classifiers or regressors that
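scikit-learn has no built-in runtime predictor, but one practical approach is to time the estimator on a few growing subsets and extrapolate before committing to the full dataset. A sketch (the dataset, estimator, and subset sizes are placeholders):

```python
import time
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)
clf = SVC()                                   # substitute the actual estimator/params

sizes, times = [], []
for n in (500, 1000, 2000, 4000):
    t0 = time.perf_counter()
    clf.fit(X[:n], y[:n])
    times.append(time.perf_counter() - t0)
    sizes.append(n)

# Model fit time as roughly a * n^b on a log-log scale, then extrapolate.
b, log_a = np.polyfit(np.log(sizes), np.log(times), 1)
estimate = np.exp(log_a) * len(X) ** b
print(f"estimated full fit: ~{estimate:.1f} s (measured points: {times})")
```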

What is the difference between sample weight and class weight options in scikit learn?

痞子三分冷 posted on 2019-12-20 11:57:07
Question: I have a class imbalance problem and want to solve it using cost-sensitive learning: 1) under-sample and over-sample, or 2) give weights to classes to use a modified loss function. Question: scikit-learn has two options called class weights and sample weights. Is sample weight actually doing option 2) and class weight option 1)? Is option 2) the recommended way of handling class imbalance? Answer 1: They are similar concepts, but with sample_weights you can force the estimator to pay more attention to some samples,
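A small sketch of how the two options look in practice (the data and weight values are invented): class_weight biases the loss per class when the estimator is constructed, while sample_weight is passed to fit() and weights individual rows, which also lets it express things class_weight cannot (recency, label confidence, per-row cost):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Imbalanced toy data: ~90% class 0, ~10% class 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Option 1: per-class weights (or class_weight="balanced" to infer them).
clf_cw = LogisticRegression(class_weight={0: 1.0, 1: 9.0}).fit(X, y)

# Option 2: per-sample weights; here they simply reproduce the class weighting,
# but any per-row values could be used instead.
sw = np.where(y == 1, 9.0, 1.0)
clf_sw = LogisticRegression().fit(X, y, sample_weight=sw)
```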

Custom metric (hmeasure) for summaryFunction caret classification

不羁的心 posted on 2019-12-20 10:57:21
Question: I am trying to use the hmeasure metric (Hand, 2009) as my custom metric for training SVMs in caret. As I am relatively new to using R, I tried to adapt the twoClassSummary function. All I need is to pass the true class labels and the predicted class probabilities from the model (an SVM) to the HMeasure function from the hmeasure package, instead of using ROC or other measures of classification performance in caret. For example, a call to the HMeasure function in R - HMeasure(true.class,predictedProbs[,2])

Scalable or online out-of-core multi-label classifiers

醉酒当歌 posted on 2019-12-20 10:49:23
Question: I have been blowing my brains out over the past 2-3 weeks on this problem. I have a multi-label (not multi-class) problem where each sample can belong to several of the labels. I have around 4.5 million text documents as training data and around 1 million as test data. There are around 35K labels. I am using scikit-learn. For feature extraction I was previously using TfidfVectorizer, which didn't scale at all; now I am using HashingVectorizer, which is better but not that scalable given the number
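One combination often suggested for this setting, sketched below with toy documents: HashingVectorizer keeps feature extraction memory-constant, and a linear SGD model wrapped in one-vs-rest handles the multi-label output. With ~35K labels the one-vs-rest layer itself becomes the bottleneck, so treat this as a baseline, not a tested solution at that scale:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Tiny stand-in corpus; each document can carry several labels.
docs = ["cheap flights to paris", "learn python fast", "python web scraping"]
labels = [["travel"], ["programming", "education"], ["programming", "web"]]

vec = HashingVectorizer(n_features=2**20, alternate_sign=False)
X = vec.transform(docs)                 # no vocabulary kept in memory

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)           # binary indicator matrix, one column per label

clf = OneVsRestClassifier(SGDClassifier(), n_jobs=-1)
clf.fit(X, Y)

# On such tiny toy data the prediction may well be an empty tuple.
print(mlb.inverse_transform(clf.predict(vec.transform(["python tutorial"]))))
```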

scikit learn output metrics.classification_report into CSV/tab-delimited format

浪尽此生 posted on 2019-12-20 10:25:25
Question: I'm doing multiclass text classification in scikit-learn. The dataset is trained using a Multinomial Naive Bayes classifier with hundreds of labels. Here's an extract from the scikit-learn script for fitting the MNB model: from __future__ import print_function # Read **`file.csv`** into a pandas DataFrame import pandas as pd path = 'data/file.csv' merged = pd.read_csv(path, error_bad_lines=False, low_memory=False) # define X and y using the original DataFrame X = merged.text y =
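A common workaround, assuming scikit-learn >= 0.20 (where classification_report gained output_dict): convert the report to a pandas DataFrame and write it out as CSV or tab-delimited text. The labels below are stand-ins for the asker's test labels and MNB predictions:

```python
import pandas as pd
from sklearn.metrics import classification_report

y_test = ["spam", "ham", "ham", "spam"]       # stand-in for the real test labels
y_pred = ["spam", "ham", "spam", "spam"]      # stand-in for the fitted model's output

report = classification_report(y_test, y_pred, output_dict=True)
df = pd.DataFrame(report).transpose()         # one row per class plus averages

df.to_csv("classification_report.csv")
df.to_csv("classification_report.tsv", sep="\t")   # tab-delimited variant
```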

Java machine learning library for commercial use? [closed]

房东的猫 posted on 2019-12-20 09:56:47
Question: Closed as off-topic 3 years ago; it is not currently accepting answers. Does anyone know a good Java machine learning library I can use for a commercial product? Weka and RapidMiner unfortunately do not allow this. I already found Apache Mahout and the Java Data Mining Package. Does anyone have experience with them and can provide some decision support? The task calls for clustering and