data-mining

Speed-efficient classification in Matlab

﹥>﹥吖頭↗ · Submitted on 2019-11-26 08:53:50
Question: I have an RGB image of size uint8(576,720,3) in which I want to classify each pixel to a set of colors. I transformed it from RGB to LAB space using rgb2lab and then removed the L layer, so it is now a double(576,720,2) consisting of AB. Now I want to classify it against some colors that I trained on another image, whose AB representations I calculated as: Cluster 1: -17.7903 -13.1170; Cluster 2: -30.1957 40.3520; Cluster 3: -4.4608 47.2543; Cluster 4: 46.3738 36.5225; Cluster 5
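The excerpt describes nearest-centroid classification in AB space. A minimal sketch of that idea in Python/NumPy, using the four cluster centres quoted above (the image shape and the Euclidean-distance rule are assumptions based on the question, not the asker's actual code):

```python
import numpy as np

# Cluster centres in AB space, taken from the question excerpt
centers = np.array([
    [-17.7903, -13.1170],  # Cluster 1
    [-30.1957,  40.3520],  # Cluster 2
    [ -4.4608,  47.2543],  # Cluster 3
    [ 46.3738,  36.5225],  # Cluster 4
])

def classify_pixels(ab):
    """Assign each pixel of an (H, W, 2) AB image to its nearest centre
    by Euclidean distance; returns an (H, W) label image."""
    h, w, _ = ab.shape
    flat = ab.reshape(-1, 2)                                      # (H*W, 2)
    d2 = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(h, w)

# Tiny synthetic example instead of the 576x720 image
ab = np.zeros((4, 4, 2))
ab[0, 0] = [-30.0, 40.0]       # very close to Cluster 2 (index 1)
labels = classify_pixels(ab)
```

For a full 576x720 image this vectorised distance computation avoids any per-pixel loop, which is where the speed gain over naive Matlab-style loops comes from.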

Why does one hot encoding improve machine learning performance?

耗尽温柔 · Submitted on 2019-11-26 08:39:47
Question: I have noticed that when one-hot encoding is used on a particular data set (a matrix) as training data for learning algorithms, it gives significantly better prediction accuracy than using the original matrix itself as training data. How does this performance increase happen? Answer 1: Many learning algorithms either learn a single weight per feature, or they use distances between samples. The former is the case for linear models such as logistic regression
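The answer's point is that a linear model assigns one weight per column, so an integer-coded category forces a single slope across all category values, while one-hot encoding gives each category its own weight. A minimal sketch of the encoding itself (plain NumPy, with a hypothetical three-level feature):

```python
import numpy as np

# Hypothetical categorical feature with levels 0, 1, 2
X = np.array([0, 1, 2, 1])

# One-hot encode: each category becomes its own indicator column,
# so a linear model can learn an independent weight per category
# instead of a single slope over the arbitrary codes 0, 1, 2.
X_hot = np.eye(3)[X]
```

In scikit-learn the same transformation is provided by `sklearn.preprocessing.OneHotEncoder`, which also handles unseen categories and sparse output.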

Matlab - PCA analysis and reconstruction of multi dimensional data

最后都变了- · Submitted on 2019-11-26 07:24:40
Question: I have a large dataset of multidimensional data (132 dimensions). I am a beginner at data mining and I want to apply Principal Component Analysis using Matlab. However, I have seen many functions explained on the web and I do not understand how they should be applied. Basically, I want to apply PCA to obtain the eigenvectors and their corresponding eigenvalues from my data. After this step I want to be able to do a reconstruction for my data based on a
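The steps the asker wants (centre the data, get eigenvectors/eigenvalues of the covariance matrix, project, reconstruct) can be sketched as follows. This is a generic PCA outline in Python/NumPy under assumed random data standing in for the 132-dimensional set, not the asker's Matlab code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # stand-in for the real 132-dim data

# 1. Centre the data
mu = X.mean(axis=0)
Xc = X - mu

# 2. Eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
order = np.argsort(eigvals)[::-1]       # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Project onto the top-k components, then reconstruct
k = 3
scores = Xc @ eigvecs[:, :k]
X_rec = scores @ eigvecs[:, :k].T + mu  # lossy reconstruction from k components
```

Keeping all components makes the reconstruction exact; truncating to k components gives the best rank-k approximation in the least-squares sense. Matlab's `pca` function returns the same pieces (coefficients, scores, latent variances) directly.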

scikit-learn DBSCAN memory usage

瘦欲@ · Submitted on 2019-11-26 06:29:37
Question: UPDATED: In the end, the solution I opted to use for clustering my large dataset was the one suggested by Anony-Mousse below: using ELKI's DBSCAN implementation rather than scikit-learn's. It can be run from the command line and, with proper indexing, performs this task within a few hours. Use the GUI and small sample datasets to work out the options you want, then go to town. Worth looking into. Anyhow, read on for a description of my original problem and
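For context, the scikit-learn usage the asker started from looks roughly like this; this is a generic sketch on tiny synthetic data (the eps/min_samples values are illustrative, not the asker's). The memory problem arises because scikit-learn's DBSCAN can materialise large neighborhood structures on big datasets, which is why the update recommends ELKI's indexed implementation instead:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two well-separated dense blobs plus one far-away outlier
X = np.vstack([
    np.random.default_rng(1).normal(0, 0.1, size=(20, 2)),
    np.random.default_rng(2).normal(5, 0.1, size=(20, 2)),
    [[10.0, 10.0]],
])

# eps: neighborhood radius; min_samples: points needed to form a core point
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
# Noise points are labelled -1; each dense region gets its own cluster id.
```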

1D Number Array Clustering [duplicate]

你。 · Submitted on 2019-11-26 04:38:00
Question: This question already has an answer here: Closed 7 years ago. Possible duplicate: Cluster one-dimensional data optimally? So let's say I have an array like this: [1,1,2,3,10,11,13,67,71] Is there a convenient way to partition it into something like this? [[1,1,2,3],[10,11,13],[67,71]] I looked through similar questions, but most people suggested using k-means to cluster points (e.g. with scipy), which is quite confusing for a beginner like me. Also I think that k-means is more
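For 1-D data a much simpler alternative to k-means is to sort the values and split wherever the gap between neighbours exceeds a threshold. A minimal sketch (the threshold of 4 is a hypothetical choice tuned to the example array, not a general rule):

```python
def cluster_1d(values, gap=4):
    """Sort a 1-D sequence and split it into groups wherever two
    consecutive values differ by more than `gap`."""
    values = sorted(values)
    groups = [[values[0]]]
    for v in values[1:]:
        if v - groups[-1][-1] > gap:
            groups.append([v])   # gap too large: start a new cluster
        else:
            groups[-1].append(v)
    return groups

clusters = cluster_1d([1, 1, 2, 3, 10, 11, 13, 67, 71])
# → [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

This runs in O(n log n) and needs no choice of k, though the gap threshold plays an analogous role; for an optimal partition the duplicate target ("Cluster one-dimensional data optimally?") discusses dynamic-programming approaches such as Ckmeans.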