data-mining

Speed-efficient classification in Matlab

﹥>﹥吖頭↗ · Submitted on 2019-11-26 08:53:50
Question: I have an RGB image of size uint8(576,720,3) in which I want to classify each pixel to a set of colors. I transformed it from RGB to LAB space using rgb2lab and then removed the L layer, so it is now a double(576,720,2) consisting of AB. Now I want to classify it against some colors that I trained on another image, whose AB representations I calculated as: Cluster 1: -17.7903 -13.1170; Cluster 2: -30.1957 40.3520; Cluster 3: -4.4608 47.2543; Cluster 4: 46.3738 36.5225; Cluster 5
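The excerpt describes nearest-centroid classification in AB space. A minimal sketch of that idea in Python/NumPy, using the four cluster centres quoted above (the image shape and the Euclidean-distance rule are assumptions based on the question, not the asker's actual code):

```python
import numpy as np

# Cluster centres in AB space, taken from the question excerpt
centers = np.array([
    [-17.7903, -13.1170],  # Cluster 1
    [-30.1957,  40.3520],  # Cluster 2
    [ -4.4608,  47.2543],  # Cluster 3
    [ 46.3738,  36.5225],  # Cluster 4
])

def classify_pixels(ab):
    """Assign each pixel of an (H, W, 2) AB image to its nearest centre
    by Euclidean distance; returns an (H, W) label image."""
    h, w, _ = ab.shape
    flat = ab.reshape(-1, 2)                                      # (H*W, 2)
    d2 = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(h, w)

# Tiny synthetic example instead of the 576x720 image
ab = np.zeros((4, 4, 2))
ab[0, 0] = [-30.0, 40.0]       # very close to Cluster 2 (index 1)
labels = classify_pixels(ab)
```

For a full 576x720 image this vectorised distance computation avoids any per-pixel loop, which is where the speed gain over naive Matlab-style loops comes from.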

Why does one hot encoding improve machine learning performance?

耗尽温柔 · Submitted on 2019-11-26 08:39:47
Question: I have noticed that when one-hot encoding is used on a particular data set (a matrix) as training data for learning algorithms, it gives significantly better prediction accuracy than using the original matrix itself as training data. How does this performance increase happen? Answer 1: Many learning algorithms either learn a single weight per feature, or they use distances between samples. The former is the case for linear models such as logistic regression
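The answer's point is that a linear model assigns one weight per column, so an integer-coded category forces a single slope across all category values, while one-hot encoding gives each category its own weight. A minimal sketch of the encoding itself (plain NumPy, with a hypothetical three-level feature):

```python
import numpy as np

# Hypothetical categorical feature with levels 0, 1, 2
X = np.array([0, 1, 2, 1])

# One-hot encode: each category becomes its own indicator column,
# so a linear model can learn an independent weight per category
# instead of a single slope over the arbitrary codes 0, 1, 2.
X_hot = np.eye(3)[X]
```

In scikit-learn the same transformation is provided by `sklearn.preprocessing.OneHotEncoder`, which also handles unseen categories and sparse output.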

Matlab - PCA analysis and reconstruction of multi dimensional data

最后都变了- · Submitted on 2019-11-26 07:24:40
Question: I have a large dataset of multidimensional data (132 dimensions). I am a beginner at data mining and I want to apply Principal Component Analysis using Matlab. However, I have seen many functions explained on the web and I do not understand how they should be applied. Basically, I want to apply PCA to obtain the eigenvectors and their corresponding eigenvalues from my data. After this step I want to be able to do a reconstruction for my data based on a
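The steps the asker wants (centre the data, get eigenvectors/eigenvalues of the covariance matrix, project, reconstruct) can be sketched as follows. This is a generic PCA outline in Python/NumPy under assumed random data standing in for the 132-dimensional set, not the asker's Matlab code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # stand-in for the real 132-dim data

# 1. Centre the data
mu = X.mean(axis=0)
Xc = X - mu

# 2. Eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
order = np.argsort(eigvals)[::-1]       # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3. Project onto the top-k components, then reconstruct
k = 3
scores = Xc @ eigvecs[:, :k]
X_rec = scores @ eigvecs[:, :k].T + mu  # lossy reconstruction from k components
```

Keeping all components makes the reconstruction exact; truncating to k components gives the best rank-k approximation in the least-squares sense. Matlab's `pca` function returns the same pieces (coefficients, scores, latent variances) directly.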

scikit-learn DBSCAN memory usage

瘦欲@ · Submitted on 2019-11-26 06:29:37
Question: UPDATED: In the end, the solution I opted to use for clustering my large dataset was the one suggested by Anony-Mousse below: using ELKI's DBSCAN implementation rather than scikit-learn's. It can be run from the command line and, with proper indexing, performs this task within a few hours. Use the GUI and small sample datasets to work out the options you want, then go to town. Worth looking into. Anyhow, read on for a description of my original problem and
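For context, the scikit-learn usage the asker started from looks roughly like this; this is a generic sketch on tiny synthetic data (the eps/min_samples values are illustrative, not the asker's). The memory problem arises because scikit-learn's DBSCAN can materialise large neighborhood structures on big datasets, which is why the update recommends ELKI's indexed implementation instead:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two well-separated dense blobs plus one far-away outlier
X = np.vstack([
    np.random.default_rng(1).normal(0, 0.1, size=(20, 2)),
    np.random.default_rng(2).normal(5, 0.1, size=(20, 2)),
    [[10.0, 10.0]],
])

# eps: neighborhood radius; min_samples: points needed to form a core point
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
# Noise points are labelled -1; each dense region gets its own cluster id.
```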

1D Number Array Clustering [duplicate]

你。 · Submitted on 2019-11-26 04:38:00
Question: This question already has an answer here: Closed 7 years ago. Possible duplicate: Cluster one-dimensional data optimally? So let's say I have an array like this: [1,1,2,3,10,11,13,67,71] Is there a convenient way to partition it into something like this? [[1,1,2,3],[10,11,13],[67,71]] I looked through similar questions, but most people suggested using k-means to cluster points (e.g. with scipy), which is quite confusing for a beginner like me. Also I think that k-means is more
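For 1-D data a much simpler alternative to k-means is to sort the values and split wherever the gap between neighbours exceeds a threshold. A minimal sketch (the threshold of 4 is a hypothetical choice tuned to the example array, not a general rule):

```python
def cluster_1d(values, gap=4):
    """Sort a 1-D sequence and split it into groups wherever two
    consecutive values differ by more than `gap`."""
    values = sorted(values)
    groups = [[values[0]]]
    for v in values[1:]:
        if v - groups[-1][-1] > gap:
            groups.append([v])   # gap too large: start a new cluster
        else:
            groups[-1].append(v)
    return groups

clusters = cluster_1d([1, 1, 2, 3, 10, 11, 13, 67, 71])
# → [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

This runs in O(n log n) and needs no choice of k, though the gap threshold plays an analogous role; for an optimal partition the duplicate target ("Cluster one-dimensional data optimally?") discusses dynamic-programming approaches such as Ckmeans.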