dimensionality-reduction

PCA Dimension reduction for classification

时光毁灭记忆、已成空白 submitted on 2020-01-04 00:42:09
Question: I am using Principal Component Analysis on features extracted from different layers of a CNN. I have downloaded the dimension reduction toolbox from here. I have a total of 11232 training images, and the feature vector for each image has 6532 dimensions, so the feature matrix is 11232 x 6532. If I want the top 90% of the features I can easily do that, and the training accuracy of an SVM on the reduced data is 81.73%, which is fair. However, when I try the testing data, which has 2408 images with 6532 features per image
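
One common pitfall in this setup is fitting PCA separately on the training and test features. The sketch below (using scikit-learn rather than the MATLAB toolbox mentioned above, with small random placeholder arrays standing in for the 11232 x 6532 and 2408 x 6532 matrices) fits the projection on the training set only and reuses the same components for the test set:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    # Placeholder data standing in for the real CNN feature matrices
    rng = np.random.RandomState(0)
    X_train, y_train = rng.rand(200, 500), rng.randint(0, 2, 200)
    X_test,  y_test  = rng.rand(50, 500),  rng.randint(0, 2, 50)

    pca = PCA(n_components=0.90)               # keep enough components for 90% of the variance
    X_train_red = pca.fit_transform(X_train)   # fit the projection on the training set only
    X_test_red  = pca.transform(X_test)        # project the test set with the SAME components

    clf = SVC(kernel="linear").fit(X_train_red, y_train)
    print("test accuracy:", clf.score(X_test_red, y_test))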

Dimensionality Reduction using Self Organizing Maps

狂风中的少年 submitted on 2019-12-23 21:08:19
Question: I have been working on Self-Organizing Maps (SOM) for the past few months, but I still have some confusion in understanding the dimensionality reduction part. Can you suggest any simple method to understand how SOMs really work on a real-world data set (like a data set from the UCI repository)? Answer 1: OK, so first of all refer to some previous related questions, which will give you a better understanding of the dimensionality reduction and visualization properties of the SOM. Plotting the Kohonen
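
As a concrete illustration (not part of the original answer), the sketch below uses the third-party minisom package on the Iris data set from the UCI repository: each 4-dimensional sample is reduced to the 2-D grid coordinates of its best matching unit, which is the dimensionality reduction a SOM performs.

    from minisom import MiniSom
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import MinMaxScaler

    X = MinMaxScaler().fit_transform(load_iris().data)   # 150 samples, 4 features

    som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(X, 1000)                             # unsupervised training

    # Each 4-D sample maps to the 2-D grid coordinates of its best matching unit (BMU)
    coords = [som.winner(x) for x in X]
    print(coords[:5])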

After reducing the dimensionality of a dataset, I am getting negative feature values

南楼画角 submitted on 2019-12-19 03:59:22
Question: I used a dimensionality reduction method (discussion here: Random projection algorithm pseudo code) on a large dataset. After reducing the dimension from 1000 to 50, I get my new dataset, where each sample looks like: [ 1751. -360. -2069. ..., 2694. -3295. -1764.] Now I am a bit confused, because I don't know what the negative feature values are supposed to mean. Is it okay to have negative features like this? Before the reduction, each sample was like this: 3, 18, 18, 18, 126 ... Is it normal
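
For context, negative values are expected with a Gaussian random projection: the entries of the projection matrix are drawn from a zero-mean distribution, so projected features can be negative even when the input is strictly non-negative. A minimal sketch with scikit-learn, standing in for the pseudo-code linked above:

    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection

    rng = np.random.RandomState(0)
    X = rng.randint(0, 200, size=(100, 1000)).astype(float)   # non-negative counts

    rp = GaussianRandomProjection(n_components=50, random_state=0)
    X_red = rp.fit_transform(X)

    # The projection matrix mixes positive and negative weights,
    # so the projected features are naturally signed even though X was not.
    print(X_red.min(), X_red.max())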

How to obtain the eigenvalues after performing Multidimensional scaling?

不羁岁月 submitted on 2019-12-11 02:13:47
Question: I am interested in taking a look at the eigenvalues after performing multidimensional scaling. What function can do that? I looked at the documentation, but it does not mention eigenvalues at all. Here is a code sample:

    mds = manifold.MDS(n_components=100, max_iter=3000, eps=1e-9, random_state=seed,
                       dissimilarity="precomputed", n_jobs=1)
    results = mds.fit(wordDissimilarityMatrix)
    # need a way to get the eigenvalues

Answer 1: I also couldn't find it from reading the documentation. I suspect they
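
One workaround (an assumption on my part, not an attribute exposed by sklearn's MDS object) is to compute the classical-MDS (Torgerson) eigenvalues yourself from the double-centered squared distance matrix:

    import numpy as np

    def classical_mds_eigenvalues(D):
        """Eigenvalues of B = -1/2 * J D^2 J used in classical (Torgerson) MDS;
        D is a symmetric distance matrix."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
        B = -0.5 * J @ (D ** 2) @ J
        return np.sort(np.linalg.eigvalsh(B))[::-1]  # descending order

    # Example with a tiny symmetric distance matrix
    D = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.5],
                  [2.0, 1.5, 0.0]])
    print(classical_mds_eigenvalues(D))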

PCA with sklearn. Unable to figure out feature selection with PCA

两盒软妹~` submitted on 2019-12-08 10:08:31
Question: I have been trying to do some dimensionality reduction using PCA. I currently have an image of size (100, 100), and I am using a filter bank of 140 Gabor filters, where each filter gives me a response that is again a (100, 100) image. Now, I wanted to do feature selection where I only select non-redundant features, and I read that PCA might be a good way to do this. So I proceeded to create a data matrix with 10000 rows and 140 columns, where each row contains the responses of the 140 Gabor filters at one pixel location. Now, as I understand it, I can do a decomposition of this
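
As a sketch of that decomposition (with a random placeholder matrix standing in for the real 10000 x 140 Gabor responses), PCA on the 140 columns shows how many uncorrelated components actually carry the variance, which is one way to drop redundant filter responses:

    import numpy as np
    from sklearn.decomposition import PCA

    # Placeholder for the 10000 x 140 response matrix (one row per pixel, one column per filter)
    rng = np.random.RandomState(0)
    responses = rng.rand(10000, 140)

    pca = PCA()                                  # full decomposition over the 140 filter responses
    scores = pca.fit_transform(responses)

    # explained_variance_ratio_ shows how much variance each component carries;
    # redundant (correlated) filters collapse into a few leading components.
    cumvar = np.cumsum(pca.explained_variance_ratio_)
    n_keep = int(np.searchsorted(cumvar, 0.95) + 1)
    print("components for 95% variance:", n_keep)
    reduced = scores[:, :n_keep]                 # 10000 x n_keep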

Visualizing distance matrix using tSNE - Python

混江龙づ霸主 submitted on 2019-12-07 23:50:58
Question: I've computed a distance matrix and I'm trying two approaches to visualize it. This is my distance matrix:

    delta = [[ 0.          0.71370845  0.80903791  0.82955157  0.56964983  0.          0.        ]
             [ 0.71370845  0.          0.99583115  1.          0.79563006  0.71370845  0.71370845]
             [ 0.80903791  0.99583115  0.          0.90029133  0.81180111  0.80903791  0.80903791]
             [ 0.82955157  1.          0.90029133  0.          0.97468433  0.82955157  0.82955157]
             [ 0.56964983  0.79563006  0.81180111  0.97468433  0.          0.56964983  0.56964983]
             [ 0.          0.71370845  0.80903791  0.82955157  0.56964983  0.          0.        ]
             [ 0.          0.71370845  0.80903791  0.82955157  0.56964983  0.          0.        ]]

Considering labels from 1 to 7, 1 is
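
For the precomputed-distance approach, a minimal t-SNE sketch would look like the following (a toy symmetric matrix is generated here; substitute the 7 x 7 delta above):

    import numpy as np
    from sklearn.manifold import TSNE
    from scipy.spatial.distance import pdist, squareform

    # Toy precomputed distance matrix (symmetric, zero diagonal)
    rng = np.random.RandomState(0)
    delta = squareform(pdist(rng.rand(7, 5)))

    tsne = TSNE(n_components=2, metric="precomputed", init="random",
                perplexity=3, random_state=0)
    emb = tsne.fit_transform(delta)            # one 2-D point per label 1..7
    print(emb)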

Reduce string length by removing contiguous duplicates

拟墨画扇 submitted on 2019-12-07 08:35:25
Question: I have an R dataframe with 2 fields:

    ID  WORD
    1   AAAAABBBBB
    2   ABCAAABBBDDD
    3   ...

I'd like to simplify words with repeating letters by keeping only one letter per run of duplicates, e.g. AAAAABBBBB should give me AB and ABCAAABBBDDD should give me ABCABD. Does anyone have an idea how to do this? Answer 1: Here's a solution with regex:

    x <- c('AAAAABBBBB', 'ABCAAABBBDDD')
    gsub("([A-Za-z])\\1+", "\\1", x)

EDIT: By request, some benchmarking. I added Matthew Lundberg's pattern from the comment, matching any character. It appears that gsub is faster by an order of magnitude, and matching any
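
The same backreference idea carries over directly to Python's re module; a small equivalent sketch (not part of the original R answer):

    import re

    words = ["AAAAABBBBB", "ABCAAABBBDDD"]
    # ([A-Za-z]) captures one letter, \1+ matches its immediate repeats,
    # and the replacement keeps only the captured letter.
    print([re.sub(r"([A-Za-z])\1+", r"\1", w) for w in words])   # ['AB', 'ABCABD']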
