Question
I am trying to understand how deep features are computed in an autoencoder. I created an autoencoder with h2o.deeplearning and then tried to calculate the deep features manually.
The autoencoder
fit = h2o.deeplearning(
  x = names(x_train),
  training_frame = x_train,
  activation = "Tanh",
  autoencoder = TRUE,
  hidden = c(25, 10),
  epochs = 100,
  export_weights_and_biases = TRUE
)
I used Tanh as the activation function and two hidden layers with no dropout, to keep things simple.
Calculating hidden layer 1 deep features manually
Then I extracted the weights and biases that go from the input layer to hidden layer 1:
w12 = as.matrix(h2o.weights(fit, 1))
b12 = as.matrix(h2o.biases(fit, 1))
I prepared the training data by normalizing it to the interval [-0.5, 0.5], because h2o does that automatically for autoencoders:
normalize = function(x) {(((x-min(x))/(max(x)-min(x))) - 0.5)}
d.norm = apply(d, 2, normalize)  # d: the raw training data (x_train as a plain R matrix)
Then I calculated the deep features of the first layer manually:
a12 = d.norm %*% t(w12)                                    # weighted inputs: n x 25
b12.rep = do.call(rbind, rep(list(t(b12)), nrow(d.norm)))  # bias row replicated for every observation
z12 = a12 + b12.rep                                        # pre-activations
f12 = tanh(z12)                                            # Tanh activations = candidate deep features
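Note: the do.call/rbind bias replication works, but sweep() adds the bias vector to every row directly; an equivalent sketch:
# same result as tanh(a12 + b12.rep), without building the replicated bias matrix
f12.alt = tanh(sweep(d.norm %*% t(w12), 2, as.numeric(b12), "+"))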
When I compared those values with the hidden layer 1 deep features, they didn't match:
hl1.output = as.matrix(h2o.deepfeatures(fit, x_train, layer = 1))
all.equal(
as.numeric(f12[,1]),
hl1.output[, 1],
check.attributes = FALSE,
use.names = FALSE,
tolerance = 1e-04
)
[1] "Mean relative difference: 0.4854887"
Calculating hidden layer 2 deep features manually
Then I tried to do the same thing to manually calculate the deep features of hidden layer 2 from the deep features of hidden layer 1 (with w23 and b23 extracted the same way, using layer index 2):
a23 = hl1.output %*% t(w23)
b23.rep = do.call(rbind, rep(list(t(b23)), nrow(a23)))
z23 = a23 + b23.rep
f23 = tanh(z23)
Comparing these values with the deep features of hidden layer 2, I saw that they match perfectly:
hl2.output = as.matrix(h2o.deepfeatures(fit, x_train, layer = 2))
all.equal(
as.numeric(f23[,1]),
hl2.output[, 1],
check.attributes = FALSE,
use.names = FALSE,
tolerance = 1e-04
)
[1] TRUE
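As an aside, the per-layer computation used above can be wrapped in a small helper; a minimal sketch, assuming the weights and biases come from h2o.weights()/h2o.biases() as in the code above:
# propagate an n x p matrix of activations (or normalized data) through one Tanh layer
forward_layer = function(input, w, b) {
  # w: weight matrix as returned by h2o.weights() (units x p); b: column of biases
  z = sweep(input %*% t(w), 2, as.numeric(b), "+")  # weighted input plus bias
  tanh(z)
}
# e.g. the layer-2 check could equivalently use:
# f23 = forward_layer(hl1.output, as.matrix(h2o.weights(fit, 2)), as.matrix(h2o.biases(fit, 2)))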
Calculating the output layer features manually
I tried the same thing for the output layer (with w34 and b34 extracted using layer index 3):
a34 = hl2.output %*% t(w34)
b34.rep = do.call(rbind, rep(list(t(b34)), nrow(a34)))
z34 = a34 + b34.rep
f34 = tanh(z34)
I compared the result with the output I had, and I could not get the same result:
all.equal(
as.numeric(f34[1,]),
output[1,],
check.attributes = FALSE,
use.names = FALSE,
tolerance = 1e-04
)
[1] "Mean relative difference: 3.019762"
The questions
I think I am not normalizing the data in the correct way, because I can recreate the deep features of hidden layer 2 from the features of hidden layer 1. I do not understand what is wrong, since with autoencoder = TRUE h2o should normalize the data to [-0.5, 0.5].
I also don't understand why the manual calculation of the output layer does not work.
1) How do I manually calculate the deep features of hidden layer 1?
2) How do I manually calculate the output features?
Answer 1:
You're using:
normalize = function(x) {(((x-min(x))/(max(x)-min(x))) - 0.5)}
H2O is using this Java code:
normMul[idx] = (v.max() - v.min() > 0)?1.0/(v.max() - v.min()):1.0;
normSub[idx] = v.mean();
And then it is used like this:
numVals[i] = (numVals[i] - normSub[i])*normMul[i];
I.e. subtract the mean, then divide by the range (or, equivalently, multiply by 1 over the range). So, ignoring the check for divide-by-zero, I think your R code needs to be:
normalize = function(x) {(x-mean(x))/(max(x)-min(x))}
With the check for zero, something like:
normalize = function(x) {
  mul = max(x) - min(x)
  if (mul == 0) mul = 1
  return((x - mean(x)) / mul)
}
Just playing around with that, it seems to have a range of 1.0, but the values do not necessarily fall in -0.5 to +0.5, i.e. it is not the interval described in the H2O documentation (e.g. p.20 of the Deep Learning booklet). Did I miss something in the Java code?
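A quick numeric check of that on made-up data:
x = c(0, 1, 10)
n = normalize(x)    # -0.3667 -0.2667  0.6333
max(n) - min(n)     # 1: the range is 1.0
mean(n)             # 0: mean-centred, but min/max are not -0.5 / +0.5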
By the way, this line (in the H2O source) is where it decides to NORMALIZE for auto-encoders, rather than STANDARDIZE for other deep learning models.
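With that corrected normalize, re-running the layer-1 check from the question would look something like this (a sketch reusing d, w12, b12 and hl1.output from above; whether it then matches within the tolerance is the thing to verify):
d.norm2 = apply(d, 2, normalize)   # mean-centred, range-scaled training data
f12.new = tanh(sweep(d.norm2 %*% t(w12), 2, as.numeric(b12), "+"))
all.equal(as.numeric(f12.new[, 1]), hl1.output[, 1],
          check.attributes = FALSE, tolerance = 1e-04)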
Source: https://stackoverflow.com/questions/49711455/h2o-deeplearning-autoencoder-calculating-deep-features-manually