import numpy as np

def relu(z):
    return np.maximum(0, z)

def d_relu(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z

x = np.array([5, 1, -4, 0])
y = relu(x)
z = d_relu(y)
print("y = {}".format(y))
print("z = {}".format(z))
The code above prints out:
y = [1 1 0 0]
z = [1 1 0 0]
instead of
y = [5 1 0 0]
z = [1 1 0 0]
From what I understand, the function calls I've used should be passing by value, i.e. passing a copy of the variable.
Why is my d_relu function affecting the y variable?
Your first mistake is in assuming Python passes objects by value... it doesn't. It's pass by assignment (similar to pass by reference, if you're familiar with that concept). However, only mutable objects, as the name suggests, can be modified in place. This includes, among other things, NumPy arrays.
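To see the difference, here's a small sketch (the function and variable names are mine) contrasting an in-place write, which mutates the caller's array, with rebinding the parameter, which does not:

```python
import numpy as np

def mutate(a):
    a[a < 0] = 0          # in-place write: changes the caller's array
    return a

def rebind(a):
    a = np.maximum(0, a)  # rebinds the local name only; caller's array untouched
    return a

x1 = np.array([5, 1, -4, 0])
mutate(x1)
print(x1)  # [5 1 0 0]  -- the caller's array was modified

x2 = np.array([5, 1, -4, 0])
rebind(x2)
print(x2)  # [ 5  1 -4  0]  -- unchanged
```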
You shouldn't have d_relu modify z in place, which is exactly what the z[...] = ... syntax is doing right now. Instead, build a mask with a broadcasted comparison and return that:
def d_relu(z):
    return (z > 0).astype(int)
This returns a fresh array instead of modifying z in place, and your code prints

y = [5 1 0 0]
z = [1 1 0 0]
If you're building a layered architecture, you can cache the mask computed during the forward pass and reuse it in the backward pass:
class relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, x):
        return self.mask
Here the derivative is simply 1 where the input during the forward pass was > 0, and 0 elsewhere.
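A minimal usage sketch (the class body is repeated here so the example runs on its own; the layer and out names are mine):

```python
import numpy as np

# Repeating the class above so this example is self-contained.
class relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = x > 0      # remember which inputs were positive
        return x * self.mask   # multiplying by a boolean mask zeroes the rest

    def backward(self, x):
        return self.mask       # derivative: 1 where input was > 0, else 0

layer = relu()
out = layer.forward(np.array([5.0, 1.0, -4.0, 0.0]))
print(out)                     # [5. 1. 0. 0.]
grad = layer.backward(out)
print(grad)                    # [ True  True False False]
```

Note that forward and backward share state through self.mask, so each layer instance must only be used for one forward/backward pair at a time.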