ReLU derivative with NumPy

Anonymous (unverified), submitted 2019-12-03 01:34:02

Question:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def d_relu(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z

x = np.array([5, 1, -4, 0])
y = relu(x)
z = d_relu(y)
print("y = {}".format(y))
print("z = {}".format(z))

The code above prints out:

y = [1 1 0 0]
z = [1 1 0 0]

instead of

y = [5 1 0 0]
z = [1 1 0 0]

From what I understand, the function calls I've used should be pass-by-value, i.e. each function should receive a copy of the variable.

Why is my d_relu function affecting the y variable?

Answer 1:

Your first mistake is assuming that Python passes objects by value... it doesn't - it's pass by assignment (similar to pass by reference, if you're familiar with that concept). However, only mutable objects, as the name suggests, can be modified in place. That includes, among other things, NumPy arrays.
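To see the difference, here's a minimal sketch (not part of the original question) contrasting rebinding an immutable int with mutating a NumPy array inside a function:

import numpy as np

def rebind(n):
    n = n + 1          # rebinds the local name only; the caller's int is unchanged

def mutate(a):
    a[0] = 99          # modifies the array object itself, visible to the caller

n = 1
a = np.array([1, 2, 3])
rebind(n)
mutate(a)
print(n)  # 1
print(a)  # [99  2  3]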

You shouldn't have d_relu modify z in place, which is what it's doing right now through the z[...] = ... syntax. Instead, build a mask with a broadcasted comparison and return that:

def d_relu(z):
    return (z > 0).astype(int)

This returns a fresh array instead of modifying z in-place, and your code prints

y = [5 1 0 0]
z = [1 1 0 0]
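If you prefer to keep the original element-assignment style, one alternative (just a sketch, one option among several) is to work on a copy so the caller's array is left untouched:

def d_relu(z):
    out = z.copy()      # copy first, so the argument passed in is never modified
    out[out > 0] = 1
    out[out <= 0] = 0
    return out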

If you're building a layered architecture, you can reuse a mask computed during the forward pass:

class relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, x):
        return self.mask

Here the derivative is simply 1 wherever the input during the forward pass was > 0, and 0 elsewhere.
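For example (variable names and values assumed for illustration, not from the answer), one forward/backward step with this layer might look like:

import numpy as np

layer = relu()
x = np.array([5., 1., -4., 0.])

out = layer.forward(x)                       # negatives zeroed out; mask saved for the backward pass
upstream_grad = np.ones_like(x)              # gradient arriving from the next layer
grad_x = upstream_grad * layer.backward(x)   # [1. 1. 0. 0.] - gradient blocked where input was <= 0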


