Neural Network in Python: Decision/Classification always gives 0.5


Question


First of all, I want to say that I am a Python beginner and also completely new to neural networks. When I read about them I was very excited and decided to set up a little piece of code from scratch (see code below).

But somehow my code is not working properly. I guess there are some major bugs (in the algorithm and/or the programming?), but I cannot find them at the moment.

So, in the handwritten notes you can see my setup (and some formulas). I want to solve a decision problem where I have data in the form X = (x1, x2) and a label y (which is 0 or 1).

My network has one hidden layer consisting of 3 neurons and one output layer with a single neuron. As the activation function I use the sigmoid, and for the loss I use cross entropy (something like the log-likelihood of a Bernoulli distribution, I guess?).
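To make that explicit, the per-sample loss I am trying to minimise (this is what the loss function further down computes, up to the overall sign and the averaging done in update_model) is

L(y, \hat{y}) = -\left( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right),

where \hat{y} is the output of the network.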

Each neuron takes the weighted input W.X + bias and returns a scalar between 0 and 1.
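In formulas, with \sigma(x) = \frac{1}{1 + e^{-x}}, the forward pass I have in mind (and which move_forward below is supposed to implement) is

t_1 = W_1 x + b_1, \qquad \phi = \sigma(t_1), \qquad t_2 = W_2 \phi + b_2, \qquad \hat{y} = \sigma(t_2).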

For the learning process I tried to use backpropagation: I computed the derivatives dLoss/dparams by applying the chain rule several times. In order not to write everything in index notation, I tried to use numpy to handle the matrices, etc.
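For example, for the output-layer weights the chain I apply is (if I got it right)

\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial t_2} \cdot \frac{\partial t_2}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \, \sigma'(t_2) \, \phi^{\top},

and analogously for W_1, b_1 and b_2.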

Maybe someone can directly see what I did wrong? (Apart from the bad programming :D)

Handwritten notes 1/2 and 2/2 (images linked in the original question, showing the setup and the formulas).

#!/usr/bin/python
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt

## create random data set for decision problem
np.random.seed(0) #fixed seed to reproduce results
X, y = datasets.make_moons(20, noise=0.20) # lists containing the Data
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral) # plot it
plt.show() # show plot; proceeds when plot is closed

## initialize model parameters
W1 = np.random.uniform(-0.5,0.5,[3,2]) # hidden layer weights (3 x 2) matrix
b1 = np.random.uniform(-1,1,[3])   # bias for neurons in hidden layer
W2 = np.random.uniform(-0.5,0.5,[1,3]) # weights for output layer (1 x 3)
b2 = np.random.uniform(-1,1,[1]) # bias for output neuron

# collecting parameters in model dict
model = {"W1" : W1, "W2" : W2, "b1" : b1, "b2" : b2}

## the activation function
# can also return the derivative
def sigmoid(x,derivative = False):
    if derivative == True:
        # derivative; np.multiply multiplies element-wise
        # needed if x is tensor-like object
        return np.multiply(sigmoid(x), (1 - sigmoid(x)))
    else:
        return 1.0/(1.0 + np.exp(-x))

## moving forward in the network for a single data point
# and returns a dict with necessary information
def move_forward(model, DataX):
    W1 = model["W1"] # extract model params from dict to make it better readable
    W2 = model["W2"]
    b1 = model["b1"]
    b2 = model["b2"]
    t1 = np.dot(W1,DataX) + b1 # weighted input for hidden layer (here 3-dim object)
    phi = sigmoid(t1) # evaluate activation function
    phiP = sigmoid(t1, True) # derivative (needed for moving backward "learning")
    t2 = np.dot(W2,phi) + b2 # weighted input for output layer (1-dim object)
    sig = sigmoid(t2) # evaluate final output
    sigP = sigmoid(t2, True) # derivative
    forward = {"phi" : phi,"phiP" : phiP, # dict collecting the output
             "sig" : sig, "sigP" : sigP}
    return forward

## moving backward for a single data point
def move_backward(forward, model, DataX):
    W1 = model["W1"]
    W2 = model["W2"]
    b1 = model["b1"]
    b2 = model["b2"]    
    phi = forward["phi"]
    phiP = forward["phiP"]
    sig = forward["sig"]
    sigP = forward["sigP"]
    #not the full deltaWs / deltabs; multiplied by the rest in "update_model"
    dW2 = sigP * phi # part from "derivative chain" roughly: dsig/dt2 dt2 / dW2
    db2 = sigP # analogue
    temp = np.multiply(W2,phiP) # multiplied element wise
    dW1 = sigP * np.outer(temp, DataX) # outer product since: (W2 * phi)_j x_i
    db1 = sigP * np.outer(temp, [1]) # analogue
    backward = {"dW1": dW1, "dW2": dW2, "db1": db1, "db2": db2}
    return backward

## part of the loss function; here for one data point
# returns also the derivative for the learning process
def loss(DataY, PredictionY, derivative = False):
    if derivative == True:
        return DataY / PredictionY - (1.0 - DataY) / (1.0 - PredictionY)
    log_likelihood = DataY * np.log(PredictionY) + (1.0 - DataY) * np.log(1.0 - PredictionY) 
    return log_likelihood

## updating model parameters
## epsilon is a small parameter regulating the learning
def update_model(DataSet,model, epsilon):
    DataX = DataSet[0]
    DataY = DataSet[1]
    total_loss = 0
    dW1_total = 0
    dW2_total = 0
    db1_total = 0
    db2_total = 0
    beta = 0
    W1 = model["W1"]
    W2 = model["W2"]
    b1 = model["b1"]
    b2 = model["b2"]
    # iterating over full data set
    for i in range(len(DataX)):
        forward = move_forward(model, DataX[i])
        backward = move_backward(forward, model, DataX[i])
        sig = forward["sig"]        
        total_loss += loss(DataY[i],sig)
        beta += loss(DataY[i],sig, True)
        dW1_total += backward["dW1"]
        dW2_total += backward["dW2"]
        db1_total += backward["db1"]
        db2_total += backward["db2"]
    total_loss *= -1.0/len(DataX) # the total loss
    beta *= -1.0/len(DataX) # the derivative of dloss/dsig
    ## setting updated model params
    W1_new = W1 - epsilon * beta * dW1_total
    W2_new = W2 - epsilon * beta * dW2_total
    b1_new = b1 - epsilon * beta * np.squeeze(np.asarray(db1_total)) 
    b2_new = b2 - epsilon * beta *  db2_total
    model_updated = {"W1": W1_new, "W2": W2_new, "b1": b1_new,
                     "b2": b2_new, "loss": total_loss}
    return model_updated

## train the model with a given data set N times
def train_model(DataSet,model, epsilon, N, print_state = False):
    for i in range(N):        
        model = update_model(DataSet,model, epsilon)
        if print_state == True:
            if i % 100 == 0:
                print(model)
                print("loss = " , model["loss"])
    print(model)
    return model


## call the training function and store the output
model_new = train_model([X,y],model, 0.01, 1000, True)
## check result with data point in the training set
print(move_forward(model_new, X[0]))

# Note: Hm, somehow I always get sig = 0.5 (roughly). And the loss
# does not get smaller than 0.68
# I guess there must be several mistakes

Source: https://stackoverflow.com/questions/35315289/neural-network-in-python-decision-classification-always-gives-0-5
