Neural Network training with PyBrain won't converge

迷失自我 2020-12-08 04:36

I have the following code, from the PyBrain tutorial:

from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

net = buildNetwork(2, 3, 1, bias=True)

ds = SupervisedDataSet(2, 1)
ds.addSample((0, 0), (0,))
ds.addSample((0, 1), (1,))
ds.addSample((1, 0), (1,))
ds.addSample((1, 1), (0,))

trainer = BackpropTrainer(net, ds)
trainer.trainUntilConvergence()
4 Answers
  • 2020-12-08 05:01

    The following seems to consistently give the right results:

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.structure import TanhLayer
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer
    
    #net = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
    net = buildNetwork(2, 3, 1, bias=True)
    
    ds = SupervisedDataSet(2, 1)
    # Repeat the four XOR patterns six times so the default 25%
    # validation split still leaves every pattern in the training data.
    xor_patterns = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
    for _ in range(6):
        for inp, target in xor_patterns:
            ds.addSample(inp, target)
    
    trainer = BackpropTrainer(net, ds, learningrate=0.001, momentum=0.99)
    
    trainer.trainUntilConvergence(verbose=True)
    
    print(net.activate([0, 0]))
    print(net.activate([0, 1]))
    print(net.activate([1, 0]))
    print(net.activate([1, 1]))
    
  • 2020-12-08 05:05
    trainer = BackpropTrainer(net, ds, learningrate=0.9, momentum=0.0, weightdecay=0.0, verbose=True)
    trainer.trainEpochs(epochs=1000)
    

    This configuration converges. If the learning rate is too small (e.g. 0.01), training gets stuck in a local minimum. In my tests, learning rates between 0.3 and 30 converge.
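    The sensitivity to learning rate is easy to see even outside PyBrain. As a generic illustration (plain gradient descent on a one-dimensional quadratic, not the PyBrain trainer), a sketch of how step size controls convergence:

```python
# Plain gradient descent on f(w) = (w - 3)**2, whose gradient is 2*(w - 3).
# This is a generic illustration of step-size sensitivity, not PyBrain code.

def descend(lr, steps=100, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)  # one gradient step
    return w

print(descend(0.001))  # too small: barely moves toward 3 in 100 steps
print(descend(0.1))    # moderate: converges to 3
print(descend(1.1))    # too large: each step overshoots and diverges
```

    On this quadratic, any rate between 0 and 1 eventually converges; XOR's loss surface is much bumpier, which is why the usable range reported above (0.3 to 30, with momentum off) has to be found by experiment.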

  • 2020-12-08 05:13

    I took the excellent Machine Learning class on Coursera, taught by Andrew Ng, and one part of the class covered training a small neural net to recognize XOR. So I was a bit troubled when the PyBrain example, based on parts of the quickstart, did not converge.

    I think there are many reasons, including the one above about the minimal dataset being split into training and validation. At one point in the course Andrew said, "It's not the person with the best algorithm that wins, it's the one with the most data." He went on to explain that the explosion in data availability in the 2000s is part of the reason for the resurgence in AI, now called Machine Learning.

    So with all that in mind I found that

    1. The validation set can be just the 4 samples, because validation comes after the training phase.
    2. The network only needs 2 nodes in the hidden layer, as I learned in the class.
    3. The learning rate needs to be pretty small in this case, like 0.005, or else the training will sometimes skip over the answer (this is an important point from the class that I confirmed by playing with the numbers).
    4. The smaller the learning rate, the smaller maxEpochs can be. A small learning rate means that convergence takes smaller steps along the gradient toward the minimum. If it's bigger, you need a bigger maxEpochs so that training waits longer before deciding it has hit a minimum.
    5. You need bias=True in the network (which adds a constant-1 node to the input and hidden layers). Read the answers to this question about bias.
    6. Finally, and most important, you need a big training set. 1000 samples converged on the right answer about 75% of the time. I suspect this has to do with the minimization algorithm; smaller numbers failed frequently.

    So here's some code that works:

    from pybrain.datasets import SupervisedDataSet
    
    dataModel = [
        [(0,0), (0,)],
        [(0,1), (1,)],
        [(1,0), (1,)],
        [(1,1), (0,)],
    ]
    
    ds = SupervisedDataSet(2, 1)
    for inp, target in dataModel:
        ds.addSample(inp, target)
    
    # build a large training set by sampling the four patterns at random
    import random
    random.seed()
    trainingSet = SupervisedDataSet(2, 1)
    for ri in range(1000):
        inp, target = dataModel[random.getrandbits(2)]
        trainingSet.addSample(inp, target)
    
    from pybrain.tools.shortcuts import buildNetwork
    net = buildNetwork(2, 2, 1, bias=True)
    
    from pybrain.supervised.trainers import BackpropTrainer
    trainer = BackpropTrainer(net, ds, learningrate = 0.001, momentum = 0.99)
    trainer.trainUntilConvergence(verbose=True,
                                  trainingData=trainingSet,
                                  validationData=ds,
                                  maxEpochs=10)
    
    print('0,0->', net.activate([0, 0]))
    print('0,1->', net.activate([0, 1]))
    print('1,0->', net.activate([1, 0]))
    print('1,1->', net.activate([1, 1]))
    
  • 2020-12-08 05:20

    After some more digging I found that the example in the PyBrain tutorial is completely out of place for this problem.

    When we look at the method signature in the source code we find:

    def trainUntilConvergence(self, dataset=None, maxEpochs=None, verbose=None, continueEpochs=10, validationProportion=0.25):
    

    This means that 25% of the training set is used for validation. That is a perfectly valid approach when training a network on sampled data, but not when you have the complete range of possibilities at your disposal, namely the 4-row, 2-in-1-out XOR solution set. When you train on the XOR set and remove one of the rows for validation, the immediate consequence is a very sparse training set in which one of the possible combinations is omitted, so the weights responsible for that case are never trained.

    Normally, when you hold out 25% of the data for validation, you assume that the remaining data still covers 'most' of the solution space the network needs to encounter. In this case that is not true: the held-out 25% of the solution space is completely unknown to the network, precisely because it was removed for validation.

    So the trainer was training the network correctly, but omitting 25% of the XOR problem results in a badly trained network.
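    The effect is easy to demonstrate without PyBrain at all. A minimal sketch in plain Python (mimicking the default validationProportion=0.25 split, not calling the real library): with only four rows, holding out 25% necessarily removes one entire input pattern from training:

```python
import random

# The complete XOR solution set: four (input, target) rows.
xor_rows = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]

# Mimic trainUntilConvergence's default: hold out 25% for validation.
random.seed(0)
rows = xor_rows[:]
random.shuffle(rows)
validation, training = rows[:1], rows[1:]  # 25% of 4 rows = 1 row

# Whichever row was held out, one input pattern never reaches the trainer.
missing = {inp for inp, _ in xor_rows} - {inp for inp, _ in training}
print('held out:', validation)
print('pattern never trained on:', missing)
```

    Whatever the shuffle produces, missing always contains exactly one of the four patterns, so the weights responsible for that case are never updated.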

    A different quickstart example on the PyBrain website would be very handy, because this one is just plain wrong for the XOR case. You have to wonder whether the authors tried the example themselves, because it just outputs randomly behaving, badly trained networks.

    0 讨论(0)