You probably want to follow Lectures 3 and 4 at http://www.ml-class.org. Professor Ng solves this exact problem there: classifying the 10 digits (0...9). Some of the things he did in the class to reach 95% training accuracy are:
- Input neurons : 400 (20x20)
- Hidden Layers : 2
- Size of hidden layers : 25
- Activation function : sigmoid
- Training method : gradient descent
- Data size : 5000
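
To make the settings above concrete, here is a minimal pure-Python sketch of a sigmoid network with those layer sizes, trained by gradient descent. It is not the course's code. The 10 output units (one per digit), the cross-entropy loss, the per-example update, the weight initialisation range, and all function names are my assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def init_weights(sizes, seed=0):
    # One matrix per layer transition; each row is [bias, w_1, ..., w_n_in].
    # The uniform(-0.5, 0.5) range is an assumed choice, not from the course.
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(weights, x):
    # Returns the activations of every layer, input layer first.
    acts = [list(x)]
    for W in weights:
        prev = acts[-1]
        acts.append([sigmoid(row[0] + sum(w * a for w, a in zip(row[1:], prev)))
                     for row in W])
    return acts

def train_step(weights, x, y, lr):
    # One gradient descent update on a single example.
    # With sigmoid outputs and cross-entropy loss, the output error is (a - y).
    acts = forward(weights, x)
    delta = [a - t for a, t in zip(acts[-1], y)]
    for l in range(len(weights) - 1, -1, -1):
        W, prev = weights[l], acts[l]
        if l > 0:
            # Propagate the error to the layer below before W is modified.
            back = [sum(W[j][i + 1] * delta[j] for j in range(len(W)))
                    * prev[i] * (1.0 - prev[i])
                    for i in range(len(prev))]
        for j, row in enumerate(W):
            row[0] -= lr * delta[j]
            for i, a in enumerate(prev):
                row[i + 1] -= lr * delta[j] * a
        if l > 0:
            delta = back

def predict(weights, x):
    # Predicted class is the index of the largest output activation.
    out = forward(weights, x)[-1]
    return out.index(max(out))

# 400 inputs, two hidden layers of 25 units, 10 outputs (assumed: one per digit).
net = init_weights([400, 25, 25, 10])
blank_image = [0.0] * 400          # placeholder input, not real data
one_hot_three = [1.0 if k == 3 else 0.0 for k in range(10)]
train_step(net, blank_image, one_hot_three, lr=0.1)
print(predict(net, blank_image))
```

In practice you would loop `train_step` over all 5000 examples for many passes. A vectorised implementation would be much faster than these nested lists.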