I wrote a confusion matrix calculation code in Python:
def conf_mat(prob_arr, input_arr):
# confusion matrix
conf_arr = [[0, 0], [0, 0]]
In a general sense, you're going to need to change your probability array. Instead of having one number for each instance and classifying based on whether or not it is greater than 0.5, you're going to need a list of scores (one for each class), then take the largest of the scores as the class that was chosen (a.k.a. argmax).
You could use a dictionary to hold the probabilities for each classification:
prob_arr = [{classification_id: probability}, ...]
Choosing a classification would be something like:
for instance_scores in prob_arr :
predicted_classes = [cls for (cls, score) in instance_scores.iteritems() if score = max(instance_scores.values())]
This handles the case where two classes have the same scores. You can get one score, by choosing the first one in that list, but how you handle that depends on what you're classifying.
Once you have your list of predicted classes and a list of expected classes you can use code like Torsten Marek's to create the confusion array and calculate the accuracy.