When trying to compute cross-entropy with a sigmoid activation function, there is a difference between the manual formulation

loss1 = -tf.reduce_sum(p*tf.log(q), 1)

and the value returned by TensorFlow's built-in sigmoid cross-entropy (tf.nn.sigmoid_cross_entropy_with_logits).
You can understand the difference between softmax and sigmoid cross-entropy in the following way:
The basic cross-entropy term is:
p * -tf.log(q)
For softmax cross-entropy the loss looks exactly like the formula above, summed over a single probability distribution that covers all classes.
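
As a quick illustration (a minimal sketch assuming TensorFlow 2.x; the tensors p and logit_q below are made-up example values), the manual softmax formula can be checked against tf.nn.softmax_cross_entropy_with_logits:

```python
import tensorflow as tf

# made-up example values: p is the target distribution per row,
# logit_q holds the raw scores before the softmax
p = tf.constant([[0.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
logit_q = tf.constant([[0.2, 0.1, 2.3],
                       [0.5, 1.7, 0.1]])

q = tf.nn.softmax(logit_q)  # one distribution over all classes per row

# manual softmax cross-entropy: sum over classes of p * -log(q)
manual = tf.reduce_sum(p * -tf.math.log(q), axis=1)

# built-in version works on the logits directly
builtin = tf.nn.softmax_cross_entropy_with_logits(labels=p, logits=logit_q)

print(manual.numpy())   # agrees with builtin up to numerical precision
print(builtin.numpy())
```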
For sigmoid it looks a little different, because the sigmoid outputs describe several independent binary probability distributions. For each of those binary distributions the term is

p * -tf.log(q) + (1-p) * -tf.log(1-q)

where p and (1-p) can be treated as the probabilities of the two classes within each binary distribution.
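
The sigmoid case can be checked the same way (again a sketch assuming TensorFlow 2.x with made-up values): the two-term formula summed over the units should agree with tf.nn.sigmoid_cross_entropy_with_logits, while the single-term loss1 drops the (1-p) part and therefore does not.

```python
import tensorflow as tf

# made-up example values: p holds one independent binary target per unit,
# logit_q the raw scores before the sigmoid
p = tf.constant([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])
logit_q = tf.constant([[0.2, -1.1, 2.3],
                       [0.5, 1.7, -0.4]])

q = tf.sigmoid(logit_q)  # independent probability per unit

# manual sigmoid cross-entropy: both terms of each binary distribution
manual = tf.reduce_sum(
    p * -tf.math.log(q) + (1 - p) * -tf.math.log(1 - q), axis=1)

# built-in version works on the logits directly
builtin = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=p, logits=logit_q), axis=1)

# the single-term loss1 = tf.reduce_sum(p * -tf.math.log(q), 1) omits the
# (1 - p) term, so it will not match builtin
print(manual.numpy())   # agrees with builtin up to numerical precision
print(builtin.numpy())
```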