How do TensorFlow's TripletSemiHardLoss and TripletHardLoss work, and how are they used with a Siamese network?

半腔热情 submitted on 2021-02-19 08:37:18


As far as I know, triplet loss is a loss function that decreases the distance between the anchor and the positive but increases the distance between the anchor and the negative. There is also a margin added to it.

So, for example, let us suppose a Siamese network that gives these embeddings:

anchor_output = [1,2,3,4,5...] # embedding given by the CNN model
positive_output = [1,2,3,4,4...]
negative_output= [53,43,33,23,13...]

I think the triplet loss can then be computed like this (I assume I have to turn it into a loss using a Lambda layer or similar):

# calculate triplet loss
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

loss = tf.maximum(0., margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)

So what on earth are tfa.losses.TripletHardLoss and tfa.losses.TripletSemiHardLoss?

As far as I know, semi-hard and hard are types of data generation techniques for Siamese networks that push the model to learn more.

My thinking: as I learned in this post, I think you can do:

  1. Generate a batch of, say, 3 images and form every combination of three (i, j, k), giving 27 triplets
  2. Discard every invalid triplet (i, j and k must all be distinct). The remainder is batch B
  3. Get the embeddings for each triplet in batch B
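
The counting in steps 1 and 2 can be sketched with the standard library. This is only an illustration of the enumeration described in the list; the indices are stand-ins for images, and the sketch does not check labels, which a real triplet miner would also need to do:

```python
from itertools import product

batch_indices = [0, 1, 2]  # stand-ins for the 3 images in the batch

# Step 1: every ordered (i, j, k) combination, 3 ** 3 = 27 in total.
all_triplets = list(product(batch_indices, repeat=3))

# Step 2: keep only triplets whose three indices are all different.
valid_triplets = [t for t in all_triplets if len(set(t)) == 3]

print(len(all_triplets))    # 27
print(len(valid_triplets))  # 6
```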

So I think TripletHardLoss only takes into account the 3 images per batch that have the biggest anchor-positive distance and the lowest anchor-negative distance.

And for semi-hard, I think it discards the losses calculated for every image triplet where the distance was 0.

If not, could someone please correct me and tell me how these can be used? (I know we can use them inside model.compile(), but my question is different.)


What is TripletHardLoss?

This loss follows the ordinary triplet loss form, but within each batch it uses the maximum positive distance and the minimum negative distance, plus the margin constant, when computing the loss:

$$\mathcal{L} = \max\big(\max_{p} d(a, p) - \min_{n} d(a, n) + \text{margin},\ 0\big)$$

Looking into the source code of tfa.losses.TripletHardLoss, we can see that this formula is implemented exactly:

# Build pairwise binary adjacency matrix.
adjacency = tf.math.equal(labels, tf.transpose(labels))
# Invert so we can select negatives only.
adjacency_not = tf.math.logical_not(adjacency)

adjacency_not = tf.cast(adjacency_not, dtype=tf.dtypes.float32)
# hard negatives: smallest D_an.
hard_negatives = _masked_minimum(pdist_matrix, adjacency_not)

batch_size = tf.size(labels)

adjacency = tf.cast(adjacency, dtype=tf.dtypes.float32)

mask_positives = tf.cast(adjacency, dtype=tf.dtypes.float32) - tf.linalg.diag(
    tf.ones([batch_size])
)

# hard positives: largest D_ap.
hard_positives = _masked_maximum(pdist_matrix, mask_positives)

if soft:
    triplet_loss = tf.math.log1p(tf.math.exp(hard_positives - hard_negatives))
else:
    triplet_loss = tf.maximum(hard_positives - hard_negatives + margin, 0.0)

# Get final mean triplet loss
triplet_loss = tf.reduce_mean(triplet_loss)
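
To make the hard-mining step concrete, here is a minimal NumPy sketch of the same idea (hinge margin only). It is not the tfa implementation: the function name is made up for illustration, plain Euclidean distances are assumed, and every label is assumed to appear at least twice in the batch with at least one other label present:

```python
import numpy as np

def batch_hard_triplet_loss(labels, embeddings, margin=1.0):
    """Hinge-margin batch-hard triplet loss (illustrative sketch only)."""
    labels = np.asarray(labels)
    embeddings = np.asarray(embeddings, dtype=np.float64)

    # Pairwise Euclidean distances, shape [batch, batch].
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    pdist = np.sqrt((diff ** 2).sum(axis=-1))

    same = labels[:, None] == labels[None, :]
    positives = same & ~np.eye(len(labels), dtype=bool)  # same label, not self
    negatives = ~same

    # Hardest positive per anchor: largest distance to a same-label sample.
    hard_pos = np.where(positives, pdist, -np.inf).max(axis=1)
    # Hardest negative per anchor: smallest distance to another-label sample.
    hard_neg = np.where(negatives, pdist, np.inf).min(axis=1)

    return np.maximum(hard_pos - hard_neg + margin, 0.0).mean()

# Toy batch: 1-dimensional embeddings (not L2-normalized, for readability).
labels = [0, 0, 1, 1]
embeddings = [[0.0], [1.0], [3.0], [7.0]]
print(batch_hard_triplet_loss(labels, embeddings))  # 0.75
```

Only the anchor at 3.0 contributes here: its hardest positive is 4.0 away and its hardest negative is 2.0 away, giving 4 - 2 + 1 = 3, and the mean over four anchors is 0.75.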

Note that the soft parameter of tfa.losses.TripletHardLoss does not switch the loss to the ordinary triplet loss formula:

$$\mathcal{L} = \max\big(d(a, p) - d(a, n) + \text{margin},\ 0\big)$$

As the tfa source code above shows, the loss still uses the maximum positive distance and the minimum negative distance; soft only determines whether the soft margin is used or not.
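
The two branches can be compared on plain numbers. The helper names and the distances below are hypothetical; the two expressions mirror the soft and non-soft branches of the source code:

```python
import math

def hinge_margin(d_ap, d_an, margin=1.0):
    # soft=False branch: max(D_ap - D_an + margin, 0)
    return max(d_ap - d_an + margin, 0.0)

def soft_margin(d_ap, d_an):
    # soft=True branch: log(1 + exp(D_ap - D_an)), no margin constant
    return math.log1p(math.exp(d_ap - d_an))

# A hard case: the hardest positive is farther away than the hardest negative.
print(hinge_margin(4.0, 2.0))  # 3.0
print(soft_margin(4.0, 2.0))   # about 2.127

# An easy case: the hinge is exactly zero, the soft margin is small but positive.
print(hinge_margin(1.0, 3.0))  # 0.0
print(soft_margin(1.0, 3.0))   # about 0.127
```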

What is TripletSemiHardLoss?

This loss also follows the ordinary triplet loss form. The positive distances are the same as in the ordinary triplet loss, while the negative distance is the semi-hard negative:

The minimum negative distance among those that are greater than the positive distance plus the margin constant; if no such negative exists, the largest negative distance is used instead.

That is, we first look for a negative distance that satisfies the following condition (stated here as in the source code comment, "D_an > D_ap"):

$$d(a, n) > d(a, p)$$

where p stands for positive and n for negative. If we can't find a negative distance that satisfies this condition, we use the largest negative distance instead.

This condition is handled clearly in the source code of tfa.losses.TripletSemiHardLoss, where negatives_outside is the smallest negative distance that satisfies the condition and negatives_inside is the largest negative distance:

# Build pairwise binary adjacency matrix.
adjacency = tf.math.equal(labels, tf.transpose(labels))
# Invert so we can select negatives only.
adjacency_not = tf.math.logical_not(adjacency)

batch_size = tf.size(labels)

# Compute the mask.
pdist_matrix_tile = tf.tile(pdist_matrix, [batch_size, 1])
mask = tf.math.logical_and(
    tf.tile(adjacency_not, [batch_size, 1]),
    tf.math.greater(
        pdist_matrix_tile, tf.reshape(tf.transpose(pdist_matrix), [-1, 1])
    ),
)
mask_final = tf.reshape(
    tf.math.greater(
        tf.math.reduce_sum(
            tf.cast(mask, dtype=tf.dtypes.float32), 1, keepdims=True
        ),
        0.0,
    ),
    [batch_size, batch_size],
)
mask_final = tf.transpose(mask_final)

adjacency_not = tf.cast(adjacency_not, dtype=tf.dtypes.float32)
mask = tf.cast(mask, dtype=tf.dtypes.float32)

# negatives_outside: smallest D_an where D_an > D_ap.
negatives_outside = tf.reshape(
    _masked_minimum(pdist_matrix_tile, mask), [batch_size, batch_size]
)
negatives_outside = tf.transpose(negatives_outside)

# negatives_inside: largest D_an.
negatives_inside = tf.tile(
    _masked_maximum(pdist_matrix, adjacency_not), [1, batch_size]
)
semi_hard_negatives = tf.where(mask_final, negatives_outside, negatives_inside)

loss_mat = tf.math.add(margin, pdist_matrix - semi_hard_negatives)

mask_positives = tf.cast(adjacency, dtype=tf.dtypes.float32) - tf.linalg.diag(
    tf.ones([batch_size])
)

# In lifted-struct, the authors multiply 0.5 for upper triangular
#   in semihard, they take all positive pairs except the diagonal.
num_positives = tf.math.reduce_sum(mask_positives)

triplet_loss = tf.math.truediv(
    tf.math.reduce_sum(
        tf.math.maximum(tf.math.multiply(loss_mat, mask_positives), 0.0)
    ),
    num_positives,
)
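
The selection rule for a single anchor-positive pair can be sketched in NumPy. This is a simplified illustration, not the tfa code: the function name and the distances are hypothetical, and it handles one pair at a time instead of the tiled batch matrices used above:

```python
import numpy as np

def semi_hard_negative_distance(d_ap, negative_distances):
    """Pick the negative distance used for one anchor-positive pair."""
    negative_distances = np.asarray(negative_distances, dtype=np.float64)
    outside = negative_distances[negative_distances > d_ap]
    if outside.size > 0:
        # negatives_outside: smallest D_an that is still larger than D_ap
        return float(outside.min())
    # negatives_inside: no such negative exists, fall back to the largest D_an
    return float(negative_distances.max())

margin = 1.0
d_ap = 2.0

print(semi_hard_negative_distance(d_ap, [1.5, 2.5, 6.0]))  # 2.5
print(semi_hard_negative_distance(d_ap, [0.5, 1.5]))       # 1.5

# Per-pair loss term: max(margin + D_ap - D_an, 0)
d_an = semi_hard_negative_distance(d_ap, [1.5, 2.5, 6.0])
print(max(margin + d_ap - d_an, 0.0))  # 0.5
```

In the real loss, these per-pair terms are summed over all positive pairs (excluding the diagonal) and divided by num_positives.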

How to use these losses?

Both losses expect y_true to be provided as a 1-D integer Tensor with shape [batch_size] of multi-class integer labels, and the embeddings y_pred must be a 2-D float Tensor of L2-normalized embedding vectors.
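
As a quick illustration of these two requirements, the sketch below builds a label vector and L2-normalizes a batch of embeddings with NumPy. The values are made up; inside a Keras model the normalization would normally be done by the last layer instead, for example with tf.math.l2_normalize:

```python
import numpy as np

# Hypothetical raw embeddings for a batch of 4 samples, 2 dimensions each.
raw_embeddings = np.array([[3.0, 4.0], [0.0, 2.0], [1.0, 1.0], [6.0, 8.0]])
labels = np.array([0, 1, 1, 0])  # 1-D integer class labels, shape [batch_size]

# L2-normalize each row so that every embedding has unit length.
norms = np.linalg.norm(raw_embeddings, axis=1, keepdims=True)
embeddings = raw_embeddings / norms

print(labels.shape)      # (4,)
print(embeddings.shape)  # (4, 2)
print(np.linalg.norm(embeddings, axis=1))  # every entry is 1 up to rounding
```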

Example code to prepare the inputs and labels:

import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds

def _normalize_img(img, label):
    img = tf.cast(img, tf.float32) / 255.
    return (img, label)

train_dataset, test_dataset = tfds.load(name="mnist", split=['train', 'test'], as_supervised=True)

# Build your input pipelines
train_dataset = train_dataset.shuffle(1024).batch(16)
train_dataset = train_dataset.map(_normalize_img)

# Take one batch of data
for data in train_dataset.take(1):
    print("Batch of images shape:\n{}\nBatch of labels:\n{}\n".format(data[0].shape, data[1]))


Batch of images shape:
(16, 28, 28, 1)
Batch of labels:
[8 4 0 3 2 4 5 1 0 5 7 0 2 6 4 9]

If you run into problems, follow this official tutorial, which shows how to use TripletSemiHardLoss in general (the same applies to TripletHardLoss).

