Suppose we have weights
x = tf.Variable(np.random.random((5,10)))
cost = ...
And we use the GD optimizer:
upds = tf.train.G
You can take the Lagrangian approach and simply add a penalty for features of the variable you don't want.
e.g. To encourage theta
to be non-negative, you could add the following to the optimizer's objective function.
added_loss = -tf.minimum( tf.reduce_min(theta),0)
If any theta
are negative, then add2loss will be positive, otherwise zero. Scaling that to a meaningful value is left as an exercise to the reader. Scaling too little will not exert enough pressure. Too much may make things unstable.
As of TensorFlow 1.4, there is a new argument to tf.get_variable
that allows to pass a constraint function that is applied after the update of the optimizer. Here is an example that enforces a non-negativity constraint:
with tf.variable_scope("MyScope"):
v1 = tf.get_variable("v1", …, constraint=lambda x: tf.clip_by_value(x, 0, np.infty))
constraint: An optional projection function to be applied to the variable after being updated by an
Optimizer
(e.g. used to implement norm constraints or value constraints for layer weights). The function must take as input the unprojected Tensor representing the value of the variable and return the Tensor for the projected value (which must have the same shape). Constraints are not safe to use when doing asynchronous distributed training.