tf_agents custom time_step_spec

Question

I'm tinkering with tf-agents but I'm having trouble making a custom time_step_spec.

I'm trying to train a tf-agent in gym 'Breakout-v0'. I've made a function to preprocess the observation (game pixels), and now I want to modify the time_step and time_step_spec to reflect the new data.

The original time_step_spec().observation is:

BoundedTensorSpec(shape=(210, 160, 3), dtype=tf.uint8, name='observation', minimum=array(0, dtype=uint8), maximum=array(255, dtype=uint8))

Mine would be:

BoundedTensorSpec(shape=(1, 165, 150), dtype=tf.float32, name='observation', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32))

I've managed to create a custom BoundedTensorSpec and modify the time_step with:

processed_timestep = timestep._replace(observation=processed_obs)

Now I'm having trouble understanding how to modify the time_step_spec. I don't fully understand what it is, nor how to modify its components.

The original time_step_spec is:

TimeStep(step_type=TensorSpec(shape=(), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(), dtype=tf.float32, name='reward'), discount=BoundedTensorSpec(shape=(), dtype=tf.float32, name='discount', minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)), observation=BoundedTensorSpec(shape=(210, 160, 3), dtype=tf.uint8, name='observation', minimum=array(0, dtype=uint8), maximum=array(255, dtype=uint8)))

What structure is it exactly? An array of tensors?
How can I access its components?
Can I make a custom time_step_spec with multiple components? (reward, observation, etc.)
Can I just modify a single component?

Answer 1:

This can be easily solved by using the environment. In TF-Agents the environment needs to follow the PyEnvironment class (and then you wrap this with a TFPyEnvironment for parallel execution of multiple envs). If you have already defined your environment to match this class' specification then your environment should already provide you with the two methods env.time_step_spec() and env.action_spec(). Simply feed these two to your agent and you should be done.
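On the question of what structure time_step_spec is: the printed value in the question, and the fact that timestep._replace(...) works, both suggest that TimeStep is a namedtuple whose fields hold specs instead of tensors. The following is a minimal sketch of that idea using only the standard library. The TimeStep defined here is a stand-in, not the tf_agents class, and the strings are placeholders for the real TensorSpec / BoundedTensorSpec objects.

```python
from collections import namedtuple

# Stand-in with the same four fields as the TimeStep printed in the question.
# This is NOT the tf_agents class, only an illustration of its shape.
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])

# Plain strings stand in for the TensorSpec / BoundedTensorSpec objects.
original_spec = TimeStep(
    step_type="TensorSpec(shape=(), dtype=tf.int32)",
    reward="TensorSpec(shape=(), dtype=tf.float32)",
    discount="BoundedTensorSpec(shape=(), dtype=tf.float32)",
    observation="BoundedTensorSpec(shape=(210, 160, 3), dtype=tf.uint8)",
)

# Components are read by field name.
print(original_spec.observation)

# A single component is swapped with _replace, which returns a new tuple
# and leaves the original untouched.
custom_spec = original_spec._replace(
    observation="BoundedTensorSpec(shape=(1, 165, 150), dtype=tf.float32)"
)
print(custom_spec.observation)
```

If the real spec behaves the same way, env.time_step_spec()._replace(observation=my_observation_spec) would change only the observation component and keep step_type, reward, and discount as they were.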

To create a time_step with multiple components you can look at this example:

self._observation_spec = {'observations': array_spec.ArraySpec(shape=(100,), dtype=np.float64),
                          'legal_moves': array_spec.ArraySpec(shape=(self.num_moves(),), dtype=np.bool_),
                          'other_output1': array_spec.ArraySpec(shape=(10,), dtype=np.int64),
                          'other_output2': array_spec.ArraySpec(shape=(2, 5, 2), dtype=np.int64)}

This line goes in the environment's __init__ function. You don't need to know what any of this is actually doing, but you can see that it is fundamentally a dict with string keys and ArraySpec values. Note that if you do this, you will have to define an observation_and_action_constraint_splitter to pass to the agent that discards all components that shouldn't be fed to the agent's input. Here is an example of how to construct an appropriate TimeStep with this dict of observations inside your env._step method:

observations_and_legal_moves = {'observations': current_agent_obs,
                                'legal_moves': np.ones(shape=(self.num_moves(),), dtype=np.bool_),
                                'other_output1': np.ones(shape=(10,), dtype=np.int64),
                                'other_output2': np.ones(shape=(2, 5, 2), dtype=np.int64)}

ts.transition(observations_and_legal_moves, reward, self.gamma)
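The observation_and_action_constraint_splitter mentioned above can be sketched roughly as follows. This assumes the splitter is a function that takes the dict observation and returns a pair (network input, action mask), and that 'legal_moves' is meant to act as the mask. Plain Python lists stand in for the NumPy arrays the environment would emit, and the mask length of 3 is made up for illustration.

```python
def observation_and_action_constraint_splitter(observation):
    """Split the dict observation into (network input, action mask)."""
    # Only 'observations' is passed on as the network input; 'legal_moves'
    # becomes the mask. The 'other_output*' entries are discarded here.
    return observation["observations"], observation["legal_moves"]


# Plain lists stand in for the arrays a real environment would produce.
example_observation = {
    "observations": [0.0] * 100,
    "legal_moves": [True, False, True],  # hypothetical: 3 possible moves
    "other_output1": [1] * 10,
    "other_output2": [[[1, 1] for _ in range(5)] for _ in range(2)],
}

network_input, mask = observation_and_action_constraint_splitter(example_observation)
print(len(network_input), mask)
```

The function itself is what would be handed to the agent; the environment keeps emitting the full dict, and the split happens on the agent's side.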

If you already have your environment built with its Tensor output and you just can't figure out what the appropriate TensorSpec should be (it can be very fiddly to get right), you can simply call tf.TensorSpec.from_tensor(tensor) to see what you must define.

Source: https://stackoverflow.com/questions/58348203/tf-agents-custom-time-step-spec

Tags

python-3.x

tensorflow

reinforcement-learning