Utilities
Utilities for training and evaluating RL models on OpenAI Gym environments.

class numpy_ml.rl_models.rl_utils.EnvModel
A simple tabular environment model that maintains the counts of each (reward, outcome) pair given the state and action that preceded them. The model can be queried with:

>>> M = EnvModel()
>>> M[(state, action, reward, next_state)] += 1
>>> M[(state, action, reward, next_state)]
1
>>> M.state_action_pairs()
[(state, action)]
>>> M.outcome_probs(state, action)
[(next_state, 1)]
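To make the counting semantics concrete, here is a minimal sketch of a count-based model that supports the same `M[(s, a, r, s')] += 1` idiom. `TabularEnvModel` is a hypothetical name, not the library's class: it stores counts in nested dicts keyed first by (state, action) and then by (reward, next_state), and it omits outcome_probs.

```python
class TabularEnvModel:
    """Hypothetical count model mapping (s, a) -> {(reward, next_state): count}."""

    def __init__(self):
        # outer key: (state, action); inner key: (reward, next_state)
        self._counts = {}

    def __getitem__(self, key):
        s, a, r, s_next = key
        # unseen transitions have a count of 0; reading never creates entries
        return self._counts.get((s, a), {}).get((r, s_next), 0)

    def __setitem__(self, key, value):
        s, a, r, s_next = key
        self._counts.setdefault((s, a), {})[(r, s_next)] = value

    def state_action_pairs(self):
        # every (state, action) pair that has been written at least once
        return list(self._counts)

    def reward_outcome_pairs(self, s, a):
        # every (reward, next_state) pair recorded after taking a in s
        return list(self._counts.get((s, a), {}))


M = TabularEnvModel()
M[(0, 1, -1.0, 2)] += 1            # __getitem__ returns 0, __setitem__ stores 1
print(M[(0, 1, -1.0, 2)])          # 1
print(M.state_action_pairs())      # [(0, 1)]
print(M.reward_outcome_pairs(0, 1))  # [(-1.0, 2)]
```

Reading through `.get` rather than a `defaultdict` keeps lookups of unseen transitions from silently adding (state, action) pairs to the model.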

reward_outcome_pairs(s, a)
Return all (reward, next_state) pairs associated with taking action a in state s.

outcome_probs(s, a)
Return the probability under the environment model of each outcome state after taking action a in state s.
Parameters:
  s (int as returned by self._obs2num) – The id for the state/observation.
  a (int as returned by self._action2num) – The id for the action taken from state s.
Returns:
  outcome_probs (list of (state, prob) tuples) – A list of each possible outcome and its associated probability under the model.
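One natural way to turn counts into probabilities is the relative frequency P(s' | s, a) ≈ N(s, a, s') / N(s, a), where N(s, a, s') sums the observed counts over all rewards. The sketch below assumes exactly that, with counts held in a flat dict keyed by (s, a, reward, next_state); the standalone `outcome_probs(counts, s, a)` function is hypothetical and is not the library's method signature.

```python
from collections import Counter


def outcome_probs(counts, s, a):
    """Estimate P(next_state | s, a) from counts keyed by (s, a, reward, next_state)."""
    # marginalize over rewards: total count of each next state reached from (s, a)
    next_counts = Counter()
    for (s0, a0, _reward, s_next), n in counts.items():
        if (s0, a0) == (s, a):
            next_counts[s_next] += n
    total = sum(next_counts.values())
    if total == 0:
        return []  # (s, a) has never been observed
    # relative frequency of each next state
    return [(s_next, n / total) for s_next, n in next_counts.items()]


counts = {(0, 1, 0.0, 2): 3, (0, 1, 1.0, 2): 1, (0, 1, 0.0, 5): 4}
print(outcome_probs(counts, 0, 1))  # [(2, 0.5), (5, 0.5)]
```

Next state 2 is reached 3 + 1 = 4 times across two different rewards, and next state 5 is reached 4 times, so each gets probability 4 / 8.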
 s (int as returned by

state_action_pairs_leading_to_outcome(outcome)
Return all (state, action) pairs that have a nonzero probability of producing outcome under the current model.
Parameters:
  outcome (int) – The outcome state.
Returns:
  pairs (list of (state, action) tuples) – A list of all (state, action) pairs with a nonzero probability of producing outcome under the model.
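Under a count-based model, a (state, action) pair has nonzero estimated probability of producing an outcome exactly when it has a nonzero observed count for that outcome, so the lookup reduces to a scan over the counts. This kind of predecessor query is what prioritized-sweeping-style planners use to propagate value changes backward. The sketch below assumes a flat dict keyed by (s, a, reward, next_state); the standalone function is hypothetical.

```python
def state_action_pairs_leading_to_outcome(counts, outcome):
    """Return every (s, a) with a nonzero observed count of reaching `outcome`."""
    pairs = []
    for (s, a, _reward, s_next), n in counts.items():
        # a nonzero count implies a nonzero relative-frequency probability;
        # each qualifying (s, a) is kept once, in first-seen order
        if s_next == outcome and n > 0 and (s, a) not in pairs:
            pairs.append((s, a))
    return pairs


counts = {(0, 1, 0.0, 2): 3, (3, 0, 1.0, 2): 1, (0, 1, 1.0, 2): 2, (4, 1, 0.0, 2): 0}
print(state_action_pairs_leading_to_outcome(counts, 2))  # [(0, 1), (3, 0)]
```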


numpy_ml.rl_models.rl_utils.tile_state_space(env, env_stats, n_tilings, obs_max=None, obs_min=None, state_action=False, grid_size=(4, 4))
Return a function that encodes the continuous observations generated by env in terms of a collection of n_tilings overlapping tilings (each with dimension grid_size) of the state space.
Parameters:
  env (gym.wrappers.time_limit.TimeLimit instance) – An OpenAI Gym environment.
  n_tilings (int) – The number of overlapping tilings to use. Should be a power of 2. This determines the dimension of the discretized, tile-encoded state vector.
  obs_max (float or np.ndarray) – The value to treat as the max value of the observation space when calculating the grid widths. If None, use env.observation_space.high. Default is None.
  obs_min (float or np.ndarray) – The value to treat as the min value of the observation space when calculating the grid widths. If None, use env.observation_space.low. Default is None.
  state_action (bool) – Whether to use tile coding to encode state-action values (True) or just state values (False). Default is False.
  grid_size (list or tuple of length 2) – The ints representing the coarseness of the tilings. E.g., a grid_size of (4, 4) means each tiling consists of a 4x4 tile grid. Default is (4, 4).
Returns:
  encode_obs_as_tile (function) – A function that takes a continuous observation vector as input and returns a set of the indices of the active tiles in the tile-coded observation space.
  n_states (int) – The total number of unique states possible under this tile-coding regimen.
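The idea behind tile coding can be sketched with uniformly (diagonally) offset grids: each of the n_tilings grids is shifted by a fraction of one tile width, and an observation activates exactly one tile per tiling. This is an independent sketch, not the library's tile_state_space: the name `make_tile_encoder` is hypothetical, it takes explicit bounds instead of env and env_stats, ignores state_action, and gives each tiling one extra row and column so that shifted grids still cover the whole box. Asymmetric offsets are often preferred in practice; the uniform scheme is used here only because it is the simplest.

```python
import numpy as np


def make_tile_encoder(obs_min, obs_max, n_tilings=4, grid_size=(4, 4)):
    """Hypothetical tile coder returning (encode, n_states)."""
    obs_min = np.asarray(obs_min, dtype=float)
    obs_max = np.asarray(obs_max, dtype=float)
    grid = np.asarray(grid_size, dtype=int)
    # width of a single tile along each dimension
    width = (obs_max - obs_min) / grid
    # tiling k is shifted by k / n_tilings of a tile width in every dimension
    offsets = [width * k / n_tilings for k in range(n_tilings)]
    # one extra row/column per dimension so shifted tiles still cover the box
    dims = tuple(grid + 1)
    tiles_per_tiling = int(np.prod(dims))
    n_states = n_tilings * tiles_per_tiling

    def encode(obs):
        obs = np.clip(np.asarray(obs, dtype=float), obs_min, obs_max)
        active = set()
        for t, off in enumerate(offsets):
            # integer grid coordinates of the tile containing obs in tiling t
            coords = np.floor((obs - obs_min + off) / width).astype(int)
            # flatten the coordinates, then offset by the tiling number
            flat = int(np.ravel_multi_index(tuple(coords), dims))
            active.add(t * tiles_per_tiling + flat)
        return active

    return encode, n_states


encode, n_states = make_tile_encoder([0.0, 0.0], [1.0, 1.0], n_tilings=1)
print(n_states, encode([0.3, 0.6]))  # 25 {7}
```

With a single 4x4 tiling (padded to 5x5), the point (0.3, 0.6) falls in grid cell (1, 2), whose flat index is 1 * 5 + 2 = 7. With more tilings, nearby observations share most of their active tiles, which is what gives tile coding its generalization.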
 env (

numpy_ml.rl_models.rl_utils.get_gym_stats()
Return a pandas DataFrame of the environment IDs.

numpy_ml.rl_models.rl_utils.is_tuple(env)
Check whether the action and observation spaces for env are instances of gym.spaces.Tuple or gym.spaces.Dict.
Notes:
  A tuple space is a tuple of several (possibly multidimensional) action/observation spaces. For our purposes, a tuple space is necessarily multidimensional.
Returns:
  tuple_action (bool) – Whether the env’s action space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
  tuple_obs (bool) – Whether the env’s observation space is an instance of gym.spaces.Tuple or gym.spaces.Dict.
 tuple_action (bool) – Whether the env’s action space is an instance of

numpy_ml.rl_models.rl_utils.is_multidimensional(env)
Check whether the action and observation spaces for env are multidimensional or Tuple spaces.
Notes:
  A multidimensional space is any space whose actions/observations have more than one element in them. This includes Tuple spaces, but also includes single action/observation spaces with several dimensions.
Parameters:
  env (gym.wrappers or gym.envs instance) – The environment to evaluate.
Returns:
  md_action (bool) – Whether the env’s action space is multidimensional.
  md_obs (bool) – Whether the env’s observation space is multidimensional.
  tuple_action (bool) – Whether the env’s action space is a Tuple instance.
  tuple_obs (bool) – Whether the env’s observation space is a Tuple instance.
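The two notions above can be checked on a single space with a few attribute tests: a Tuple or Dict space exposes its sub-spaces through a `spaces` attribute, and any other space is multidimensional when one sample has more than one element, i.e. when the product of its `shape` exceeds 1. The sketch below is a hypothetical helper (`space_flags`), not the library's function; it relies only on those gym-style attributes, so the usage uses `SimpleNamespace` stand-ins and runs without gym installed. The library's own handling of edge cases may differ.

```python
from types import SimpleNamespace

import numpy as np


def space_flags(space):
    """Return (is_multidimensional, is_tuple) for a gym-like space object."""
    # Tuple and Dict spaces expose their sub-spaces through a `spaces` attribute
    if hasattr(space, "spaces"):
        # tuple spaces count as multidimensional by definition
        return True, True
    # otherwise, multidimensional means more than one element per sample;
    # scalar spaces (e.g. Discrete) have shape () and np.prod(()) == 1
    shape = getattr(space, "shape", None) or ()
    return int(np.prod(shape)) > 1, False


# hypothetical stand-ins for gym spaces
discrete = SimpleNamespace(n=4, shape=())
box = SimpleNamespace(shape=(3,))
tup = SimpleNamespace(spaces=(discrete, box))
print(space_flags(discrete), space_flags(box), space_flags(tup))
# (False, False) (True, False) (True, True)
```

Applying the helper to `env.action_space` and `env.observation_space` yields the four flags listed above.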

numpy_ml.rl_models.rl_utils.is_continuous(env, tuple_action, tuple_obs)
Check whether an env’s observation and action spaces are continuous.
Returns:
  cont_action (bool) – Whether the env’s action space is continuous.
  cont_obs (bool) – Whether the env’s observation space is continuous.
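A minimal sketch of such a check, assuming that "continuous" means a gym.spaces.Box space and that a tuple space counts as continuous only when every sub-space is a Box; both rules are assumptions, not the library's documented behavior. The hypothetical helper `space_is_continuous` infers tuple-ness from a `spaces` attribute instead of taking the tuple_action/tuple_obs flags, and it compares class names so it runs without gym; with gym available, `isinstance(space, gym.spaces.Box)` is the usual test.

```python
from types import SimpleNamespace


def space_is_continuous(space):
    """Treat Box spaces as continuous; tuple spaces only if all sub-spaces are Box."""
    if hasattr(space, "spaces"):
        subs = space.spaces
        # Dict spaces store sub-spaces in a mapping, Tuple spaces in a sequence
        subs = subs.values() if isinstance(subs, dict) else subs
        return all(type(s).__name__ == "Box" for s in subs)
    # class-name comparison keeps this sketch independent of gym
    return type(space).__name__ == "Box"


# hypothetical stand-ins for gym.spaces.Box and gym.spaces.Discrete
class Box:
    pass


class Discrete:
    pass


print(space_is_continuous(Box()), space_is_continuous(Discrete()))  # True False
mixed = SimpleNamespace(spaces={"pos": Box(), "flag": Discrete()})
print(space_is_continuous(mixed))  # False
```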

numpy_ml.rl_models.rl_utils.action_stats(env, md_action, cont_action)
Get information on env’s action space.
Returns:
  n_actions_per_dim (list of length (action_dim,)) – The number of possible actions for each dimension of the action space.
  action_ids (list or None) – A list of all valid actions within the space. If cont_action is True, this value will be None.
  action_dim (int or None) – The number of dimensions in a single action.

numpy_ml.rl_models.rl_utils.obs_stats(env, md_obs, cont_obs)
Get information on the observation space for env.
Returns:
  n_obs_per_dim (list of length (obs_dim,)) – The number of possible observation classes for each dimension of the observation space.
  obs_ids (list or None) – A list of all valid observations within the space. If cont_obs is True, this value will be None.
  obs_dim (int or None) – The number of dimensions in a single observation.
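For a discrete, multidimensional space, the list of valid ids returned by action_stats and obs_stats can be thought of as the Cartesian product of one range per dimension. The sketch below assumes that representation (ids as tuples) and is a hypothetical helper, not either library function; it returns None for the ids of a continuous space, as documented above.

```python
from itertools import product


def space_stats(n_per_dim, continuous=False):
    """Return (n_per_dim, ids, dim), analogous to the n_*_per_dim, *_ids, *_dim fields."""
    dim = len(n_per_dim)
    if continuous:
        # continuous spaces have no finite set of ids to enumerate
        return n_per_dim, None, dim
    # each valid id picks one value per dimension: the Cartesian product of ranges
    ids = list(product(*(range(n) for n in n_per_dim)))
    return n_per_dim, ids, dim


_, ids, dim = space_stats([2, 3])
print(dim, len(ids))  # 2 6
print(ids[:3])        # [(0, 0), (0, 1), (0, 2)]
```

Because len(ids) is the product of the per-dimension counts, full enumeration grows exponentially with the number of dimensions, which is one reason continuous or high-dimensional spaces are better handled with function approximation such as tile coding.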