GRUComposition

Contents

Overview

Creation

Configuration

Learning Arguments

Structure

Execution

Processing

Learning

Class Reference

Overview

The GRUComposition is a subclass of AutodiffComposition that implements a single-layered gated recurrent network, which uses a set of GatingMechanisms to implement gates that modulate the flow of information through its hidden_layer_node. It implements exactly the same computations as a PyTorch GRU module, which is used to implement it when its learn method is called. When it is executed in Python mode, it functions in the same way as a GRUCell module, processing its input one stimulus at a time. When used for learning, however, it is executed as a PyTorch GRU module, so that it can be used to process an entire sequence of stimuli at once and learn to predict the next stimulus in the sequence.

Creation

A GRUComposition is created by calling its constructor. When its learn method is called, it automatically creates a PytorchGRUCompositionWrapper that implements the GRUComposition using the PyTorch GRU module, which is then trained using PyTorch. Its constructor takes the following arguments, which are either in addition to those of AutodiffComposition or handled differently than they are by AutodiffComposition:

Configuration

input_size (int) specifies the length of the input array to the GRUComposition, and the size of the input_node, which can be different than hidden_size.

hidden_size (int) specifies the length of the internal (“hidden”) state of the GRUComposition, and the size of the hidden_layer_node and all nodes other than the input_node, which can be different than input_size.

bias (bool) specifies whether the GRUComposition includes BIAS Nodes and, correspondingly, the GRU module uses bias vectors in its computations.
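For orientation, the sizes of the underlying PyTorch parameters follow directly from these three arguments. The sketch below assumes the layout documented for torch.nn.GRU, in which each weight stacks the reset, update and new blocks along its first dimension, giving 3 * hidden_size rows; gru_param_shapes and count_params are hypothetical helpers, not part of PsyNeuLink.

```python
def gru_param_shapes(input_size: int, hidden_size: int, bias: bool) -> dict:
    """Shapes of the parameters of a single-layer torch.nn.GRU (assumed layout).

    Each weight stacks the reset, update and new blocks along its first
    dimension, hence the factor of 3.
    """
    shapes = {
        "weight_ih_l0": (3 * hidden_size, input_size),   # wts_ir, wts_iu, wts_in
        "weight_hh_l0": (3 * hidden_size, hidden_size),  # wts_hr, wts_hu, wts_hn
    }
    if bias:
        shapes["bias_ih_l0"] = (3 * hidden_size,)  # bias_ir, bias_iu, bias_in
        shapes["bias_hh_l0"] = (3 * hidden_size,)  # bias_hr, bias_hu, bias_hn
    return shapes


def count_params(shapes: dict) -> int:
    """Total number of scalar parameters across all tensors."""
    total = 0
    for shape in shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


shapes = gru_param_shapes(input_size=3, hidden_size=5, bias=True)
print(count_params(shapes))  # 45 + 75 + 15 + 15 = 150
```

Because input_size enters only weight_ih_l0, it can differ freely from hidden_size.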

Learning Arguments

enable_learning (bool) specifies whether learning is enabled for the GRUComposition; if it is False, no learning occurs, even if its learn method is called and learning_rates are specified.

learning_rate (float, bool, dict or None): specifies the learning_rate for any parameters of the PyTorch GRU module that are not assigned one in the learning_rate argument of a call to the learn method of the GRUComposition or of the AutodiffComposition within which the GRUComposition is nested (see Learning Rates for details of specification). It can be assigned any of the following values:

int or float: the value is used as the default learning_rate for the GRUComposition, which is assigned to any parameters that have not been otherwise specified in the learning_rate argument of a call to the learn method of the GRUComposition or the AutodiffComposition within which it is nested.

True or None: the GRUComposition’s default learning_rate (.001) is used as the learning_rate for any parameters that have not been otherwise specified in the learning_rate argument of a call to the learn method of the GRUComposition or the AutodiffComposition within which it is nested.

False: learning occurs only for parameters for which an explicit learning_rate has been specified in the learning_rate argument of a call to the learn method of the GRUComposition or the AutodiffComposition within which it is nested.

dict: {keyword: learning_rate}; used to specify parameter-specific learning rates, which supersede the value of the GRUComposition’s learning_rate. Each key of the dict must be one of the keywords below, which reference parameters of the GRU module; values specify their learning_rates (see Learning Rates for additional information):

INPUT_TO_HIDDEN: learning rate for the weight_ih_l0 parameter of the PyTorch GRU module that corresponds to the weights of the efferent projections from the input_node of the GRUComposition: wts_in, wts_iu, and wts_ir; its value is stored in the w_ih_learning_rate attribute of the GRUComposition;

HIDDEN_TO_HIDDEN: learning rate for the weight_hh_l0 parameter of the PyTorch GRU module that corresponds to the weights of the efferent projections from the hidden_layer_node of the GRUComposition: wts_hn, wts_hu, wts_hr; its value is stored in the w_hh_learning_rate attribute of the GRUComposition;

BIAS_INPUT_TO_HIDDEN: learning rate for the bias_ih_l0 parameter of the PyTorch GRU module that corresponds to the biases of the efferent projections from the input_node of the GRUComposition: bias_ir, bias_iu, bias_in; its value is stored in the b_ih_learning_rate attribute of the GRUComposition;

BIAS_HIDDEN_TO_HIDDEN: learning rate for the bias_hh_l0 parameter of the PyTorch GRU module that corresponds to the biases of the efferent projections from the hidden_layer_node of the GRUComposition: bias_hr, bias_hu, bias_hn; its value is stored in the b_hh_learning_rate attribute of the GRUComposition;

DEFAULT_LEARNING_RATE: specifies the default learning rate for the GRUComposition, which is used as the learning_rate for any parameters that have no other entry in the dict.

Warning

Only the keywords above can be used to specify the learning_rate for parameters in a learning_rate dict. The learning_rates for individual Projections in the GRUComposition cannot be specified, as they do not have corresponding torch.nn.Parameters in the named_parameters() list of the PyTorch GRU module; specifying them will raise an error.
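The rules above can be summarized as a small resolution function. This is a minimal sketch for illustration, not PsyNeuLink's implementation: the keywords are modeled as plain strings, the parameter names are those of torch.nn.GRU, learn_rates is a hypothetical stand-in for the learning_rate argument of learn, and it assumes (this is not stated above) that a dict without a DEFAULT_LEARNING_RATE entry falls back to the .001 default.

```python
DEFAULT_RATE = 0.001  # the GRUComposition's documented default learning_rate

# Keyword -> parameter of the PyTorch GRU module (keywords modeled as strings).
KEYWORD_TO_PARAM = {
    "INPUT_TO_HIDDEN": "weight_ih_l0",
    "HIDDEN_TO_HIDDEN": "weight_hh_l0",
    "BIAS_INPUT_TO_HIDDEN": "bias_ih_l0",
    "BIAS_HIDDEN_TO_HIDDEN": "bias_hh_l0",
}
DEFAULT_KEY = "DEFAULT_LEARNING_RATE"


def _check_keys(spec: dict, allow_default: bool) -> None:
    """Reject any key that is not one of the documented keywords
    (e.g., the name of an individual Projection such as 'wts_in')."""
    allowed = set(KEYWORD_TO_PARAM) | ({DEFAULT_KEY} if allow_default else set())
    bad = set(spec) - allowed
    if bad:
        raise ValueError(f"invalid learning_rate key(s): {sorted(bad)}")


def resolve_learning_rates(learning_rate=None, learn_rates=None) -> dict:
    """Return {GRU parameter name: rate}; a rate of False means 'not learned'.

    learning_rate: the constructor specification (int/float, True, None,
                   False or dict).
    learn_rates:   hypothetical explicit per-keyword rates passed to learn();
                   these supersede the constructor specification.
    """
    explicit = {}
    if isinstance(learning_rate, dict):
        _check_keys(learning_rate, allow_default=True)
        # Assumption: a dict without DEFAULT_LEARNING_RATE falls back to .001.
        default = learning_rate.get(DEFAULT_KEY, DEFAULT_RATE)
        explicit = {k: v for k, v in learning_rate.items() if k != DEFAULT_KEY}
    elif learning_rate is True or learning_rate is None:
        default = DEFAULT_RATE
    elif learning_rate is False:
        default = False  # only explicitly specified parameters are learned
    else:
        default = float(learning_rate)  # int or float

    if learn_rates:
        _check_keys(learn_rates, allow_default=False)
        explicit.update(learn_rates)

    return {param: explicit.get(kw, default) for kw, param in KEYWORD_TO_PARAM.items()}


# weight_ih_l0 -> 0.05; all other parameters -> False (not learned)
print(resolve_learning_rates(False, {"INPUT_TO_HIDDEN": 0.05}))
```

Validating keys against the four keywords only mirrors the Warning: a key naming an individual Projection such as wts_in is rejected, because it has no torch.nn.Parameter of its own.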

Structure

The GRUComposition assigns a node to each of the computations of the PyTorch GRU module, and a Projection to each of its weight and bias parameters, as shown in the figure below:

The input_node receives the input to the GRUComposition and passes it, by way of the reset_node, update_node and new_node, to the hidden_layer_node, which implements the recurrence and integration function of a GRU. The reset_node gates the input to the new_node from the hidden_layer_node. The update_node gates the inputs to the hidden_layer_node from the new_node (current input) and from the prior state of the hidden_layer_node (i.e., the input it receives from its recurrent Projection). The output_node receives the current state of the hidden_layer_node, which is provided as the output of the GRUComposition. The reset_node and update_node are GatingMechanisms, while the other nodes are all ProcessingMechanisms.
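The wiring described above can be summarized as a table of (sender, receiver) pairs for the MappingProjections listed in the Class Reference. The plain-Python sketch below is for illustration only: BIAS Projections and GatingProjections are omitted, and PROJECTIONS, afferents and efferents are hypothetical names, not PsyNeuLink API.

```python
# (sender, receiver) for each MappingProjection of the GRUComposition,
# as listed in the Class Reference; BIAS Projections are omitted.
PROJECTIONS = {
    "wts_ir": ("input_node", "reset_node"),
    "wts_iu": ("input_node", "update_node"),
    "wts_in": ("input_node", "new_node"),
    "wts_hr": ("hidden_layer_node", "reset_node"),
    "wts_hu": ("hidden_layer_node", "update_node"),
    "wts_hn": ("hidden_layer_node", "new_node"),
    "wts_nh": ("new_node", "hidden_layer_node"),
    "wts_hh": ("hidden_layer_node", "hidden_layer_node"),  # recurrent, fixed
    "wts_ho": ("hidden_layer_node", "output_node"),        # fixed
}


def afferents(node: str) -> list:
    """Names of the Projections that project to node (sorted)."""
    return sorted(name for name, (_, rcvr) in PROJECTIONS.items() if rcvr == node)


def efferents(node: str) -> list:
    """Names of the Projections that project from node (sorted)."""
    return sorted(name for name, (sndr, _) in PROJECTIONS.items() if sndr == node)


for node in ("reset_node", "update_node", "new_node", "hidden_layer_node"):
    print(f"{node}: {afferents(node)}")
```

Listing afferents this way shows, for example, that the new_node combines input from the input_node (wts_in) with input from the hidden_layer_node (wts_hn), while the hidden_layer_node receives only the new_node and its own prior state.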

Note

The GRUComposition is limited to a single layer GRU at present, thus its num_layers argument is not implemented. Similarly, dropout and bidirectional arguments are not yet implemented. These will be added in a future version.

Execution

Processing

The GRUComposition implements the following computations by its reset, update, new, and hidden_layer Nodes when it is executed:

reset(t) = Logistic[(wts_ir * input) + bias_ir + (wts_hr * hidden_layer(t-1)) + bias_hr]

update(t) = Logistic[(wts_iu * input) + bias_iu + (wts_hu * hidden_layer(t-1)) + bias_hu]

new(t) = tanh[(wts_in * input) + bias_in + reset(t) * ((wts_hn * hidden_layer(t-1)) + bias_hn)]

hidden_layer(t) = [(1 - update(t)) * new(t)] + [update(t) * hidden_layer(t-1)]

This corresponds to the computations of the GRU module:

\[
\begin{aligned}
reset_t &= \mathrm{Logistic}(wts\_ir \cdot input_t + bias\_ir + wts\_hr \cdot hidden_{t-1} + bias\_hr)\\
update_t &= \mathrm{Logistic}(wts\_iu \cdot input_t + bias\_iu + wts\_hu \cdot hidden_{t-1} + bias\_hu)\\
new_t &= \tanh(wts\_in \cdot input_t + bias\_in + reset_t \odot (wts\_hn \cdot hidden_{t-1} + bias\_hn))\\
hidden_t &= (1 - update_t) \odot new_t + update_t \odot hidden_{t-1}
\end{aligned}
\]

where \(\cdot\) is the dot product, \(\odot\) is the Hadamard (elementwise) product, the subscript t denotes values for the current execution of the Composition, and \(hidden_{t-1}\) is the value of hidden from the prior execution (see Cycles for handling of recurrence and cycles).
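A direct NumPy transcription of these equations can be used to check them numerically. It is a sketch, not the GRUComposition's code, and it assumes PsyNeuLink's convention that a Projection's matrix has shape (sender size, receiver size), so that each weighted term is computed as input @ matrix; gru_step and run_sequence are hypothetical helpers, and the dict keys mirror the Projection names.

```python
import numpy as np


def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, W, b):
    """One execution of the GRU equations.

    x: input (input_size,); h_prev: hidden at t-1 (hidden_size,).
    W: wts_ir/iu/in of shape (input_size, hidden_size) and
       wts_hr/hu/hn of shape (hidden_size, hidden_size).
    b: bias_ir, ..., bias_hn, each of shape (hidden_size,).
    Returns (reset, update, new, hidden) for time t.
    """
    r = logistic(x @ W["wts_ir"] + b["bias_ir"] + h_prev @ W["wts_hr"] + b["bias_hr"])
    z = logistic(x @ W["wts_iu"] + b["bias_iu"] + h_prev @ W["wts_hu"] + b["bias_hu"])
    n = np.tanh(x @ W["wts_in"] + b["bias_in"] + r * (h_prev @ W["wts_hn"] + b["bias_hn"]))
    h = (1.0 - z) * n + z * h_prev  # Hadamard products
    return r, z, n, h


def run_sequence(xs, h0, W, b):
    """Process a sequence one stimulus at a time, carrying the hidden state."""
    h, hs = h0, []
    for x in xs:
        h = gru_step(x, h, W, b)[-1]
        hs.append(h)
    return np.stack(hs)


input_size, hidden_size = 2, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(input_size, hidden_size)) for k in ("wts_ir", "wts_iu", "wts_in")}
W.update({k: rng.normal(size=(hidden_size, hidden_size)) for k in ("wts_hr", "wts_hu", "wts_hn")})
b = {k: np.zeros(hidden_size) for k in ("bias_ir", "bias_iu", "bias_in", "bias_hr", "bias_hu", "bias_hn")}
hs = run_sequence(rng.normal(size=(4, input_size)), np.zeros(hidden_size), W, b)
print(hs.shape)  # (4, 3): one hidden state per stimulus
```

run_sequence processes the stimuli one at a time and carries the hidden state forward, as in Python-mode execution; a PyTorch GRU module applies the same recurrence to a whole sequence in one call.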

The full Composition is executed when its run method is called with execution_mode set to ExecutionMode.Python, or if torch_available is False. Otherwise, and always in a call to learn, the GRUComposition is executed using the PyTorch GRU module with values of the individual computations copied back to Nodes of the full GRUComposition at times determined by the value of the synch_node_values_with_torch option.

Learning

Learning is executed using the learn method in the same way as for a standard AutodiffComposition. For learning to occur, the following conditions must be met:

enable_learning must be set to True (the default);

at least one parameter of the GRU module must be assigned a learning_rate other than False (if the GRUComposition’s learning_rate is False, a learning_rate must be specified explicitly for at least one individual parameter);

the execution_mode argument of the learn method must be ExecutionMode.PyTorch (the default).

Note

Because a GRUComposition uses the PyTorch GRU module to implement its computations during learning, its learn method requires ExecutionMode.PyTorch, which is therefore used by default; an error occurs if any other execution_mode is specified in the learn method.
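The three conditions can be read as a single predicate. The sketch below is illustrative only: execution modes are modeled as plain strings rather than PsyNeuLink's ExecutionMode enum, effective_rates stands for the per-parameter rates after resolution (False meaning not learned), and check_learning is a hypothetical name; the error type raised by PsyNeuLink itself may differ.

```python
def check_learning(enable_learning: bool, effective_rates: dict,
                   execution_mode: str = "PyTorch") -> bool:
    """Return True if a call to learn() would update any GRU parameter.

    Raises ValueError for any execution_mode other than PyTorch, as
    described in the Note above.
    """
    if execution_mode != "PyTorch":
        raise ValueError(f"learn() requires ExecutionMode.PyTorch, got {execution_mode}")
    if not enable_learning:
        return False
    # At least one parameter must have a learning_rate other than False.
    return any(rate is not False for rate in effective_rates.values())


rates = {"weight_ih_l0": 0.001, "weight_hh_l0": False}
print(check_learning(True, rates))   # only weight_ih_l0 is learned
print(check_learning(False, rates))  # enable_learning=False disables all learning
```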

The GRUComposition uses the PyTorch GRU module to implement its computations during learning. After learning, the values of the module’s parameters are copied to the weight matrices of the corresponding MappingProjections, and results of computations are copied to the values of the corresponding Nodes in the GRUComposition at times determined by the value of the synch_node_values_with_torch option.

Class Reference

class psyneulink.library.compositions.grucomposition.grucomposition.GRUComposition(name="GRU_Composition", input_size=1, hidden_size=1, bias=False, enable_learning=True, learning_rate=.001, optimizer_params=None)

Subclass of AutodiffComposition that implements a single-layered gated recurrent network.

See Structure and Execution for a description of when the full Composition is constructed and used for execution vs. when the PyTorch GRU module is used.

Note: all exposed methods, attributes and Parameters of the GRUComposition are PsyNeuLink elements; all PyTorch-specific elements belong to pytorch_representation which, for a GRUComposition, is of class PytorchGRUCompositionWrapper.

Constructor takes the following arguments in addition to those of AutodiffComposition:

Parameters:

input_size (int : default 1) – specifies the length of the input array to the GRUComposition, and the size of the input_node.
hidden_size (int : default 1) – specifies the length of the internal state of the GRUComposition, and the size of the hidden_layer_node and all nodes other than the input_node.
bias (bool : default False) – specifies whether the GRUComposition uses bias vectors in its computations.
enable_learning (bool : default True) – specifies whether learning is enabled for the GRUComposition (see Learning Arguments for additional details).
learning_rate (float : default .001) – specifies the learning_rate for the GRUComposition (see Learning Arguments for additional details).
optimizer_params (Dict[str: value]) – specifies parameters for the optimizer used for learning by the GRUComposition (see Learning Arguments for details of specification).

input_size

determines the length of the input array to the GRUComposition and the size of the input_node.

Type:: int

hidden_size

determines the size of the hidden_layer_node and all other INTERNAL Nodes of the GRUComposition.

Type:: int

bias

determines whether the GRUComposition uses bias vectors in its computations.

Type:: bool

enable_learning

determines whether learning is enabled for the GRUComposition (see Learning Arguments for additional details).

Type:: bool

learning_rate

determines the default learning_rate for any parameters of the PyTorch GRU module that are not assigned an individual learning_rate, either in the optimizer_params argument of the AutodiffComposition’s constructor or in a call to its learn method (see Learning Arguments for additional details).

Type:: float, int, bool or None

w_ih_learning_rate

determines the learning rate specifically for the weights of the efferent projections from the input_node of the GRUComposition: wts_in, wts_iu, and wts_ir; corresponds to the weight_ih_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).

Type:: float or bool

w_hh_learning_rate

determines the learning rate specifically for the weights of the efferent projections from the hidden_layer_node of the GRUComposition: wts_hn, wts_hu, and wts_hr; corresponds to the weight_hh_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).

Type:: float or bool

b_ih_learning_rate

determines the learning rate specifically for the biases influencing the efferent projections from the input_node of the GRUComposition: bias_ir, bias_iu, bias_in; corresponds to the bias_ih_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).

Type:: float or bool

b_hh_learning_rate

determines the learning rate specifically for the biases influencing the efferent projections from the hidden_layer_node of the GRUComposition: bias_hr, bias_hu, bias_hn; corresponds to the bias_hh_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).

Type:: float or bool

input_node

INPUT Node that receives the input to the GRUComposition and passes it to the hidden_layer_node; corresponds to input (i) of the PyTorch GRU module.

Type:: ProcessingMechanism

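new_node

ProcessingMechanism that provides the hidden_layer_node with its new (external) input, computed from the input it receives from the input_node and the input it receives from the hidden_layer_node, the latter gated by the reset_node; corresponds to new gate (n) of the PyTorch GRU module.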

Type:: ProcessingMechanism

hidden_layer_node

ProcessingMechanism that implements the recurrent layer of the GRUComposition; corresponds to hidden layer (h) of the PyTorch GRU module.

Type:: ProcessingMechanism

reset_node

Gating Mechanism that gates the input to the new_node; corresponds to reset gate (r) of the PyTorch GRU module.

Type:: GatingMechanism

update_node

Gating Mechanism that gates the inputs to the hidden layer from the new_node and the prior state of the hidden_layer_node itself (i.e., the input it receives from its recurrent Projection); corresponds to update gate (z) of the PyTorch GRU module.

Type:: GatingMechanism

output_node

OUTPUT Node that receives the output of the hidden_layer_node; corresponds to result of the PyTorch GRU module.

Type:: ProcessingMechanism

learnable_projections

list of the MappingProjections in the GRUComposition that have matrix parameters that can be learned; these correspond to the learnable parameters of the PyTorch GRU module.

Type:: List[MappingProjection]

wts_in

MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the new_node; corresponds to \(W_{in}\) term in the PyTorch GRU module’s computation (see Structure for additional information).

Type:: MappingProjection

wts_iu

MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the update_node; corresponds to \(W_{iz}\) term in the PyTorch GRU module’s computation (see Structure for additional information).

Type:: MappingProjection

wts_ir

MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the reset_node; corresponds to \(W_{ir}\) term in the PyTorch GRU module’s computation (see Structure for additional information).

Type:: MappingProjection

wts_nh

MappingProjection with learnable matrix (“connection weights”) that projects from the new_node to the hidden_layer_node (see Structure for additional information).

Type:: MappingProjection

wts_hr

MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the reset_node; corresponds to \(W_{hr}\) term in the PyTorch GRU module’s computation (see Structure for additional information).

Type:: MappingProjection

wts_hu

MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the update_node; corresponds to \(W_{hz}\) in the PyTorch GRU module’s computation (see Structure for additional information).

Type:: MappingProjection

wts_hn

MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the new_node; corresponds to \(W_{hn}\) in the PyTorch GRU module’s computation (see Structure for additional information).

Type:: MappingProjection

wts_hh

MappingProjection with fixed matrix (“connection weights”) that projects from the hidden_layer_node to itself, i.e., the recurrent Projection (see Structure for additional information).

Type:: MappingProjection

wts_ho

MappingProjection with fixed matrix (“connection weights”) that projects from the hidden_layer_node to the output_node (see Structure for additional information).

Type:: MappingProjection

reset_gate

GatingProjection that gates the input to the new_node from the hidden_layer_node; its value is used in the Hadamard product with that input, which is then combined with the input from the input_node to produce the new (external) input to the hidden_layer_node (see Structure for additional information).

Type:: GatingProjection

new_gate

GatingProjection that gates the input to the hidden_layer_node from the new_node; its value is used in the Hadamard product with the (external) input to the hidden_layer_node from the new_node, which determines how much of the hidden_layer_node's new state is determined by the external input vs. its prior state (see Structure for additional information).

Type:: GatingProjection

recurrent_gate

GatingProjection that gates the input to the hidden_layer_node from its recurrent projection (wts_hh); its value is used in the Hadamard product with the recurrent input to the hidden_layer_node, which determines how much of the hidden_layer_node's new state is determined by its prior state vs. its external input (see Structure for additional information).

Type:: GatingProjection

bias_ir_node

BIAS Node, the Projection from which (bias_ir) provides the bias to weights (wts_ir) from the input_node to the reset_node (see Structure for additional information).

Type:: ProcessingMechanism

bias_iu_node

BIAS Node, the Projection from which (bias_iu) provides the bias to weights (wts_iu) from the input_node to the update_node (see Structure for additional information).

Type:: ProcessingMechanism

bias_in_node

BIAS Node, the Projection from which (bias_in) provides the bias to weights (wts_in) from the input_node to the new_node (see Structure for additional information).

Type:: ProcessingMechanism

bias_hr_node

BIAS Node, the Projection from which (bias_hr) provides the bias to weights (wts_hr) from the hidden_layer_node to the reset_node (see Structure for additional information).

Type:: ProcessingMechanism

bias_hu_node

BIAS Node, the Projection from which (bias_hu) provides the bias to weights (wts_hu) from the hidden_layer_node to the update_node (see Structure for additional information).

Type:: ProcessingMechanism

bias_hn_node

BIAS Node, the Projection from which (bias_hn) provides the bias to weights (wts_hn) from the hidden_layer_node to the new_node (see Structure for additional information).

Type:: ProcessingMechanism

biases

list of the MappingProjections from the BIAS Nodes of the GRUComposition, all of which have matrix parameters if bias is True; these correspond to the learnable biases of the PyTorch GRU module (see Structure for additional information).

Type:: List[MappingProjection]

bias_ir

MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_ir, from the input_node to the reset_node; corresponds to the \(b_{ir}\) bias parameter of the PyTorch GRU module (see Structure for additional information).

Type:: MappingProjection

bias_iu

MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_iu, from the input_node to the update_node; corresponds to the \(b_{iz}\) bias parameter of the PyTorch GRU module (see Structure for additional information).

Type:: MappingProjection

bias_in

MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_in, from the input_node to the new_node; corresponds to the \(b_{in}\) bias parameter of the PyTorch GRU module (see Structure for additional information).

Type:: MappingProjection

bias_hr

MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hr, from the hidden_layer_node to the reset_node; corresponds to the \(b_{hr}\) bias parameter of the PyTorch GRU module (see Structure for additional information).

Type:: MappingProjection

bias_hu

MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hu, from the hidden_layer_node to the update_node; corresponds to the \(b_{hz}\) bias parameter of the PyTorch GRU module (see Structure for additional information).

Type:: MappingProjection

bias_hn

MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hn, from the hidden_layer_node to the new_node; corresponds to the \(b_{hn}\) bias parameter of the PyTorch GRU module (see Structure for additional information).

Type:: MappingProjection

class PytorchGRUCompositionWrapper(composition, device, outer_creator=None, dtype=None, subclass_components=None, context=None, base_context=<psyneulink.core.globals.context.Context object>)

Wrapper for GRUComposition as a PyTorch Module. Manage the exchange of the Composition’s Projection matrices and the PyTorch GRU Module’s parameters, and return its output value.

_flatten_for_pytorch(pnl_proj, sndr_mech, rcvr_mech, nested_port, nested_mech, outer_comp, outer_comp_pytorch_rep, access, context, base_context=<psyneulink.core.globals.context.Context object>)

Return PytorchProjectionWrappers for Projections to/from GRUComposition to nested Composition. Replace GRUComposition’s nodes with gru_mech and projections to and from it.

Return type:: Tuple

_instantiate_GRU_pytorch_mechanism_wrappers(gru_comp, device, context): Instantiate PytorchMechanismWrapper for GRU Node

_instantiate_GRU_pytorch_projection_wrappers(torch_gru, device, context): Create PytorchGRUProjectionWrappers for each learnable Projection of GRUComposition. For each PytorchGRUProjectionWrapper, assign the current weight matrix of the PNL Projection to the corresponding part of the tensor in the parameter of the PyTorch GRU module.

_torch_params_to_projections(param_groups)

Return dict of {torch parameter: Projection} for all wrapped Projections

Return type:: dict

_validate_optimizer_param_specs(optimizer_param_specs, source, context, nested=False): Override to filter and raise error for individual Projections (i.e., specifications of slices)

forward(inputs, optimization_num, synch_with_pnl_options, full_sequence_mode, context=None)

Forward method of the model for PyTorch modes.

This is called only when the GRUComposition is run as a standalone Composition. Otherwise, the node.execute() method is called directly (i.e., it is treated as a single node). Returns a dictionary {output_node: value} with the output value of the torch GRU module, which is used by the collect_afferents method(s) of the other node(s) that receive Projections from the GRUComposition.

Return type:: dict

get_parameters_from_torch_gru()

Return type:: Tuple[Tensor]

Get parameters from PyTorch GRU module corresponding to GRUComposition’s Projections. Format tensors:

transpose all weight and bias tensors;

reformat biases as 2d

Return formatted tensors, which are used:

in set_weights_from_torch_gru(), where they are converted to numpy arrays
for forward computation in pytorchGRUwrappers._copy_pytorch_node_outputs_to_pnl_values()
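The two formatting steps can be illustrated in NumPy for one weight and one bias tensor. This is a sketch that assumes the layout documented for torch.nn.GRU, in which weight_ih_l0 stacks the reset, update and new blocks (W_ir|W_iz|W_in) along its first dimension (and bias_ih_l0 likewise), and it assumes that a 2d bias has shape (1, hidden_size); split_gru_weight and split_gru_bias are hypothetical helpers, and the actual method operates on torch tensors.

```python
import numpy as np


def split_gru_weight(weight: np.ndarray, hidden_size: int) -> dict:
    """Split a stacked (3*hidden_size, n) GRU weight into three transposed
    (n, hidden_size) matrices, one per gate, in (sender, receiver) orientation."""
    r, z, n = (weight[i * hidden_size:(i + 1) * hidden_size] for i in range(3))
    return {"reset": r.T, "update": z.T, "new": n.T}


def split_gru_bias(bias: np.ndarray, hidden_size: int) -> dict:
    """Split a stacked (3*hidden_size,) bias and reformat each part as 2d,
    with shape (1, hidden_size) (assumed orientation)."""
    return {gate: bias[i * hidden_size:(i + 1) * hidden_size].reshape(1, hidden_size)
            for i, gate in enumerate(("reset", "update", "new"))}


input_size, hidden_size = 2, 3
weight_ih = np.arange(3 * hidden_size * input_size, dtype=float).reshape(3 * hidden_size, input_size)
bias_ih = np.arange(3 * hidden_size, dtype=float)
w = split_gru_weight(weight_ih, hidden_size)  # -> wts_ir, wts_iu, wts_in
b = split_gru_bias(bias_ih, hidden_size)      # -> bias_ir, bias_iu, bias_in
print(w["update"].shape, b["new"].shape)      # (2, 3) (1, 3)
```

Transposing each block puts the input dimension first, matching the (sender, receiver) orientation assumed for the Projection matrices; stacking the transposed blocks back along the first axis recovers the original torch layout.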

class PytorchGRUMechanismWrapper(mechanism, composition, component_idx, use, dtype, device, context)

Wrapper for PyTorch GRU Node. Handling of hidden_state: uses GRUComposition’s HIDDEN_NODE.value to cache the state of the hidden layer:

gets input to function for hidden state from GRUComposition’s HIDDEN_NODE.value;

sets GRUComposition’s HIDDEN_NODE.value to return value for hidden state.

_calculate_torch_gru_internal_state_values(input, hidden_state)

Manually calculate and store internal state values for the torch GRU prior to the backward pass. These are needed for assigning to the corresponding nodes in the GRUComposition. Returns r_t, z_t, n_t and h_t: the current reset, update, new and hidden state values, respectively.

Return type:: dict

collect_afferents(batch_size, port=None, inputs=None)

Return afferent projections for input_port(s) of the Mechanism. If there is only one input_port, return the sum of its afferents (for those in Composition). If there are multiple input_ports, return a tensor (or list of tensors if input ports are ragged) of shape:

(batch, input_port, projection, …)

where the ellipsis represents 1 or more dimensions for the values of the projected afferent.
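The two cases can be illustrated with NumPy arrays standing in for torch tensors. This is a sketch of the shapes only, under the assumptions that each afferent value has shape (batch, size) and that all input_ports have the same size and number of afferents (the ragged case, which returns a list, is not shown); collect is a hypothetical name.

```python
import numpy as np


def collect(afferent_values_per_port):
    """afferent_values_per_port: one list per input_port, each holding the
    (batch, size) values of that port's afferent Projections."""
    if len(afferent_values_per_port) == 1:
        # Single input_port: sum over its afferents -> (batch, size)
        return np.sum(afferent_values_per_port[0], axis=0)
    # Multiple input_ports: (port, projection, batch, size)
    stacked = np.array(afferent_values_per_port)
    # Reorder to (batch, input_port, projection, size)
    return stacked.transpose(2, 0, 1, 3)


batch, size = 4, 3
single = collect([[np.ones((batch, size)), 2 * np.ones((batch, size))]])
multi = collect([[np.ones((batch, size))] * 2, [np.zeros((batch, size))] * 2])
print(single.shape, multi.shape)  # (4, 3) (4, 2, 2, 3)
```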

Return type:: Tensor

set_pnl_variable_and_values(set_variable=False, set_value=True, context=None): Set the state of the PytorchMechanismWrapper’s Mechanism. Note: if execute_mech=True, requires that variable=True.

pytorch_composition_wrapper_type: alias of PytorchGRUCompositionWrapper

pytorch_mechanism_wrapper_type: alias of PytorchGRUMechanismWrapper

_construct_pnl_composition(input_size, hidden_size, context): Construct Nodes and Projections for GRUComposition

set_weights(weights, biases, context=None): Set weights for Projections to input_node and hidden_layer_node.

infer_backpropagation_learning_pathways(execution_mode, context=None)

Return type:: list

Create backpropagation learning pathways for every Input Node –> Output Node pathway. Flattens nested compositions:

only includes the Projections in outer Composition to/from the CIMs of the nested Composition (i.e., to input_CIMs and from output_CIMs) – the ones that should be learned;

excludes Projections from/to CIMs in the nested Composition (from input_CIMs and to output_CIMs), as those should remain identity Projections;

see PytorchCompositionWrapper for table of how Projections are handled and further details.

Returns list of target nodes for each pathway

_get_pytorch_backprop_pathway(input_node, context)

Breadth-first search from input_node to find all input -> output pathways. Uses queue(node, composition) to traverse all nodes in the graph. IMPLEMENTATION NOTE: flattens nested Compositions, removing any CIMs in the nested Compositions. Return a list of all pathways from input_node -> output node.

Return type:: list
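The breadth-first search can be sketched on a plain adjacency dict. This illustrates the technique only: it does not model nested Compositions or their CIMs, gating is simplified to ordinary edges, and the graph and function names are hypothetical. Because the GRUComposition contains a recurrent Projection, the search skips any node already on the current path so that cycles cannot loop forever.

```python
from collections import deque


def backprop_pathways(graph: dict, input_node: str, output_nodes: set) -> list:
    """Breadth-first search for all acyclic paths from input_node to any
    node in output_nodes; graph maps each node to a list of receivers."""
    pathways = []
    queue = deque([[input_node]])  # each entry is a partial path
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in output_nodes:
            pathways.append(path)
            continue
        for receiver in graph.get(node, []):
            if receiver not in path:  # skip cycles (e.g. recurrent Projections)
                queue.append(path + [receiver])
    return pathways


graph = {
    "input_node": ["reset_node", "update_node", "new_node"],
    "reset_node": ["new_node"],            # gating, simplified as an edge
    "update_node": ["hidden_layer_node"],  # gating, simplified as an edge
    "new_node": ["hidden_layer_node"],
    "hidden_layer_node": ["hidden_layer_node", "reset_node", "update_node",
                          "new_node", "output_node"],
}
for p in backprop_pathways(graph, "input_node", {"output_node"}):
    print(" -> ".join(p))
```

Because the search is breadth-first, shorter pathways are found before longer ones; paths that re-enter the hidden_layer_node's afferents end without reaching the output and are discarded.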

_get_execution_mode(execution_mode): Parse execution_mode argument and return a valid execution mode for the learn() method

_add_dependency(sender, projection, receiver, dependency_dict, queue, comp): Override to implement direct pathway through gru_mech for pytorch backprop pathway. Add direct_proj_in and direct_proj_out to self._pytorch_projections. Other projections (including ‘INPUT TO UPDATE WEIGHTS’) are added in super()._get_pytorch_backprop_pathway()

_identify_target_nodes(context): Recursively call all nested AutodiffCompositions to assign TARGET nodes for learning

add_node(node, required_roles=None, context=None): Override if called from command line to disallow modification of GRUComposition

add_projection(*args, **kwargs): Override if called from command line to disallow modification of GRUComposition

exception psyneulink.library.compositions.grucomposition.grucomposition.GRUCompositionError(message, component=None)