GRUComposition¶
Overview¶
The GRUComposition is a subclass of AutodiffComposition that implements a single-layered gated recurrent network,
which uses a set of GatingMechanisms to implement gates that modulate the flow of information
through its hidden_layer_node. It implements the same computations as
a PyTorch GRU module, which is used to implement
it when its learn method is called. When it is executed in Python mode, it functions in the same way as a GRUCell module, processing its input one stimulus
at a time. However, when used for learning, it is executed as a PyTorch GRU module, so that it can be used to process an entire
sequence of stimuli at once, and learn to predict the next stimulus in the sequence.
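As a rough illustration of what "predicting the next stimulus" means for training data, the sketch below pairs each stimulus in a sequence with the one that follows it, using NumPy. The helper name `make_next_step_pairs` is hypothetical and not part of PsyNeuLink; how such arrays are passed to the learn method is not shown here.

```python
import numpy as np

def make_next_step_pairs(sequence):
    """Return (inputs, targets), where targets[t] is the stimulus that follows inputs[t]."""
    sequence = np.asarray(sequence, dtype=float)
    if sequence.shape[0] < 2:
        raise ValueError("need at least two stimuli to form an input/target pair")
    # Drop the last stimulus from the inputs and the first from the targets
    return sequence[:-1], sequence[1:]

# Five stimuli, each of length 3 (so input_size would be 3 in this sketch)
stimuli = np.eye(5, 3)
inputs, targets = make_next_step_pairs(stimuli)
```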
Creation¶
A GRUComposition is created by calling its constructor. When its learn
method is called, it automatically creates a PytorchGRUCompositionWrapper that implements the GRUComposition
using the PyTorch GRU module, which is trained
using PyTorch. Its constructor takes the following arguments, which are either in addition to or handled differently
than those of AutodiffComposition:
Configuration
input_size (int) specifies the length of the input array to the GRUComposition, and the size
of the input_node, which can be different than hidden_size.
hidden_size (int) specifies the length of the internal (“hidden”) state of the GRUComposition,
and the size of the hidden_layer_node and all nodes other
than the input_node, which can be different than input_size.
bias (bool) specifies whether the GRUComposition includes BIAS Nodes
and, correspondingly, the GRU module uses
bias vectors in its computations.
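To make the roles of input_size, hidden_size and bias concrete, the following sketch lists the parameter shapes that a single-layer `torch.nn.GRU` uses, in which the reset, update and new gates are stacked along the first axis. The function `torch_gru_param_shapes` is a hypothetical helper written for this illustration; it does not call PyTorch or PsyNeuLink.

```python
def torch_gru_param_shapes(input_size, hidden_size, bias=True):
    """Shapes of the parameters of a single-layer torch.nn.GRU (gates stacked as r | z | n)."""
    shapes = {
        "weight_ih_l0": (3 * hidden_size, input_size),
        "weight_hh_l0": (3 * hidden_size, hidden_size),
    }
    if bias:
        # Bias vectors exist only when bias is True
        shapes["bias_ih_l0"] = (3 * hidden_size,)
        shapes["bias_hh_l0"] = (3 * hidden_size,)
    return shapes

shapes = torch_gru_param_shapes(input_size=3, hidden_size=5)
```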
Learning Arguments
enable_learning (bool) specifies whether learning is enabled for the GRUComposition; if it is False,
no learning will occur, even when its learn method is called and learning_rates
are specified.
learning_rate (float, bool, dict or None): specifies the learning_rate for the parameters of the PyTorch
GRU module that are not specified in the
learning_rate argument of a call to the learn method of the GRUComposition or the
AutodiffComposition within which the GRUComposition is nested (see Learning Rates for details
of specification). It can be assigned any of the following values (see Composition_Learning_Rate_Specification
for additional details of specification):
int or float: the value is used as the default learning_rate for the GRUComposition, which is assigned to any parameters that have not been otherwise specified in the learning_rate argument of a call to the learn method of the GRUComposition or the AutodiffComposition within which it is nested.
True or None: the GRUComposition’s default learning_rate (.001) is used as the learning_rate for any parameters that have not been otherwise specified in the learning_rate argument of a call to the learn method of the GRUComposition or the AutodiffComposition within which it is nested.
False: learning occurs only for parameters for which an explicit learning_rate has been specified in the learning_rate argument of a call to the learn method of the GRUComposition or the AutodiffComposition within which it is nested.
dict: {Projection or Projection name: learning_rate}; used to specify parameter-specific learning rates, which supersede the value of the GRUComposition’s learning_rate. Keys of the dict must be one of the keys below that reference parameters of the GRU module; values specify their learning_rates (see Learning Rates for additional information):
INPUT_TO_HIDDEN: learning rate for the weight_ih_l0 parameter of the PyTorch GRU module that corresponds to the weights of the efferent projections from the input_node of the GRUComposition: wts_in, wts_iu, and wts_ir; its value is stored in the w_ih_learning_rate attribute of the GRUComposition;
HIDDEN_TO_HIDDEN: learning rate for the weight_hh_l0 parameter of the PyTorch GRU module that corresponds to the weights of the efferent projections from the hidden_layer_node of the GRUComposition: wts_hn, wts_hu, wts_hr; its value is stored in the w_hh_learning_rate attribute of the GRUComposition;
BIAS_INPUT_TO_HIDDEN: learning rate for the bias_ih_l0 parameter of the PyTorch GRU module that corresponds to the biases of the efferent projections from the input_node of the GRUComposition: bias_ir, bias_iu, bias_in; its value is stored in the b_ih_learning_rate attribute of the GRUComposition;
BIAS_HIDDEN_TO_HIDDEN: learning rate for the bias_hh_l0 parameter of the PyTorch GRU module that corresponds to the biases of the efferent projections from the hidden_layer_node of the GRUComposition: bias_hr, bias_hu, bias_hn; its value is stored in the b_hh_learning_rate attribute of the GRUComposition.
DEFAULT_LEARNING_RATE: specifies the default learning rate for the GRUComposition, which is used as the learning_rate for any parameters for which there are no other entries in the dict.
Warning
Only the keywords above can be used to specify the learning_rate for parameters in a learning_rate dict. The learning_rates for individual Projections in the GRUComposition cannot be specified, as they do not have corresponding torch.nn.Parameters in the named_parameters() list of the PyTorch GRU module; specifying them will raise an error.
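The rules above for the learning_rate argument can be sketched as a small resolver that maps a specification onto the four parameters of the PyTorch GRU module. This is an illustration only, not PsyNeuLink's implementation: the function `resolve_learning_rates` is hypothetical, the string values given to the keywords are stand-ins (the actual keyword values may differ), and falling back to .001 when a dict has no DEFAULT_LEARNING_RATE entry is an assumption.

```python
# Hypothetical stand-ins for the PsyNeuLink keywords; the actual string values may differ.
INPUT_TO_HIDDEN = "INPUT_TO_HIDDEN"
HIDDEN_TO_HIDDEN = "HIDDEN_TO_HIDDEN"
BIAS_INPUT_TO_HIDDEN = "BIAS_INPUT_TO_HIDDEN"
BIAS_HIDDEN_TO_HIDDEN = "BIAS_HIDDEN_TO_HIDDEN"
DEFAULT_LEARNING_RATE = "DEFAULT_LEARNING_RATE"

# Keyword -> name of the corresponding parameter of the PyTorch GRU module
KEY_TO_TORCH_PARAM = {
    INPUT_TO_HIDDEN: "weight_ih_l0",
    HIDDEN_TO_HIDDEN: "weight_hh_l0",
    BIAS_INPUT_TO_HIDDEN: "bias_ih_l0",
    BIAS_HIDDEN_TO_HIDDEN: "bias_hh_l0",
}

def resolve_learning_rates(spec, composition_default=0.001):
    """Map a learning_rate specification to {torch parameter name: rate or False}."""
    params = list(KEY_TO_TORCH_PARAM.values())
    # Test for bool/None before int/float, since bool is a subclass of int
    if spec is None or spec is True:
        return {p: composition_default for p in params}
    if spec is False:
        return {p: False for p in params}
    if isinstance(spec, (int, float)):
        return {p: float(spec) for p in params}
    if isinstance(spec, dict):
        unknown = set(spec) - set(KEY_TO_TORCH_PARAM) - {DEFAULT_LEARNING_RATE}
        if unknown:
            # Mirrors the warning above: only the listed keywords are accepted
            raise ValueError(f"unsupported learning_rate keys: {sorted(map(str, unknown))}")
        default = spec.get(DEFAULT_LEARNING_RATE, composition_default)  # assumed fallback
        return {KEY_TO_TORCH_PARAM[k]: spec.get(k, default) for k in KEY_TO_TORCH_PARAM}
    raise TypeError("learning_rate must be a number, bool, dict or None")

rates = resolve_learning_rates({INPUT_TO_HIDDEN: 0.01, DEFAULT_LEARNING_RATE: 0.1})
```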
Structure¶
The GRUComposition assigns a node to each of the computations of the PyTorch GRU module, and a Projection to each of its weight and bias parameters, as shown in the figure below:
Structure of a GRUComposition – can be seen in more detail using the Composition’s show_graph method with its show_node_structure argument set to True or ALL;
can also be seen with biases added by setting the show_bias argument to True in the constructor.¶
The input_node receives the input to the GRUComposition, and passes it to the
hidden_layer_node, which implements the recurrence and integration function of
a GRU. The reset_node gates the input to the new_node. The
update_node gates the input to the hidden_layer_node
from the new_node (current input) and the prior state of the hidden_layer_node (i.e., the input it receives from its recurrent Projection). The output_node receives the current state of the hidden_layer_node, which is provided as the output of the GRUComposition. The reset_node and update_node are GatingMechanisms,
while the other nodes are all Processing Mechanisms.
Note
The GRUComposition is limited to a single layer GRU at present, thus its num_layers argument is not
implemented. Similarly, dropout and bidirectional arguments are not yet implemented. These will
be added in a future version.
Execution¶
Processing¶
The GRUComposition implements the following computations by its reset, update, new, and hidden_layer
Nodes when it is executed:
reset(t) = Logistic[(wts_ir * input) + bias_ir + (wts_hr * hidden_layer(t-1)) + bias_hr]
update(t) = Logistic[(wts_iu * input) + bias_iu + (wts_hu * hidden_layer(t-1)) + bias_hu]
new(t) = \(tanh\)[(wts_in * input) + bias_in + (reset(t) * ((wts_hn * hidden_layer(t-1)) + bias_hn))]
hidden_layer(t) = [(1 - update(t)) * new(t)] + [update(t) * hidden_layer(t-1)]
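This corresponds to the computations of the PyTorch GRU module:
\[r_t = \sigma(W_{ir} \cdot x_t + b_{ir} + W_{hr} \cdot h_{t-1} + b_{hr})\]
\[z_t = \sigma(W_{iz} \cdot x_t + b_{iz} + W_{hz} \cdot h_{t-1} + b_{hz})\]
\[n_t = \tanh(W_{in} \cdot x_t + b_{in} + r_t \odot (W_{hn} \cdot h_{t-1} + b_{hn}))\]
\[h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}\]
where \(\cdot\) is the dot product, \(\odot\) is the Hadamard product, \(\sigma\) is the logistic function, and all values are for the current execution of the Composition (t) except for hidden, which uses the value from the prior execution (t-1) (see Cycles for handling of recurrence and cycles).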
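The reset, update, new and hidden_layer computations listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the code PsyNeuLink executes: the function `gru_step` and the dictionaries `w` and `b` are hypothetical, keyed by the suffixes of the Projection names (ir, iu, in, hr, hu, hn), and each weight matrix is assumed to have shape (sender size, hidden_size) so that a row-vector input can be multiplied from the left.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, w, b):
    """One GRU update; w and b are dicts keyed by ir, iu, in, hr, hu, hn."""
    reset = logistic(x @ w["ir"] + b["ir"] + h_prev @ w["hr"] + b["hr"])
    update = logistic(x @ w["iu"] + b["iu"] + h_prev @ w["hu"] + b["hu"])
    # The reset gate scales only the contribution of the prior hidden state
    new = np.tanh(x @ w["in"] + b["in"] + reset * (h_prev @ w["hn"] + b["hn"]))
    # The update gate mixes the new candidate with the prior hidden state
    return (1.0 - update) * new + update * h_prev

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 2
keys = ("ir", "iu", "in", "hr", "hu", "hn")
w = {k: rng.normal(size=(input_size if k[0] == "i" else hidden_size, hidden_size))
     for k in keys}
b = {k: np.zeros(hidden_size) for k in keys}

# Process a sequence one stimulus at a time, carrying the hidden state forward
h = np.zeros(hidden_size)
for x in rng.normal(size=(4, input_size)):
    h = gru_step(x, h, w, b)
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the new value to 0, so the hidden state is simply halved on each step; this makes a convenient sanity check.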
The full Composition is executed when its run method is
called with execution_mode set to ExecutionMode.Python, or if torch_available is False. Otherwise, and
always in a call to learn, the GRUComposition is executed using the PyTorch GRU module with values of the individual
computations copied back to Nodes of the full GRUComposition at times determined by the value of the
synch_node_values_with_torch option.
Learning¶
Learning is executed using the learn method in the same way as a standard AutodiffComposition. For learning to
occur the following conditions must obtain:
enable_learning must be set to True (the default);
the GRUComposition’s learning_rate must not be False and/or the learning_rate of individual parameters must not all be False;
the execution_mode argument of the learn method must be ExecutionMode.PyTorch (the default).
Note
Because a GRUComposition uses the PyTorch GRU module to implement its computations during learning, its
learn method requires execution in ExecutionMode.PyTorch; this is therefore used by default, and an error occurs if any other execution_mode is specified in the learn method.
The GRUComposition uses the PyTorch GRU module
to implement its computations during learning. After learning, the values of the module’s parameters are copied
to the weight matrices of the corresponding MappingProjections,
and results of computations are copied to the values of the corresponding Nodes in the GRUComposition at times determined by the value of the synch_node_values_with_torch option.
Class Reference¶
- class psyneulink.library.compositions.grucomposition.grucomposition.GRUComposition(name="GRU_Composition", input_size=1, hidden_size=1, bias=False, enable_learning=True, learning_rate=.001, optimizer_params=None)¶
Subclass of AutodiffComposition that implements a single-layered gated recurrent network.
See Structure, and the technical note under Execution, for a description of when the full Composition is constructed and used for execution vs. when the PyTorch GRU module is used.
- Note: all exposed methods, attributes and Parameters of the GRUComposition are PsyNeuLink elements; all PyTorch-specific elements belong to pytorch_representation which, for a GRUComposition, is of class PytorchGRUCompositionWrapper.
Constructor takes the following arguments in addition to those of AutodiffComposition:
- Parameters:
input_size (int : default 1) – specifies the length of the input array to the GRUComposition, and the size of the input_node.
hidden_size (int : default 1) – specifies the length of the internal state of the GRUComposition, and the size of the hidden_layer_node and all nodes other than the input_node.
bias (bool : default False) – specifies whether the GRUComposition uses bias vectors in its computations.
enable_learning (bool : default True) – specifies whether learning is enabled for the GRUComposition (see Learning Arguments for additional details).
learning_rate (float : default .001) – specifies the learning_rate for the GRUComposition (see Learning Arguments for additional details).
optimizer_params (Dict[str: value]) – specifies parameters for the optimizer used for learning by the GRUComposition (see Learning Arguments for details of specification).
- input_size¶
determines the length of the input array to the GRUComposition and size of the input_node.
- Type:
int
- hidden_size¶
determines the size of the hidden_layer_node and all other INTERNAL Nodes of the GRUComposition.
- Type:
int
- bias¶
determines whether the GRUComposition uses bias vectors in its computations.
- Type:
bool
- enable_learning¶
determines whether learning is enabled for the GRUComposition (see Learning Arguments for additional details).
- Type:
bool
- learning_rate¶
determines the default learning_rate for the parameters of the PyTorch GRU module that are not specified for individual parameters in the optimizer_params argument of the AutodiffComposition’s constructor in the call to its learn method (see Learning Arguments for additional details).
- Type:
float, int, bool, or None
- w_ih_learning_rate¶
determines the learning rate specifically for the weights of the efferent projections from the input_node of the GRUComposition: wts_in, wts_iu, and wts_ir; corresponds to the weight_ih_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type:
float or bool
- w_hh_learning_rate¶
determines the learning rate specifically for the weights of the efferent projections from the hidden_layer_node of the GRUComposition: wts_hn, wts_hu, wts_hr; corresponds to the weight_hh_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type:
float or bool
- b_ih_learning_rate¶
determines the learning rate specifically for the biases influencing the efferent projections from the input_node of the GRUComposition: bias_ir, bias_iu, bias_in; corresponds to the bias_ih_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type:
float or bool
- b_hh_learning_rate¶
determines the learning rate specifically for the biases influencing the efferent projections from the hidden_layer_node of the GRUComposition: bias_hr, bias_hu, bias_hn; corresponds to the bias_hh_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type:
float or bool
- input_node¶
INPUT Node that receives the input to the GRUComposition and passes it to the hidden_layer_node; corresponds to input (i) of the PyTorch GRU module.
- Type:
- new_node¶
ProcessingMechanism that provides the hidden_layer_node with the input from the input_node, gated by the reset_node; corresponds to new gate (n) of the PyTorch GRU module.
- Type:
- hidden_layer_node¶
ProcessingMechanism that implements the recurrent layer of the GRUComposition; corresponds to hidden layer (h) of the PyTorch GRU module.
- Type:
- reset_node¶
GatingMechanism that gates the input to the new_node; corresponds to reset gate (r) of the PyTorch GRU module.
- Type:
- update_node¶
GatingMechanism that gates the inputs to the hidden layer from the new_node and the prior state of the hidden_layer_node itself (i.e., the input it receives from its recurrent Projection); corresponds to update gate (z) of the PyTorch GRU module.
- Type:
- output_node¶
OUTPUT Node that receives the output of the hidden_layer_node; corresponds to result of the PyTorch GRU module.
- Type:
- learnable_projections¶
list of the MappingProjections in the GRUComposition that have matrix parameters that can be learned; these correspond to the learnable parameters of the PyTorch GRU module.
- Type:
List[MappingProjection]
- wts_in¶
MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the new_node; corresponds to \(W_{in}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type:
- wts_iu¶
MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the update_node; corresponds to \(W_{iz}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type:
- wts_ir¶
MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the reset_node; corresponds to \(W_{ir}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type:
- wts_nh¶
MappingProjection with learnable matrix (“connection weights”) that projects from the new_node to the hidden_layer_node (see Structure for additional information).
- Type:
- wts_hr¶
MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the reset_node; corresponds to \(W_{hr}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type:
- wts_hu¶
MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the update_node; corresponds to \(W_{hz}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type:
- wts_hn¶
MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the new_node; corresponds to \(W_{hn}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type:
- wts_hh¶
MappingProjection with fixed matrix (“connection weights”) that projects from the hidden_layer_node to itself (i.e., the recurrent Projection) (see Structure for additional information).
- Type:
- wts_ho¶
MappingProjection with fixed matrix (“connection weights”) that projects from the hidden_layer_node to the output_node (see Structure for additional information).
- Type:
- reset_gate¶
GatingProjection that gates the input to the new_node from the input_node; its value is used in the Hadamard product with the input to produce the new (external) input to the hidden_layer_node (see Structure for additional information).
- Type:
- new_gate¶
GatingProjection that gates the input to the hidden_layer_node from the new_node; its value is used in the Hadamard product with the (external) input to the hidden_layer_node from the new_node, which determines how much of the hidden_layer_node's new state is determined by the external input vs. its prior state (see Structure for additional information).
- Type:
- recurrent_gate¶
GatingProjection that gates the input to the hidden_layer_node from its recurrent projection (wts_hh); its value is used in the Hadamard product with the recurrent input to the hidden_layer_node, which determines how much of the hidden_layer_node's new state is determined by its prior state vs. its external input (see Structure for additional information).
- Type:
- bias_ir_node¶
BIAS Node, the Projection from which (bias_ir) provides the bias to weights (wts_ir) from the input_node to the reset_node (see Structure for additional information).
- Type:
- bias_iu_node¶
BIAS Node, the Projection from which (bias_iu) provides the bias to weights (wts_iu) from the input_node to the update_node (see Structure for additional information).
- Type:
- bias_in_node¶
BIAS Node, the Projection from which (bias_in) provides the bias to weights (wts_in) from the input_node to the new_node (see Structure for additional information).
- Type:
- bias_hr_node¶
BIAS Node, the Projection from which (bias_hr) provides the bias to weights (wts_hr) from the hidden_layer_node to the reset_node (see Structure for additional information).
- Type:
- bias_hu_node¶
BIAS Node, the Projection from which (bias_hu) provides the bias to weights (wts_hu) from the hidden_layer_node to the update_node (see Structure for additional information).
- Type:
- bias_hn_node¶
BIAS Node, the Projection from which (bias_hn) provides the bias to weights (wts_hn) from the hidden_layer_node to the new_node (see Structure for additional information).
- Type:
- biases¶
list of the MappingProjections from the BIAS Nodes of the GRUComposition, all of which have matrix parameters if bias is True; these correspond to the learnable biases of the PyTorch GRU module (see Structure for additional information).
- Type:
List[MappingProjection]
- bias_ir¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_ir, from the input_node to the reset_node; corresponds to the \(b_{ir}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type:
- bias_iu¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_iu, from the input_node to the update_node; corresponds to the \(b_{iz}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type:
- bias_in¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_in, from the input_node to the new_node; corresponds to the \(b_{in}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type:
- bias_hr¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hr, from the hidden_layer_node to the reset_node; corresponds to the \(b_{hr}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type:
- bias_hu¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hu, from the hidden_layer_node to the update_node; corresponds to the \(b_{hz}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type:
- bias_hn¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hn, from the hidden_layer_node to the new_node; corresponds to the \(b_{hn}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type:
- _node_roles_manager_type¶
alias of
GRUNodeRolesManager
- class PytorchGRUCompositionWrapper(composition, device, outer_creator=None, dtype=None, subclass_components=None, context=None, base_context=<psyneulink.core.globals.context.Context object>)¶
Wrapper for GRUComposition as a PyTorch Module. Manages the exchange of the Composition’s Projection matrices and the PyTorch GRU module’s parameters, and returns its output value.
- _flatten_for_pytorch(pnl_proj, sndr_mech, rcvr_mech, nested_port, nested_mech, outer_comp, outer_comp_pytorch_rep, access, context, base_context=<psyneulink.core.globals.context.Context object>)¶
Return PytorchProjectionWrappers for Projections to/from GRUComposition to nested Composition. Replace GRUComposition’s nodes with gru_mech and projections to and from it.
- Return type:
Tuple
- _get_processing_graph(context)¶
Override to use ‘PYTORCH GRU NODE’ instead of PNL nodes for PytorchShowGraph of standalone GRUComposition
- _get_roles_by_node(node)¶
Override to return NodeRole for ‘PYTORCH GRU NODE’
- Return type:
list
- _instantiate_GRU_pytorch_mechanism_wrappers(gru_comp, outer_creator, device, context)¶
Instantiate PytorchMechanismWrapper for GRU Node
- _instantiate_GRU_pytorch_projection_wrappers(torch_gru, device, context)¶
Create PytorchGRUProjectionWrappers for each learnable Projection of GRUComposition. For each PytorchGRUProjectionWrapper, assign the current weight matrix of the PNL Projection to the corresponding part of the tensor in the parameter of the PyTorch GRU module.
- _torch_params_to_projections(param_groups)¶
Return dict of {torch parameter: Projection} for all wrapped Projections
- Return type:
dict
- forward(inputs, optimization_num, synch_with_pnl_options, retain_in_pnl_options, full_sequence_mode, sequence_lengths, context=None)¶
Forward method of the model for PyTorch modes
This is called only when GRUComposition is run as a standalone Composition. Otherwise, the node.execute() method is called directly (i.e., it is treated as a single node). Returns a dictionary {output_node:value} with the output value for the torch GRU module (that is used by the collect_afferents method(s) of the other node(s) that receive Projections from the GRUComposition).
- Return type:
dict
- get_parameters_from_torch_gru()¶
Get parameters from PyTorch GRU module corresponding to GRUComposition’s Projections. Format tensors:
transpose all weight and bias tensors;
reformat biases as 2d.
Return formatted tensors, which are used:
in set_weights_from_torch_gru(), where they are converted to numpy arrays;
for forward computation in pytorchGRUwrappers._copy_pytorch_node_outputs_to_pnl_values().
- Return type:
Tuple[Tensor]
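The formatting described for get_parameters_from_torch_gru() can be illustrated with NumPy arrays standing in for torch tensors. The sketch relies on the layout of `torch.nn.GRU`, which stacks the reset, update and new gate parameters along the first axis of weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0. The helper `split_stacked_gru_param` is hypothetical and is not the wrapper's actual code.

```python
import numpy as np

def split_stacked_gru_param(stacked, hidden_size):
    """Split a stacked (r | z | n) parameter into per-gate arrays, transposed and 2d."""
    stacked = np.asarray(stacked)
    if stacked.shape[0] != 3 * hidden_size:
        raise ValueError("first axis must have length 3 * hidden_size")
    reset, update, new = np.split(stacked, 3, axis=0)
    # Weights: (hidden_size, n) -> (n, hidden_size); biases: (hidden_size,) -> (1, hidden_size)
    return tuple(np.atleast_2d(part.T) for part in (reset, update, new))

input_size, hidden_size = 3, 2
weight_ih = np.arange(18.0).reshape(3 * hidden_size, input_size)  # stands in for weight_ih_l0
bias_ih = np.arange(6.0)                                          # stands in for bias_ih_l0

wts_ir, wts_iu, wts_in = split_stacked_gru_param(weight_ih, hidden_size)
bias_ir, bias_iu, bias_in = split_stacked_gru_param(bias_ih, hidden_size)
```

After the split, each weight array has shape (input_size, hidden_size), which matches the row-vector convention used in the Processing equations, and each bias is a single-row 2d array.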
- class PytorchGRUMechanismWrapper(mechanism, gru_composition, component_idx, outer_creator=None, use=None, dtype=None, device=None, context=None)¶
Wrapper for PyTorch GRU Node. Handling of hidden_state: uses GRUComposition’s HIDDEN_NODE.value to cache state of hidden layer:
- gets input to function for hidden state from GRUComposition’s HIDDEN_NODE.value
- sets GRUComposition’s HIDDEN_NODE.value to return value for hidden state
- _calculate_torch_gru_internal_state_values(input, hidden_state)¶
Manually calculate and store internal state values for torch GRU prior to backward pass. These are needed for assigning to the corresponding nodes in the GRUComposition. Returns r_t, z_t, n_t, h_t: the current reset, update, new, and hidden state values, respectively.
- Return type:
dict
- collect_afferents(batch_size, port=None, inputs=None)¶
Return afferent projections for input_port(s) of the Mechanism. If there is only one input_port, return the sum of its afferents (for those in Composition). If there are multiple input_ports, return a tensor (or list of tensors if input ports are ragged) of shape:
(batch, input_port, projection, …)
where the ellipsis represents 1 or more dimensions for the values of the projected afferent.
- Return type:
Tensor
- set_pnl_variable_and_values(set_variable=False, set_value=True, context=None)¶
Set the state of the PytorchMechanismWrapper’s Mechanism. Note: if execute_mech=True, requires that variable=True.
- pytorch_composition_wrapper_type¶
alias of
PytorchGRUCompositionWrapper
- pytorch_mechanism_wrapper_type¶
alias of
PytorchGRUMechanismWrapper
- _construct_pnl_composition(input_size, hidden_size, context)¶
Construct Nodes and Projections for GRUComposition
- _is_in_composition(component, nested=True)¶
Return True if component is in Composition, including any nested Compositions if nested is True. Include input_CIM and output_CIM for self and all nested Compositions.
- set_weights(weights, biases, context=None)¶
Set weights for Projections to input_node and hidden_layer_node.
- infer_backpropagation_learning_pathways(execution_mode, context=None, base_context=None)¶
Override to construct only TARGET_MECHANISM and LossMechanism for GRUComposition. Return a list containing TARGET_MECHANISM, which needs to be referenced in the inputs argument of learn().
- Return type:
list
- _get_pytorch_backprop_pathway(input_node, context)¶
Breadth-first search from input_node to find all input -> <any OUTPUT Node> pathways. Uses a queue of (node, afferent Projection, composition) entries to traverse all nodes in the graph. IMPLEMENTATION NOTE: flattens nested Compositions, removing any CIMs in the nested Compositions. Return a list of all pathways from input_node -> any OUTPUT Node.
- Return type:
list
- add_node(node, required_roles=None, context=None)¶
Override if called from command line to disallow modification of GRUComposition
- add_projection(*args, **kwargs)¶
Override if called from command line to disallow modification of GRUComposition
- _add_dependency(afferent_proj, sender, projection, receiver, dependency_dict, queue, comp)¶
Override to implement direct pathway through gru_mech for pytorch backprop pathway. Add direct_proj_in and direct_proj_out to self._pytorch_projections. Other projections (including ‘INPUT TO UPDATE WEIGHTS’) are added in super()._get_pytorch_backprop_pathway().
- _get_execution_mode(execution_mode)¶
Parse execution_mode argument and return a valid execution mode for the learn() method
- _identify_output_nodes(context)¶
Recursively call all nested AutodiffCompositions to assign TARGET_MECHANISMs for learning
- _handle_illegal_sample_target_specs_from_learn(specs)¶
Override to remove
- compute_loss(targets, pytorch_rep, context)¶
Override to directly compute loss. Invoked when GRUComposition is run as a standalone Composition, in which case:
TARGET_MECHANISM is constructed (and included in GRUComposition) to accept targets specified in learn()
LossMechanism is not constructed
loss is computed directly using torch loss function specified by self.loss_spec
- property num_learnable_pathways¶
Override to return just 1, since there is only one pathway through which learning can occur in GRUComposition: the direct pathway through the GRU mechanism (i.e., self.gru_mech), and no explicitly learnable Projections (since that occurs within the PyTorch module itself).
- exception psyneulink.library.compositions.grucomposition.grucomposition.GRUCompositionError(message, component=None)¶