GRUComposition¶
Overview¶
The GRUComposition is a subclass of AutodiffComposition that implements a single-layered gated recurrent network. It uses a set of GatingMechanisms to implement gates that modulate the flow of information through its hidden_layer_node. It implements exactly the same computations as a PyTorch GRU module, which is used to implement it when its learn method is called. When it is executed in Python mode, it functions in the same way as a GRUCell module, processing its input one stimulus at a time. However, when used for learning, it is executed as a PyTorch GRU module. This allows it to process an entire sequence of stimuli at once and learn to predict the next stimulus in the sequence.
Creation¶
A GRUComposition is created by calling its constructor. When its learn method is called, it automatically creates a PytorchGRUCompositionWrapper that implements the GRUComposition using the PyTorch GRU module, which is trained using PyTorch. Its constructor takes the following arguments, which are in addition to, or handled differently than, those of AutodiffComposition:
input_size (int) specifies the length of the input array to the GRUComposition and the size of the input_node, which can be different from hidden_size.
hidden_size (int) specifies the length of the internal (“hidden”) state of the GRUComposition, and the size of the hidden_layer_node and all nodes other than the input_node, which can be different from input_size.
bias (bool) specifies whether the GRUComposition includes BIAS Nodes and, correspondingly, whether the GRU module uses bias vectors in its computations.
enable_learning (bool) specifies whether learning is enabled for the GRUComposition; if it is False, no learning occurs, even when its learn method is called.
learning_rate (bool or float): specifies the default learning_rate for the parameters of the PyTorch GRU module that are not assigned individual learning rates in the optimizer_params argument of the GRUComposition’s constructor. If it is an int or a float, that value is used as the default learning rate for the GRUComposition. If it is None or True, the GRUComposition’s default learning_rate (.001) is used. If it is False, learning occurs only for parameters that have an explicit learning_rate in the optimizer_params argument of the GRUComposition’s constructor.
optimizer_params (dict): used to specify parameter-specific learning rates, which supersede the value of the GRUComposition’s learning_rate. Keys of the dict must reference parameters of the GRU module, and values must be their learning_rates, as described below.
Keys for specifying individual parameters in the optimizer_params dict:
- `w_ih`: learning rate for the weight_ih_l0 parameter of the PyTorch GRU module. This corresponds to the weights of the efferent projections from the input_node of the GRUComposition: wts_in, wts_iu, and wts_ir. Its value is stored in the w_ih_learning_rate attribute of the GRUComposition;
- `w_hh`: learning rate for the weight_hh_l0 parameter of the PyTorch GRU module. This corresponds to the weights of the efferent projections from the hidden_layer_node of the GRUComposition: wts_hn, wts_hu, and wts_hr. Its value is stored in the w_hh_learning_rate attribute of the GRUComposition;
- `b_ih`: learning rate for the bias_ih_l0 parameter of the PyTorch GRU module. This corresponds to the biases of the efferent projections from the input_node of the GRUComposition: bias_ir, bias_iu, and bias_in. Its value is stored in the b_ih_learning_rate attribute of the GRUComposition;
- `b_hh`: learning rate for the bias_hh_l0 parameter of the PyTorch GRU module. This corresponds to the biases of the efferent projections from the hidden_layer_node of the GRUComposition: bias_hr, bias_hu, and bias_hn. Its value is stored in the b_hh_learning_rate attribute of the GRUComposition.
Values for specifying an individual parameter’s learning_rate in the optimizer_params dict:
- int or float: the value is used as the learning_rate;
- True or None: the value of the GRUComposition’s learning_rate is used;
- False: the parameter is not learned.
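The resolution rules above can be summarized in a short sketch. resolve_learning_rates is hypothetical (it is not part of PsyNeuLink) and works on plain Python values. It makes two assumptions. First, a per-parameter True or None inherits the GRUComposition's learning_rate even when that is False, so the parameter is then not learned. Second, unrecognized keys are rejected.

```python
# Hypothetical sketch, not PsyNeuLink code: resolves the effective learning
# rate of each GRU parameter from learning_rate and optimizer_params.
# Assumptions: per-parameter True/None inherits the composition-level value
# (even False), and unknown keys raise ValueError.
from typing import Dict, Optional, Union

DEFAULT_LEARNING_RATE = 0.001  # GRUComposition default, as documented above
PARAM_KEYS = ("w_ih", "w_hh", "b_ih", "b_hh")

Rate = Union[bool, int, float, None]


def _resolve_default(learning_rate: Rate) -> Union[float, bool]:
    """Resolve the composition-level learning_rate argument."""
    # Identity checks keep the integer 1 from being treated as True (1 == True).
    if learning_rate is None or learning_rate is True:
        return DEFAULT_LEARNING_RATE
    if learning_rate is False:
        return False
    return float(learning_rate)


def resolve_learning_rates(learning_rate: Rate = None,
                           optimizer_params: Optional[Dict[str, Rate]] = None
                           ) -> Dict[str, Union[float, bool]]:
    """Return each parameter's learning rate, or False if it is not learned."""
    optimizer_params = optimizer_params or {}
    unknown = set(optimizer_params) - set(PARAM_KEYS)
    if unknown:
        raise ValueError(f"Unrecognized optimizer_params keys: {sorted(unknown)}")
    default = _resolve_default(learning_rate)
    resolved: Dict[str, Union[float, bool]] = {}
    for key in PARAM_KEYS:
        value = optimizer_params.get(key)
        if value is None or value is True:
            resolved[key] = default      # unspecified, None or True: inherit
        elif value is False:
            resolved[key] = False        # explicitly excluded from learning
        else:
            resolved[key] = float(value)  # explicit numeric learning rate
    return resolved
```

For example, resolve_learning_rates(False, {"w_ih": 0.1}) leaves only the input-to-hidden weights learnable, while resolve_learning_rates() gives every parameter the default rate of .001.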
Structure¶
The GRUComposition assigns a node to each of the computations of the PyTorch GRU module and a Projection to each of its weight and bias parameters, as shown in the figure below:
Structure of a GRUComposition. It can be seen in more detail using the Composition’s show_graph method with its show_node_structure argument set to True or ALL. It can also be seen with biases added by setting the show_bias argument to True in the constructor.
The input_node receives the input to the GRUComposition and passes it to the hidden_layer_node, which implements the recurrence and integration function of a GRU. The reset_node gates the input to the new_node. The update_node gates two inputs to the hidden_layer_node: the input from the new_node (current input) and the prior state of the hidden_layer_node (i.e., the input it receives from its recurrent Projection). The output_node receives the current state of the hidden_layer_node, which is provided as the output of the GRUComposition. The reset_node and update_node are GatingMechanisms, while the other nodes are all ProcessingMechanisms.
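The wiring described above can be written down as a small table of Projections. The shape of each Projection's matrix then follows from input_size and hidden_size. This is a plain-Python sketch with hypothetical helper names. It assumes that a Projection's matrix is laid out as (sender size, receiver size) and that each BIAS Node has size 1.

```python
# Hypothetical sketch (no PsyNeuLink import): derive Projection matrix shapes.
# Assumptions: matrices are (sender_size, receiver_size); BIAS Nodes have size 1.
from typing import Dict, Tuple

BIAS_NODE_SIZE = 1  # assumption of this sketch

# (sender, receiver) for each weight Projection named in the Class Reference.
WEIGHT_PROJECTIONS: Dict[str, Tuple[str, str]] = {
    "wts_ir": ("input", "reset"),
    "wts_iu": ("input", "update"),
    "wts_in": ("input", "new"),
    "wts_hr": ("hidden_layer", "reset"),
    "wts_hu": ("hidden_layer", "update"),
    "wts_hn": ("hidden_layer", "new"),
    "wts_nh": ("new", "hidden_layer"),
    "wts_hh": ("hidden_layer", "hidden_layer"),  # recurrent Projection (fixed)
    "wts_ho": ("hidden_layer", "output"),        # to the output_node (fixed)
}

# Receiver of each bias Projection (its sender is a BIAS Node).
BIAS_PROJECTIONS: Dict[str, str] = {
    "bias_ir": "reset", "bias_iu": "update", "bias_in": "new",
    "bias_hr": "reset", "bias_hu": "update", "bias_hn": "new",
}


def node_sizes(input_size: int, hidden_size: int) -> Dict[str, int]:
    """Only the input node has input_size; every other node has hidden_size."""
    sizes = {name: hidden_size
             for name in ("reset", "update", "new", "hidden_layer", "output")}
    sizes["input"] = input_size
    return sizes


def projection_shapes(input_size: int, hidden_size: int,
                      bias: bool = False) -> Dict[str, Tuple[int, int]]:
    """Map each Projection name to its (sender_size, receiver_size) shape."""
    sizes = node_sizes(input_size, hidden_size)
    shapes = {name: (sizes[sender], sizes[receiver])
              for name, (sender, receiver) in WEIGHT_PROJECTIONS.items()}
    if bias:
        shapes.update({name: (BIAS_NODE_SIZE, sizes[receiver])
                       for name, receiver in BIAS_PROJECTIONS.items()})
    return shapes


if __name__ == "__main__":
    for name, shape in projection_shapes(3, 5, bias=True).items():
        print(name, shape)
```

Only Projections that leave the input_node depend on input_size. All other matrices are square in hidden_size, apart from the bias rows.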
Note
The GRUComposition is currently limited to a single-layer GRU, so its num_layers argument is not implemented. Similarly, the dropout and bidirectional arguments are not yet implemented. These will be added in a future version.
Execution¶
Processing¶
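The GRUComposition implements the following computations by its reset, update, new, and hidden_layer Nodes when it is executed:
\(\text{reset}(t) = \text{Logistic}\left[(\text{wts\_ir} \cdot \text{input}) + \text{bias\_ir} + (\text{wts\_hr} \cdot \text{hidden\_layer}(t-1)) + \text{bias\_hr}\right]\)
\(\text{update}(t) = \text{Logistic}\left[(\text{wts\_iu} \cdot \text{input}) + \text{bias\_iu} + (\text{wts\_hu} \cdot \text{hidden\_layer}(t-1)) + \text{bias\_hu}\right]\)
\(\text{new}(t) = \tanh\left[(\text{wts\_in} \cdot \text{input}) + \text{bias\_in} + \text{reset}(t) \odot \left((\text{wts\_hn} \cdot \text{hidden\_layer}(t-1)) + \text{bias\_hn}\right)\right]\)
\(\text{hidden\_layer}(t) = \left[(1 - \text{update}(t)) \odot \text{new}(t)\right] + \left[\text{update}(t) \odot \text{hidden\_layer}(t-1)\right]\)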
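This corresponds to the computations of the PyTorch GRU module:
\(r_t = \sigma(W_{ir} \cdot \text{input} + b_{ir} + W_{hr} \cdot \text{hidden} + b_{hr})\)
\(z_t = \sigma(W_{iz} \cdot \text{input} + b_{iz} + W_{hz} \cdot \text{hidden} + b_{hz})\)
\(n_t = \tanh(W_{in} \cdot \text{input} + b_{in} + r_t \odot (W_{hn} \cdot \text{hidden} + b_{hn}))\)
\(h_t = (1 - z_t) \odot n_t + z_t \odot \text{hidden}\)
where \(\cdot\) is the dot product, \(\odot\) is the Hadamard product and \(\sigma\) is the logistic (sigmoid) function. All values are for the current execution of the Composition (t), except for hidden, which uses the value from the prior execution (t-1) (see Cycles for handling of recurrence and cycles).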
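The equations above can be checked with a dependency-free sketch of a single execution. gru_step and run_sequence are hypothetical names, not PsyNeuLink functions. Weights are plain lists of rows oriented as (receiver, sender), the orientation PyTorch uses for \(W \cdot x\). Dict keys use the GRUComposition's suffixes (ir, iu, in, hr, hu, hn). To mimic bias=False, pass zero vectors for the biases.

```python
# Minimal pure-Python sketch of the GRU equations above (not PsyNeuLink code).
# Weight matrices are lists of rows with shape (receiver_size, sender_size).
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dot product of each row of W with x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vectors: Vector) -> Vector:
    """Elementwise sum of equal-length vectors."""
    return [sum(items) for items in zip(*vectors)]


def logistic(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def gru_step(x: Vector, h_prev: Vector,
             W: Dict[str, Matrix], b: Dict[str, Vector]) -> Dict[str, Vector]:
    """One execution: returns the reset, update, new and hidden_layer values."""
    reset = logistic(add(matvec(W["ir"], x), b["ir"], matvec(W["hr"], h_prev), b["hr"]))
    update = logistic(add(matvec(W["iu"], x), b["iu"], matvec(W["hu"], h_prev), b["hu"]))
    # The reset gate multiplies (Hadamard) only the hidden-state term of new.
    gated = [r * v for r, v in zip(reset, add(matvec(W["hn"], h_prev), b["hn"]))]
    new = [math.tanh(v) for v in add(matvec(W["in"], x), b["in"], gated)]
    hidden = [(1.0 - z) * n + z * h for z, n, h in zip(update, new, h_prev)]
    return {"reset": reset, "update": update, "new": new, "hidden_layer": hidden}


def run_sequence(xs: List[Vector], h0: Vector,
                 W: Dict[str, Matrix], b: Dict[str, Vector]) -> List[Vector]:
    """Process stimuli one at a time, carrying hidden_layer(t-1) forward."""
    h, outputs = h0, []
    for x in xs:
        h = gru_step(x, h, W, b)["hidden_layer"]
        outputs.append(h)
    return outputs
```

run_sequence mirrors Python-mode execution, processing one stimulus per call. A PyTorch GRU module computes the same recurrence over a whole sequence in a single call.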
The full Composition is executed when its run method is called with execution_mode set to ExecutionMode.Python, or if torch_available is False. Otherwise, and always in a call to learn, the GRUComposition is executed using the PyTorch GRU module. The values of the individual computations are copied back to the Nodes of the full GRUComposition at times determined by the value of the synch_node_values_with_torch option.
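The choice between the two execution paths described above can be restated as a small decision function. This is a hypothetical sketch. ExecutionMode below is a local stand-in enum with only the two modes mentioned here, not PsyNeuLink's own ExecutionMode. torch_available is passed in as a plain flag.

```python
# Hypothetical sketch of the rule described above (not PsyNeuLink code).
# ExecutionMode here is a local stand-in with only the two modes mentioned.
from enum import Enum


class ExecutionMode(Enum):
    Python = "Python"
    PyTorch = "PyTorch"


def executes_full_composition(method: str, execution_mode: ExecutionMode,
                              torch_available: bool) -> bool:
    """Return True if the Nodes of the full Composition do the computing, or
    False if the PyTorch GRU module is used (with values synched back)."""
    if method == "learn":
        return False  # learn() always uses the PyTorch GRU module
    if method != "run":
        raise ValueError(f"unexpected method: {method!r}")
    return execution_mode is ExecutionMode.Python or not torch_available
```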
Learning¶
Learning is executed using the learn method in the same way as for a standard AutodiffComposition. For learning to occur, the following conditions must be met:
- enable_learning must be set to True (the default);
- the GRUComposition’s learning_rate and the learning_rates of individual parameters must not all be False (that is, at least one parameter must be left with a learning rate other than False);
- the execution_mode argument of the learn method must be ExecutionMode.PyTorch (the default).
Note
Because a GRUComposition uses the PyTorch GRU module to implement its computations during learning, its learn method can only be called with the execution_mode argument set to ExecutionMode.PyTorch (the default).
The GRUComposition uses the PyTorch GRU module
to implement its computations during learning. After learning, the values of the module’s parameters are copied
to the weight matrices
of the corresponding MappingProjections,
and results of computations are copied to the values
of the corresponding Nodes in the GRUComposition at times determined by the value of the synch_node_values_with_torch
option.
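The three learning conditions listed above can be checked with a small helper. learning_will_occur is hypothetical, not a PsyNeuLink function. execution_mode is a plain string stand-in, and learning rates follow the conventions given under Creation. The sketch also assumes that the bias parameters (b_ih, b_hh) exist only when bias is True.

```python
# Hypothetical helper (not PsyNeuLink code) that checks the learning conditions.
# execution_mode is a string stand-in; rates are a number, True/None (inherit
# the composition's learning_rate), or False (not learned).
from typing import Dict, Optional, Union

Rate = Union[bool, int, float, None]


def learning_will_occur(enable_learning: bool = True,
                        learning_rate: Rate = None,
                        optimizer_params: Optional[Dict[str, Rate]] = None,
                        execution_mode: str = "PyTorch",
                        bias: bool = False) -> bool:
    if not enable_learning or execution_mode != "PyTorch":
        return False
    # Assumption: bias parameters exist only when bias=True.
    keys = ("w_ih", "w_hh") + (("b_ih", "b_hh") if bias else ())
    params = optimizer_params or {}
    for key in keys:
        value = params.get(key)
        if value is False:
            continue  # explicitly excluded from learning
        if value is None or value is True:
            if learning_rate is not False:
                return True  # inherits a usable composition-level rate
        else:
            return True  # explicit numeric learning rate
    return False
```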
Class Reference¶
- class psyneulink.library.compositions.grucomposition.grucomposition.GRUComposition(input_size=None, hidden_size=None, bias=None, enable_learning=True, learning_rate=None, optimizer_params=None, random_state=None, seed=None, name='GRU Composition', **kwargs)¶
Subclass of AutodiffComposition that implements a single-layered gated recurrent network.
See Structure and the technical note under Execution for a description of when the full Composition is constructed and used for execution vs. when the PyTorch GRU module is used.
- Note: all exposed methods, attributes and Parameters of the GRUComposition are PsyNeuLink elements. All PyTorch-specific elements belong to pytorch_representation, which, for a GRUComposition, is of class PytorchGRUCompositionWrapper.
Constructor takes the following arguments in addition to those of AutodiffComposition:
- Parameters
input_size (int : default 1) – specifies the length of the input array to the GRUComposition, and the size of the input_node.
hidden_size (int : default 1) – specifies the length of the internal state of the GRUComposition, and the size of the hidden_layer_node and all nodes other than the input_node.
bias (bool : default False) – specifies whether the GRUComposition uses bias vectors in its computations.
enable_learning (bool : default True) – specifies whether learning is enabled for the GRUComposition (see Learning Arguments for additional details).
learning_rate (float : default .001) – specifies the learning_rate for the GRUComposition (see Learning Arguments for additional details).
optimizer_params (Dict[str: value]) – specifies parameters for the optimizer used for learning by the GRUComposition (see Learning Arguments for details of specification).
- input_size¶
determines the length of the input array to the GRUComposition and the size of the input_node.
- Type
int
- hidden_size¶
determines the size of the hidden_layer_node and all other INTERNAL Nodes of the GRUComposition.
- Type
int
- bias¶
determines whether the GRUComposition uses bias vectors in its computations.
- Type
bool
- enable_learning¶
determines whether learning is enabled for the GRUComposition (see Learning Arguments for additional details).
- Type
bool
- learning_rate¶
determines the default learning_rate for the parameters of the PyTorch GRU module that are not assigned individual learning rates in the optimizer_params argument of the GRUComposition’s constructor (see Learning Arguments for additional details).
- Type
float
- w_ih_learning_rate¶
determines the learning rate specifically for the weights of the efferent projections from the input_node of the GRUComposition: wts_in, wts_iu, and wts_ir. It corresponds to the weight_ih_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type
float or bool
- w_hh_learning_rate¶
determines the learning rate specifically for the weights of the efferent projections from the hidden_layer_node of the GRUComposition: wts_hn, wts_hu, and wts_hr. It corresponds to the weight_hh_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type
float or bool
- b_ih_learning_rate¶
determines the learning rate specifically for the biases influencing the efferent projections from the input_node of the GRUComposition: bias_ir, bias_iu, and bias_in. It corresponds to the bias_ih_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type
float or bool
- b_hh_learning_rate¶
determines the learning rate specifically for the biases influencing the efferent projections from the hidden_layer_node of the GRUComposition: bias_hr, bias_hu, and bias_hn. It corresponds to the bias_hh_l0 parameter of the PyTorch GRU module (see Learning Arguments for additional details).
- Type
float or bool
- input_node¶
INPUT Node that receives the input to the GRUComposition and passes it to the hidden_layer_node; corresponds to input (i) of the PyTorch GRU module.
- Type
- new_node¶
ProcessingMechanism that provides the hidden_layer_node with the input from the input_node, gated by the reset_node; corresponds to new gate (n) of the PyTorch GRU module.
- Type
ProcessingMechanism
- hidden_layer_node¶
ProcessingMechanism that implements the recurrent layer of the GRUComposition; corresponds to hidden layer (h) of the PyTorch GRU module.
- Type
ProcessingMechanism
- reset_node¶
GatingMechanism that gates the input to the new_node; corresponds to reset gate (r) of the PyTorch GRU module.
- Type
GatingMechanism
- update_node¶
GatingMechanism that gates the inputs to the hidden layer from the new_node and from the prior state of the hidden_layer_node itself (i.e., the input it receives from its recurrent Projection); corresponds to update gate (z) of the PyTorch GRU module.
- Type
GatingMechanism
- output_node¶
OUTPUT Node that receives the output of the hidden_layer_node; corresponds to the result of the PyTorch GRU module.
- Type
- learnable_projections¶
list of the MappingProjections in the GRUComposition that have matrix parameters that can be learned; these correspond to the learnable parameters of the PyTorch GRU module.
- Type
List[MappingProjection]
- wts_in¶
MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the new_node; corresponds to the \(W_{in}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type
MappingProjection
- wts_iu¶
MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the update_node; corresponds to the \(W_{iz}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type
MappingProjection
- wts_ir¶
MappingProjection with learnable matrix (“connection weights”) that projects from the input_node to the reset_node; corresponds to the \(W_{ir}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type
MappingProjection
- wts_nh¶
MappingProjection with learnable matrix (“connection weights”) that projects from the new_node to the hidden_layer_node (see Structure for additional information).
- Type
MappingProjection
- wts_hr¶
MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the reset_node; corresponds to the \(W_{hr}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type
MappingProjection
- wts_hu¶
MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the update_node; corresponds to the \(W_{hz}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type
MappingProjection
- wts_hn¶
MappingProjection with learnable matrix (“connection weights”) that projects from the hidden_layer_node to the new_node; corresponds to the \(W_{hn}\) term in the PyTorch GRU module’s computation (see Structure for additional information).
- Type
MappingProjection
- wts_hh¶
MappingProjection with fixed matrix (“connection weights”) that projects from the hidden_layer_node to itself (i.e., the recurrent Projection) (see Structure for additional information).
- Type
MappingProjection
- wts_ho¶
MappingProjection with fixed matrix (“connection weights”) that projects from the hidden_layer_node to the output_node (see Structure for additional information).
- Type
MappingProjection
- reset_gate¶
GatingProjection that gates the input to the new_node from the input_node. Its value is used in the Hadamard product with that input to produce the new (external) input to the hidden_layer_node (see Structure for additional information).
- Type
GatingProjection
- new_gate¶
GatingProjection that gates the input to the hidden_layer_node from the new_node. Its value is used in the Hadamard product with the (external) input to the hidden_layer_node from the new_node. This determines how much of the hidden_layer_node's new state is determined by the external input vs. its prior state (see Structure for additional information).
- Type
GatingProjection
- recurrent_gate¶
GatingProjection that gates the input to the hidden_layer_node from its recurrent Projection (wts_hh). Its value is used in the Hadamard product with the recurrent input to the hidden_layer_node. This determines how much of the hidden_layer_node's new state is determined by its prior state vs. its external input (see Structure for additional information).
- Type
GatingProjection
- bias_ir_node¶
BIAS Node whose Projection (bias_ir) provides the bias to the weights (wts_ir) from the input_node to the reset_node (see Structure for additional information).
- Type
- bias_iu_node¶
BIAS Node whose Projection (bias_iu) provides the bias to the weights (wts_iu) from the input_node to the update_node (see Structure for additional information).
- Type
- bias_in_node¶
BIAS Node whose Projection (bias_in) provides the bias to the weights (wts_in) from the input_node to the new_node (see Structure for additional information).
- Type
- bias_hr_node¶
BIAS Node whose Projection (bias_hr) provides the bias to the weights (wts_hr) from the hidden_layer_node to the reset_node (see Structure for additional information).
- Type
- bias_hu_node¶
BIAS Node whose Projection (bias_hu) provides the bias to the weights (wts_hu) from the hidden_layer_node to the update_node (see Structure for additional information).
- Type
- bias_hn_node¶
BIAS Node whose Projection (bias_hn) provides the bias to the weights (wts_hn) from the hidden_layer_node to the new_node (see Structure for additional information).
- Type
- biases¶
list of the MappingProjections from the BIAS Nodes of the GRUComposition, all of which have matrix parameters if bias is True; these correspond to the learnable biases of the PyTorch GRU module (see Structure for additional information).
- Type
List[MappingProjection]
- bias_ir¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_ir, from the input_node to the reset_node; corresponds to the \(b_{ir}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type
MappingProjection
- bias_iu¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_iu, from the input_node to the update_node; corresponds to the \(b_{iz}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type
MappingProjection
- bias_in¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_in, from the input_node to the new_node; corresponds to the \(b_{in}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type
MappingProjection
- bias_hr¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hr, from the hidden_layer_node to the reset_node; corresponds to the \(b_{hr}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type
MappingProjection
- bias_hu¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hu, from the hidden_layer_node to the update_node; corresponds to the \(b_{hz}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type
MappingProjection
- bias_hn¶
MappingProjection with learnable matrix (“connection weights”) that provides the bias to the weights, wts_hn, from the hidden_layer_node to the new_node; corresponds to the \(b_{hn}\) bias parameter of the PyTorch GRU module (see Structure for additional information).
- Type
MappingProjection
- class PytorchGRUCompositionWrapper(composition, device, outer_creator=None, dtype=None, subclass_components=None, context=None)¶
Wrapper for GRUComposition as a PyTorch Module. Manages the exchange of the Composition’s Projection matrices and the PyTorch GRU module’s parameters, and returns its output value.
- _flatten_for_pytorch(pnl_proj, sndr_mech, rcvr_mech, nested_port, nested_mech, outer_comp, outer_comp_pytorch_rep, access, context)¶
Return PytorchProjectionWrappers for Projections to/from GRUComposition to nested Composition. Replace GRUComposition’s nodes with gru_mech and projections to and from it.
- Return type
Tuple
- _instantiate_GRU_pytorch_mechanism_wrappers(gru_comp, device, context)¶
Instantiate PytorchMechanismWrapper for GRU Node
- _instantiate_GRU_pytorch_projection_wrappers(torch_gru, device, context)¶
Create PytorchGRUProjectionWrappers for each learnable Projection of GRUComposition. For each PytorchGRUProjectionWrapper, assign the current weight matrix of the PNL Projection to the corresponding part of the tensor in the parameter of the PyTorch GRU module.
- forward(inputs, optimization_num, synch_with_pnl_options, context=None)¶
Forward method of the model for PyTorch modes.
This is called only when GRUComposition is run as a standalone Composition. Otherwise, the node.execute() method is called directly (i.e., it is treated as a single node). Returns a dictionary {output_node: value} with the output value for the torch GRU module. This value is used by the collect_afferents method(s) of the other node(s) that receive Projections from the GRUComposition.
- Return type
dict
- get_parameters_from_torch_gru()¶
Get parameters from PyTorch GRU module corresponding to GRUComposition’s Projections. Format tensors:
- transpose all weight and bias tensors;
- reformat biases as 2d.
Returns formatted tensors, which are used:
- in set_weights_from_torch_gru(), where they are converted to numpy arrays;
- for forward computation in pytorchGRUwrappers._copy_pytorch_node_outputs_to_pnl_values().
- Return type
Tuple[Tensor]
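The formatting steps listed above can be illustrated without torch. In torch.nn.GRU, each layer's parameters are stacked by gate in the order reset, update (z), new. weight_ih_l0 has 3*hidden_size rows of length input_size, and weight_hh_l0 has 3*hidden_size rows of length hidden_size. This hypothetical sketch uses plain lists instead of tensors. It splits each stacked parameter into per-gate blocks and names them after the GRUComposition's Projections. It assumes that "transpose" means converting PyTorch's (receiver, sender) rows to a (sender, receiver) matrix, and that "2d" biases are single-row matrices.

```python
# Hypothetical sketch using plain lists instead of torch tensors: split the
# stacked GRU parameters (gate order reset | update | new) into per-Projection
# matrices, transpose weights, and reshape biases to 2d.
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]

GATES = ("r", "u", "n")  # PyTorch's z (update) gate is "u" in GRUComposition names


def split_gates(stacked: Sequence, hidden_size: int) -> List[Sequence]:
    """Split a stacked parameter into its reset, update and new blocks."""
    if len(stacked) != 3 * hidden_size:
        raise ValueError(f"expected {3 * hidden_size} rows, got {len(stacked)}")
    return [stacked[i * hidden_size:(i + 1) * hidden_size] for i in range(3)]


def transpose(matrix: Sequence[Sequence[float]]) -> Matrix:
    return [list(column) for column in zip(*matrix)]


def format_gru_parameters(weight_ih: Matrix, weight_hh: Matrix, hidden_size: int,
                          bias_ih: Optional[List[float]] = None,
                          bias_hh: Optional[List[float]] = None) -> Dict[str, Matrix]:
    """Map stacked GRU parameters to matrices keyed by Projection name."""
    formatted: Dict[str, Matrix] = {}
    for prefix, stacked in (("wts_i", weight_ih), ("wts_h", weight_hh)):
        for gate, block in zip(GATES, split_gates(stacked, hidden_size)):
            # PyTorch rows index receivers; transpose to (sender, receiver).
            formatted[prefix + gate] = transpose(block)
    for prefix, stacked in (("bias_i", bias_ih), ("bias_h", bias_hh)):
        if stacked is None:
            continue  # GRU built without bias
        for gate, block in zip(GATES, split_gates(stacked, hidden_size)):
            formatted[prefix + gate] = [list(block)]  # 1d -> 2d row (1, hidden_size)
    return formatted
```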
- class PytorchGRUMechanismWrapper(mechanism, composition, component_idx, use, dtype, device, context)¶
Wrapper for PyTorch GRU Node. Handling of hidden_state: uses GRUComposition’s HIDDEN_NODE.value to cache the state of the hidden layer:
- gets input to function for hidden state from GRUComposition’s HIDDEN_NODE.value;
- sets GRUComposition’s HIDDEN_NODE.value to return value for hidden state.
- _calculate_torch_gru_internal_state_values(input, hidden_state)¶
Manually calculate and store internal state values for torch GRU prior to backward pass. These are needed for assigning to the corresponding nodes in the GRUComposition. Returns r_t, z_t, n_t, h_t: the current reset, update, new, and hidden state values, respectively.
- Return type
dict
- collect_afferents(batch_size, port=None, inputs=None)¶
Return afferent projections for input_port(s) of the Mechanism. If there is only one input_port, return the sum of its afferents (for those in Composition). If there are multiple input_ports, return a tensor (or list of tensors if input ports are ragged) of shape:
(batch, input_port, projection, …)
where the ellipsis represents 1 or more dimensions for the values of the projected afferent.
- Return type
Tensor
- set_pnl_variable_and_values(set_variable=False, set_value=True, context=None)¶
Set the state of the PytorchMechanismWrapper’s Mechanism. Note: if execute_mech=True, requires that variable=True.
- pytorch_composition_wrapper_type¶
alias of
psyneulink.library.compositions.grucomposition.pytorchGRUwrappers.PytorchGRUCompositionWrapper
- pytorch_mechanism_wrapper_type¶
alias of
psyneulink.library.compositions.grucomposition.pytorchGRUwrappers.PytorchGRUMechanismWrapper
- _construct_pnl_composition(input_size, hidden_size, context)¶
Construct Nodes and Projections for GRUComposition
- _set_learning_attributes()¶
Set learning-related attributes for Node and Projections
- set_weights(weights, biases, context=None)¶
Set weights for Projections to input_node and hidden_layer_node.
- infer_backpropagation_learning_pathways(execution_mode, context=None)¶
Create backpropagation learning pathways for every Input Node -> Output Node pathway. Flattens nested compositions:
- only includes the Projections in outer Composition to/from the CIMs of the nested Composition (i.e., to input_CIMs and from output_CIMs), which are the ones that should be learned;
- excludes Projections from/to CIMs in the nested Composition (from input_CIMs and to output_CIMs), as those should remain identity Projections.
See PytorchCompositionWrapper for a table of how Projections are handled and further details. Returns list of target nodes for each pathway.
- Return type
list
- _get_pytorch_backprop_pathway(input_node, context)¶
Breadth-first search from input_node to find all input -> output pathways. Uses queue(node, composition) to traverse all nodes in the graph. IMPLEMENTATION NOTE: flattens nested Compositions, removing any CIMs in the nested Compositions. Returns a list of all pathways from input_node -> output node.
- Return type
list
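The breadth-first search described above can be sketched on a plain adjacency dict. find_pathways is hypothetical and works on node names, not PsyNeuLink objects. It skips any edge back into the current path so that recurrent Projections do not cause infinite loops. It treats the flattened GRUComposition as a single "gru_mech" node, as described for _flatten_for_pytorch and _add_dependency.

```python
# Hypothetical sketch (not PsyNeuLink code): breadth-first search that
# enumerates every input -> output pathway in a directed graph.
from collections import deque
from typing import Dict, List, Set


def find_pathways(graph: Dict[str, List[str]], input_node: str,
                  output_nodes: Set[str]) -> List[List[str]]:
    """Return all simple paths from input_node to any node in output_nodes."""
    pathways: List[List[str]] = []
    queue = deque([[input_node]])  # each queue entry is a partial path
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in output_nodes:
            pathways.append(path)
            continue  # a pathway ends at the first output node reached
        for successor in graph.get(node, []):
            if successor in path:
                continue  # skip recurrent/cyclic edges so the search terminates
            queue.append(path + [successor])
    return pathways


# The GRUComposition appears as one node, so its recurrence adds no pathways.
outer = {"stimulus": ["gru_mech"], "gru_mech": ["gru_mech", "decision"], "decision": []}
paths = find_pathways(outer, "stimulus", {"decision"})
targets = [path[-1] for path in paths]  # one target node per pathway
```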
- _get_execution_mode(execution_mode)¶
Parse execution_mode argument and return a valid execution mode for the learn() method
- _add_dependency(sender, projection, receiver, dependency_dict, queue, comp)¶
Override to implement direct pathway through gru_mech for pytorch backprop pathway.
- _identify_target_nodes(context)¶
Recursively call all nested AutodiffCompositions to assign TARGET nodes for learning
- add_node(node, required_roles=None, context=None)¶
Override if called from command line to disallow modification of GRUComposition
- add_projection(*args, **kwargs)¶
Override if called from command line to disallow modification of GRUComposition
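The restriction on add_node and add_projection follows a common pattern: internal construction code may modify the Composition, but later calls may not. The sketch below is hypothetical. In place of PsyNeuLink's check of whether a call comes from the command line, it uses a simple construction flag. The class and exception names are illustrative only, and the construction step is abbreviated.

```python
# Hypothetical sketch of the "no modification after construction" pattern.
# A construction flag stands in for PsyNeuLink's command-line check.
from typing import List, Tuple


class StructureFrozenError(Exception):
    """Raised when a user tries to modify the fixed structure."""


class FixedStructureComposition:
    def __init__(self) -> None:
        self.nodes: List[str] = []
        self.projections: List[Tuple[str, str]] = []
        self._constructing = True
        try:
            # Internal construction may call the guarded methods (abbreviated).
            for node in ("input", "reset", "update", "new", "hidden_layer", "output"):
                self.add_node(node)
            self.add_projection("input", "new")
        finally:
            self._constructing = False

    def _check_modifiable(self, what: str) -> None:
        if not self._constructing:
            raise StructureFrozenError(
                f"Cannot {what}: the structure of this Composition is fixed.")

    def add_node(self, node: str) -> None:
        self._check_modifiable(f"add node {node!r}")
        self.nodes.append(node)

    def add_projection(self, sender: str, receiver: str) -> None:
        self._check_modifiable(f"add a Projection from {sender!r} to {receiver!r}")
        self.projections.append((sender, receiver))
```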
- exception psyneulink.library.compositions.grucomposition.grucomposition.GRUCompositionError(message, component=None)¶