# AutodiffComposition

## Overview

AutodiffComposition is a subclass of Composition used to train feedforward neural network models through integration with PyTorch, a popular machine learning library. Training this way executes considerably more quickly than using the standard implementation of learning in a Composition (that is, its learning methods). An AutodiffComposition is configured and run similarly to a standard Composition, with some exceptions that are described below. An example is provided in the xor_in_psyneulink_and_pytorch.py script (in the Scripts folder of the PsyNeuLink source code), which also provides a comparison of runtimes.

## Creating an AutodiffComposition

An AutodiffComposition can be created by calling its constructor and then adding Components using the standard Composition methods for doing so. The constructor also includes a number of arguments that are specific to AutodiffComposition, as described below.

Warning

Mechanisms or Projections should not be added to or deleted from an AutodiffComposition after it has been run for the first time. Unlike an ordinary Composition, AutodiffComposition does not support this functionality.

• param_init_from_pnl argument – determines how parameters are set up for the internal PyTorch representation of the model. If it is set to True:

• only weight parameters that correspond to the value of the matrix parameter of the MappingProjections in the Composition are created; no bias parameters are created, as the bias parameters associated with Mechanisms are not trainable;
• the weight parameters are initialized to be identical to the values of the matrix parameters of the MappingProjections in the Composition; the tensor of the parameter object corresponding to a particular MappingProjection not only has the same dimensionality as its matrix, it also has exactly the same values;
• PyTorch functions representing the function of each Mechanism in the Composition incorporate its scalar, untrainable bias.

If it is set to False:

• in addition to the weight parameters created for each MappingProjection, a trainable bias parameter is created for each Mechanism in the Composition;
• weight parameters have the same dimensionality as the matrix parameter of the corresponding MappingProjections; however, their values – and those of the bias parameters – are sampled from a random distribution;
• in addition to the trainable biases created for each Mechanism, the PyTorch function implemented for each Mechanism’s function still incorporates its scalar, untrainable bias.

• patience – allows the model to halt training early. The model tracks how many consecutive ‘bad’ epochs of training have failed to significantly reduce the model’s loss. When this number exceeds patience, the model stops training. By default, patience is None, in which case the model trains for the specified number of epochs and does not stop early.

• min_delta – specifies the threshold that patience uses to determine whether a reduction in the model’s loss is significant. By default it is zero, in which case any reduction in loss counts as significant. If min_delta is large and positive, the model tends to stop earlier, because it counts fewer epochs as ‘good’.

• learning_rate – specifies the learning rate (default 0.001), which is passed to the optimizer specified by the optimizer_type argument.

• optimizer_type – specifies the kind of optimizer used in training. The current options are ‘sgd’ (which is the default) or ‘adam’.

• learning_enabled – specifies whether the AutodiffComposition should learn (default is True). When True, the AutodiffComposition trains using PyTorch, as normal. When False, the AutodiffComposition runs like an ordinary Composition, which does not change weights. learning_enabled is also an attribute, which can be toggled between runs.

• weight_decay – specifies the L2 penalty (which discourages large weights) used by the optimizer. This defaults to 0.

• loss_spec – specifies the loss function for training. It can be a string or a PyTorch loss function. The current options for strings are ‘mse’ (the default), ‘crossentropy’, ‘l1’, ‘nll’, ‘poissonnll’, and ‘kldiv’. These refer to Mean Squared Error, Cross Entropy, L1 loss, Negative Log Likelihood loss, Poisson Negative Log Likelihood, and KL Divergence respectively. The loss_spec can also be any PyTorch loss function, including a custom-written one. For a list of PyTorch loss functions, see Loss function. For information on writing a custom loss function, see Extending PyTorch, as well as Build your own loss function in PyTorch.

• randomize – specifies whether the order of inputs is randomized in each epoch. All inputs are run in each epoch; however, if randomize is True, the order in which they are presented within an epoch is random.

• refresh_losses – specifies whether the losses attribute is refreshed for each call to run. If False, the losses of each run are appended to the losses attribute. If True, the losses of each run overwrite losses instead.

• force_no_retain_graph – False by default. If True, the AutodiffComposition does not use PyTorch’s retain_graph option when computing the gradient. This can reduce memory usage; however, it breaks recurrent networks, so it should only be used when the network is not recurrent.
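
The way patience, min_delta, and refresh_losses interact can be illustrated with a small pure-Python sketch. It is not PsyNeuLink’s implementation: EarlyStopper and train are hypothetical names, each call to train stands in for one call to run, and precomputed per-epoch average losses replace actual training. The sketch assumes that an epoch counts as ‘good’ only if its loss is lower than the best loss seen so far by more than min_delta.

```python
# Hypothetical sketch of early stopping (patience, min_delta) and refresh_losses.
# Not PsyNeuLink's code: per-epoch losses are supplied instead of computed.


class EarlyStopper:
    """Counts consecutive 'bad' epochs and signals when training should stop."""

    def __init__(self, patience=None, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = None
        self.num_bad_epochs = 0

    def step(self, loss):
        """Record one epoch's average loss; return True if training should stop."""
        if self.patience is None:
            return False  # early stopping disabled: every epoch is run
        if self.best_loss is None or loss < self.best_loss - self.min_delta:
            # Significant improvement: remember it and reset the counter.
            self.best_loss = loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs > self.patience


def train(epoch_losses, losses, patience=None, min_delta=0.0, refresh_losses=False):
    """Simulate one call to run(), recording each epoch's loss in `losses`."""
    if refresh_losses:
        losses.clear()  # this run's losses overwrite those of earlier runs
    stopper = EarlyStopper(patience, min_delta)
    for loss in epoch_losses:
        losses.append(loss)
        if stopper.step(loss):
            break
    return losses


history = []
train([1.0, 0.8, 0.79, 0.79, 0.78, 0.5], history, patience=1, min_delta=0.05)
print(history)  # [1.0, 0.8, 0.79, 0.79]
train([0.7, 0.6], history)  # refresh_losses=False: appended
print(history)  # [1.0, 0.8, 0.79, 0.79, 0.7, 0.6]
train([0.5], history, refresh_losses=True)  # refresh_losses=True: overwritten
print(history)  # [0.5]
```

In the first run, the two losses of 0.79 are not more than 0.05 below the best loss of 0.8, so the count of bad epochs reaches 2, exceeds patience, and training stops before the remaining epochs.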

Note

The AutodiffComposition detaches all gradients between epochs of training. For more information on why this is done, see Trying to backward through a graph a second time and Why we need to detach Variable which contains [a] hidden representation.
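
The two param_init_from_pnl modes described under Creating an AutodiffComposition can be contrasted with a toy sketch in plain Python. The function init_parameters and its arguments are hypothetical, and nested lists stand in for MappingProjection matrices and PyTorch tensors. The standard normal distribution and the one-bias-per-unit layout used when param_init_from_pnl is False are assumptions; the documentation states only that these values are sampled from a random distribution.

```python
# Hypothetical sketch of the two param_init_from_pnl modes (not PsyNeuLink code).
# Nested lists stand in for MappingProjection matrices and PyTorch tensors.
import copy
import random


def init_parameters(matrices, mechanism_sizes, param_init_from_pnl=True, seed=None):
    """Return (weights, biases) for a toy model.

    matrices: dict mapping a projection name to its matrix (a list of rows).
    mechanism_sizes: dict mapping a mechanism name to its number of units.
    """
    if param_init_from_pnl:
        # Weights are exact copies of the matrices; no trainable biases are created.
        return copy.deepcopy(matrices), {}

    rng = random.Random(seed)
    # Same shape as each matrix, but values are drawn at random
    # (standard normal here, which is an assumption).
    weights = {
        name: [[rng.gauss(0.0, 1.0) for _ in row] for row in matrix]
        for name, matrix in matrices.items()
    }
    # One trainable bias per unit of each mechanism (an assumption about layout).
    biases = {
        name: [rng.gauss(0.0, 1.0) for _ in range(size)]
        for name, size in mechanism_sizes.items()
    }
    return weights, biases


matrices = {"my_projection": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}  # 3 x 2
sizes = {"my_mech_1": 3, "my_mech_2": 2}

w, b = init_parameters(matrices, sizes, param_init_from_pnl=True)
print(w == matrices, b)  # True {}

w, b = init_parameters(matrices, sizes, param_init_from_pnl=False, seed=0)
print([len(row) for row in w["my_projection"]])  # [2, 2, 2]
print({name: len(v) for name, v in b.items()})  # {'my_mech_1': 3, 'my_mech_2': 2}
```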

## Structure

An AutodiffComposition has all the attributes of its parent class Composition, as well as attributes corresponding to the arguments described above and those listed in the Class Reference below.

## Execution

Most arguments to AutodiffComposition’s run or execute methods are the same as for a Composition. When learning_enabled is False, the arguments are the same, since in this case the AutodiffComposition executes like a Composition.

However, if learning_enabled is True, the format of the inputs argument is different. In that case, the inputs argument must be a dictionary containing at least two nested dictionaries, one for the inputs and the other for the targets, and optionally an entry specifying the number of training epochs to run. Specifically, the outer dictionary must have at least two entries, with the keys “inputs” and “targets”. The value of the “inputs” entry must be a standard input dictionary, specifying the inputs for each ORIGIN Mechanism. The value of the “targets” entry must be a similar dictionary, in this case specifying the target values for the outputs of each TERMINAL Mechanism in the Composition. In addition, an entry with the key “epochs” can be included; its value must be an integer specifying the number of epochs of training to run (i.e., how many times all inputs and corresponding targets are run), and it defaults to 1. The following is an example showing how to create a simple AutodiffComposition, specify its inputs and targets, and run it with learning enabled and disabled.

>>> import numpy as np
>>> import psyneulink as pnl
>>> # Set up PsyNeuLink Components
>>> my_mech_1 = pnl.TransferMechanism(function=pnl.Linear, size = 3)
>>> my_mech_2 = pnl.TransferMechanism(function=pnl.Linear, size = 2)
>>> my_projection = pnl.MappingProjection(matrix=np.random.randn(3,2),
...                     sender=my_mech_1,
...                     receiver=my_mech_2)
>>> # Create AutodiffComposition and add the Components to it
>>> my_autodiff = pnl.AutodiffComposition()
>>> my_autodiff.add_node(my_mech_1)
>>> my_autodiff.add_node(my_mech_2)
>>> my_autodiff.add_projection(sender=my_mech_1, projection=my_projection, receiver=my_mech_2)
>>> # Specify inputs and targets
>>> my_inputs = {my_mech_1: [[1, 2, 3]]}
>>> my_targets = {my_mech_2: [[4, 5]]}
>>> input_dict = {"inputs": my_inputs, "targets": my_targets, "epochs": 2}
>>> # Run Composition with learning enabled
>>> my_autodiff.learning_enabled=True # this is not strictly necessary, as learning_enabled is True by default
>>> my_autodiff.run(inputs = input_dict)
>>> # Run Composition with learning disabled
>>> my_autodiff.learning_enabled=False
>>> my_autodiff.run(inputs = input_dict)


As shown above (and for convenience), an AutodiffComposition with learning disabled can be run with the same input format used for training. In that case, the “inputs” entry is used as the inputs for the run, and the “targets” and “epochs” entries (if present) are ignored. However, since an AutodiffComposition with learning disabled is treated like any other Composition, it can also be run with the same input format as a standard Composition; that is, a single dictionary specifying the inputs for each ORIGIN Mechanism, such as the one defined in the example above:

>>> my_autodiff.run(inputs = my_inputs)
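
The input conventions described in this section can be summarized in a pure-Python sketch. The helpers iter_training_trials and resolve_run_inputs are hypothetical and not part of PsyNeuLink; Mechanisms are represented by strings, and each input or target list holds one entry per trial. The treatment of randomize (the same trials in a new order each epoch, with inputs and targets kept paired) is an assumption based on the description of that argument.

```python
# Hypothetical helpers illustrating AutodiffComposition's input formats.
# Mechanisms are represented by strings; this is not PsyNeuLink's code.
import random

TRAINING_KEYS = {"inputs", "targets", "epochs"}


def iter_training_trials(input_dict, randomize=False, seed=None):
    """Yield (epoch, trial_inputs, trial_targets) for a learning-enabled run."""
    inputs = input_dict["inputs"]
    targets = input_dict["targets"]
    epochs = input_dict.get("epochs", 1)  # "epochs" is optional; defaults to 1

    # Every input and target list must hold one entry per trial.
    lengths = {len(v) for v in inputs.values()} | {len(v) for v in targets.values()}
    if len(lengths) != 1:
        raise ValueError("inputs and targets must specify the same number of trials")
    num_trials = lengths.pop()

    rng = random.Random(seed)
    for epoch in range(epochs):
        order = list(range(num_trials))
        if randomize:
            rng.shuffle(order)  # same trials every epoch, new order each time
        for t in order:
            # Inputs and targets for trial t stay paired after shuffling.
            yield (
                epoch,
                {mech: values[t] for mech, values in inputs.items()},
                {mech: values[t] for mech, values in targets.items()},
            )


def resolve_run_inputs(inputs):
    """Return the {node: inputs} dict a learning-disabled run would use."""
    if isinstance(inputs, dict) and "inputs" in inputs and set(inputs) <= TRAINING_KEYS:
        # Training format: keep only the "inputs" entry; ignore "targets"/"epochs".
        return inputs["inputs"]
    # Otherwise assume it is already a standard Composition input dict.
    return inputs


my_inputs = {"my_mech_1": [[1, 2, 3]]}
my_targets = {"my_mech_2": [[4, 5]]}
input_dict = {"inputs": my_inputs, "targets": my_targets, "epochs": 2}

for epoch, trial_inputs, trial_targets in iter_training_trials(input_dict):
    print(epoch, trial_inputs, trial_targets)
# 0 {'my_mech_1': [1, 2, 3]} {'my_mech_2': [4, 5]}
# 1 {'my_mech_1': [1, 2, 3]} {'my_mech_2': [4, 5]}

print(resolve_run_inputs(input_dict) is my_inputs)  # True
print(resolve_run_inputs(my_inputs) is my_inputs)   # True
```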


## Logging

Logging currently works differently in AutodiffComposition than in Composition. In an AutodiffComposition, no logging is done by default, because logging slows AutodiffComposition down substantially (by roughly 30%). To log all Projection weights and Mechanism values during execution or training of an AutodiffComposition, set the do_logging argument of the run() method to True. Logging in AutodiffComposition is currently a provisional implementation: the time and context recorded in the log are not meaningful; only the logged values are.

## Nested Execution

Like any other Composition, an AutodiffComposition may be nested inside another. If learning is not enabled, nesting is handled in the same way as for any other Composition. However, if learning is enabled for a nested AutodiffComposition, its input format is different (see Execution); as a consequence, a nested AutodiffComposition with learning enabled must be an ORIGIN Node of the Composition in which it is nested.

Note

As with all nested Compositions, the AutodiffComposition’s _analyze_graph method must be called (or the AutodiffComposition must be run) before nesting it.

The following shows how the AutodiffComposition created in the previous example can be nested and run inside another Composition:

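>>> my_autodiff._analyze_graph()  # alternatively, my_autodiff.run( ... )
>>>
>>> # Create outer composition and add the AutodiffComposition to it
>>> my_outer_composition = pnl.Composition()
>>> my_outer_composition.add_node(my_autodiff)
>>> # Re-enable learning, which was disabled at the end of the previous example
>>> my_autodiff.learning_enabled = True
>>> # Specify dict containing inputs and targets for nested Composition
>>> training_input = {my_autodiff: input_dict}
>>> # Run with learning enabled
>>> result1 = my_outer_composition.run(inputs=training_input)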


## Class Reference

class psyneulink.library.compositions.autodiffcomposition.AutodiffComposition(param_init_from_pnl=True, patience=None, min_delta=0, learning_rate=0.001, learning_enabled=True, optimizer_type=None, loss_spec=None, randomize=False, refresh_losses=False, name="autodiff_composition")

Subclass of Composition that trains models more quickly by integrating with PyTorch.

Parameters:
• param_init_from_pnl (boolean : default True) – a Boolean specifying how parameters are initialized (see Creating an AutodiffComposition for details).
• patience (int or None : default None) – allows the model to stop training early, if training stops reducing loss. The model tracks how many consecutive epochs of training have failed to reduce the model’s loss. When this number exceeds patience, the model stops training early. If patience is None, the model trains for the specified number of epochs and does not stop early.
• min_delta (float : default 0) – the minimum reduction in average loss that an epoch must provide in order to qualify as a ‘good’ epoch. Used for early stopping of training, in combination with patience.
• learning_rate (float : default 0.001) – the learning rate, which is passed to the optimizer.
• learning_enabled (boolean : default True) – specifies whether the AutodiffComposition should learn. When True, the AutodiffComposition trains using PyTorch. When False, the AutodiffComposition executes just like an ordinary Composition.
• optimizer_type (str : default 'sgd') – the kind of optimizer used in training. The current options are ‘sgd’ or ‘adam’.
• weight_decay (float : default 0) – specifies the L2 penalty (which discourages large weights) used by the optimizer.
• loss_spec (str or PyTorch loss function : default 'mse') – specifies the loss function for training. The current string options are ‘mse’ (the default), ‘crossentropy’, ‘l1’, ‘nll’, ‘poissonnll’, and ‘kldiv’. Any PyTorch loss function can be used here, such as those listed at https://pytorch.org/docs/stable/nn.html#loss-functions
• randomize (boolean : default False) – specifies whether the order of inputs is randomized in each epoch. (In each epoch, all inputs are run, but if randomize is True, the order of inputs within an epoch is random.)
• refresh_losses (boolean : default False) – specifies whether the losses attribute is refreshed for each call to run(). If False, the losses of each run are appended to the losses attribute. If True, the losses of each run overwrite losses instead.
pytorch_representation

PytorchModelCreator – the PyTorch representation of the PsyNeuLink model

losses

list of floats – tracks the average loss for each training epoch

patience

int or None : default None – allows the model to stop training early, if training stops reducing loss. The model tracks how many consecutive epochs of training have failed to reduce the model’s loss. When this number exceeds patience, the model stops training early. If patience is None, the model will train for the number of specified epochs and will not stop training early.

min_delta

float : default 0 – the minimum reduction in average loss that an epoch must provide in order to qualify as a ‘good’ epoch. Used for early stopping of training, in combination with patience.

learning_enabled

boolean : default True – specifies whether the AutodiffComposition should learn. When True, the AutodiffComposition trains using PyTorch. When False, the AutodiffComposition executes just like an ordinary Composition. This attribute can be toggled.

learning_rate

float: default 0.001 – the learning rate for training. Currently only used to initialize the optimizer attribute.

optimizer

PyTorch optimizer function – the optimizer used for training. Depends on the optimizer_type, learning_rate, and weight_decay arguments from initialization.

loss

PyTorch loss function – the loss function used for training. Depends on the loss_spec argument from initialization.

name

str : default AutodiffComposition-<index> – the name of the Composition. Specified in the name argument of the constructor for the Composition; if not specified, a default is assigned by CompositionRegistry (see Registry for conventions used in naming, including for default and duplicate names).

Returns: instance of AutodiffComposition
class Parameters(owner, parent=None)
learning_rate
Default value: 0.001 float
losses
Default value: None
min_delta
Default value: 0 int
optimizer
Default value: None
patience
Default value: None
pytorch_representation
Default value: None
run(inputs=None, do_logging=False, scheduler=None, termination_processing=None, context=None, num_trials=None, minibatch_size=1, call_before_time_step=None, call_after_time_step=None, call_before_pass=None, call_after_pass=None, call_before_trial=None, call_after_trial=None, call_before_minibatch=None, call_after_minibatch=None, clamp_input='soft_clamp', bin_execute=False, initial_values=None, reinitialize_values=None, runtime_params=None)

Passes inputs to the AutodiffComposition, then executes sets of nodes that are eligible to run until termination conditions are met.

inputs: {‘inputs’: {Mechanism: list}, ‘targets’: {Mechanism: list}, ‘epochs’: int }
a dictionary with the keys “inputs”, “targets”, and “epochs”. The value corresponding to the “inputs” key should itself be a dictionary with an entry for each Node in the Composition that receives inputs from the user. For each entry, the key is the Node and the value is a list of inputs, each of which corresponds to one TRIAL. Analogously, the value corresponding to the “targets” key should be a dictionary with an entry for each TERMINAL Node in the Composition, whose value is a list of that Node’s target values for each trial. The value corresponding to the “epochs” key is an int specifying how many times the Composition should run through the entire input set.
scheduler: Scheduler
the scheduler object that owns the conditions that will instruct the execution of the Composition. If not specified, the Composition will use its automatically generated scheduler.
context
context will be set to self.default_execution_id if unspecified
base_context
the context corresponding to the execution context from which this execution will be initialized, if values currently do not exist for context
num_trials: int
typically, the composition will infer the number of trials from the length of its input specification. To reuse the same inputs across many trials, you may specify an input dictionary with lists of length 1, or use default inputs, and select a number of trials with num_trials.
minibatch_size: int or TRAINING_SET
if learning is enabled, the number of trials to be executed by the AutodiffComposition between weight updates. If set to TRAINING_SET, weights are updated after each full traversal of the provided inputs (i.e., after each epoch).
call_before_time_step: callable
Not currently implemented for autodiff compositions.
call_after_time_step: callable
Not currently implemented for autodiff compositions.
call_before_pass: callable
Not currently implemented for autodiff compositions.
call_after_pass: callable
Not currently implemented for autodiff compositions.
call_before_trial: callable
Not currently implemented for autodiff compositions.
call_after_trial: callable
Not currently implemented for autodiff compositions.
call_before_minibatch: callable
will be called before each minibatch is executed.
call_after_minibatch: callable
will be called after each minibatch is executed.
initial_values: Dict[Node: Node Value]
sets the values of nodes before the start of the run. This is useful in cases where a node’s value is used before that node executes for the first time (usually due to recurrence or control).
runtime_params: Dict[Node: Dict[Parameter: Tuple(Value, Condition)]]

nested dictionary of (value, Condition) tuples for parameters of Nodes (Mechanisms or Compositions) of the Composition; specifies alternate parameter values to be used only during this Run when the specified Condition is met.

Outer dictionary:
• key - Node
• value - Runtime Parameter Specification Dictionary
Runtime Parameter Specification Dictionary:
• key - keyword corresponding to a parameter of the Node
• value - tuple in which the index 0 item is the runtime parameter value, and the index 1 item is a Condition

See Runtime Parameters for more details and examples of valid dictionaries.

log: bool, LogCondition

Sets the log_condition for every primary node and projection in this Composition, if it is not already set.

Note

As when setting the log_condition directly, a value of True corresponds to the EXECUTION LogCondition.

Returns: output value of the final Node executed in the Composition (type varies)
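
The minibatch_size argument and the call_before_minibatch and call_after_minibatch callbacks can be illustrated with a pure-Python sketch of how one epoch’s trials could be grouped between weight updates. The functions minibatches and run_epoch, and the string constant standing in for TRAINING_SET, are hypothetical; giving a trailing, smaller minibatch its own weight update when the number of trials is not a multiple of minibatch_size is an assumption.

```python
# Hypothetical sketch of minibatch_size and the minibatch callbacks
# (not PsyNeuLink's implementation).
TRAINING_SET = "TRAINING_SET"  # stand-in for PsyNeuLink's TRAINING_SET keyword


def minibatches(trials, minibatch_size=1):
    """Split one epoch's trials into the groups run between weight updates."""
    trials = list(trials)
    if minibatch_size == TRAINING_SET:
        return [trials]  # a single weight update after the whole epoch
    if not isinstance(minibatch_size, int) or minibatch_size < 1:
        raise ValueError("minibatch_size must be a positive int or TRAINING_SET")
    # A trailing, smaller group gets its own update (an assumption).
    return [trials[i:i + minibatch_size] for i in range(0, len(trials), minibatch_size)]


def run_epoch(trials, minibatch_size=1, call_before_minibatch=None, call_after_minibatch=None):
    """Return the number of weight updates made during one epoch."""
    updates = 0
    for batch in minibatches(trials, minibatch_size):
        if call_before_minibatch is not None:
            call_before_minibatch()
        # Executing each trial in `batch` and accumulating gradients would go here.
        updates += 1  # one weight update per minibatch
        if call_after_minibatch is not None:
            call_after_minibatch()
    return updates


print(minibatches(range(5), 2))             # [[0, 1], [2, 3], [4]]
print(minibatches(range(5), TRAINING_SET))  # [[0, 1, 2, 3, 4]]
print(run_epoch(range(5), 2))               # 3
```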