AutodiffComposition

Overview

AutodiffComposition is a subclass of Composition that trains models more quickly by integrating with PyTorch, a popular machine learning library. When training is required, an AutodiffComposition is used in much the same way as a Composition, but runs much faster.

The xor_in_psyneulink_and_pytorch.py script (in the Scripts folder of the PsyNeuLink source code) is an example of how to use AutodiffComposition. The script also gives a comparison of runtimes.

Creating an AutodiffComposition

An AutodiffComposition can be created by calling the constructor, and then adding Components using the add methods of its parent class Composition. The most notable constructor argument is param_init_from_pnl, which controls how parameters are set up for the internal PyTorch representation of the model (see the sketch after the lists below).

If set to True:

  • Only weight parameters that correspond to projections are created. No trainable bias parameters are created, as they
    don’t exist for the autodiff composition’s mechanisms.
  • The weight parameters are initialized to be identical to the autodiff composition’s projections: the tensor of
    the parameter object corresponding to a particular projection has both the same dimensionality and the same values as the projection’s matrix.
  • Pytorch functions representing mechanism functions incorporate their scalar, untrainable biases.

If set to False:

  • Both weight parameters corresponding to projections and trainable bias parameters for mechanisms are created.
  • Weight parameters have the same dimensionality as their corresponding projections. However, their values - and those
    of the bias parameters - are sampled from a random distribution.
  • Though trainable biases now exist, Pytorch functions representing mechanism functions still incorporate their scalar,
    untrainable biases.
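For example, a minimal sketch of how param_init_from_pnl might be specified at construction (all other constructor arguments are left at their defaults):

>>> import psyneulink as pnl
>>> # PyTorch parameters copy the projection matrices exactly; no trainable biases are created
>>> my_autodiff = pnl.AutodiffComposition(param_init_from_pnl=True)
>>> # alternatively, weights (and trainable biases) are sampled from a random distribution
>>> my_random_autodiff = pnl.AutodiffComposition(param_init_from_pnl=False)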

Warning

Do not add or remove Mechanisms or Projections to an AutodiffComposition after it has been run for the first time. Unlike an ordinary Composition, AutodiffComposition does not support this functionality.

Two other initialization arguments, patience and min_delta, allow the model to halt training early. The model tracks how many consecutive ‘bad’ epochs of training have failed to significantly reduce its loss. Once this number exceeds patience, the model stops training. By default, patience is None, and the model will train for the specified number of epochs without stopping early.

min_delta defines what threshold counts as a significant reduction in model loss. By default it is zero, in which case any reduction in loss counts as a significant reduction. If min_delta is large and positive, the model tends to stop earlier because it views fewer epochs as ‘good’.
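As a sketch, early stopping can be configured at construction; the particular values below are illustrative, not defaults:

>>> import psyneulink as pnl
>>> # stop training once 10 consecutive epochs fail to reduce the loss by more than 0.001
>>> my_autodiff = pnl.AutodiffComposition(patience=10, min_delta=0.001)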

learning_rate specifies the learning rate for this run (default 0.001), which is passed to the optimizer specified by optimizer_type.

learning_enabled specifies whether the AutodiffComposition should learn, and it defaults to True. When True, the AutodiffComposition trains using PyTorch, as normal. When False, the AutodiffComposition acts like an ordinary Composition, which does not change weights. learning_enabled is also an attribute, which can be toggled between runs.
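For instance, learning can be switched off after training so that subsequent runs simply execute the trained model (a minimal sketch; see the nested execution example below for a fuller version):

>>> my_autodiff.learning_enabled = True    # the next run() trains using PyTorch
>>> # ... run with an "inputs"/"targets"/"epochs" dictionary (see Execution below) ...
>>> my_autodiff.learning_enabled = False   # subsequent runs execute like an ordinary Composition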

optimizer_type specifies the kind of optimizer used in training. The current options are ‘sgd’ (which is the default) or ‘adam’.

weight_decay specifies the L2 penalty (which discourages large weights) used by the optimizer. This defaults to 0.

loss_spec specifies the loss function for training. It can be a string or a PyTorch loss function. The current options for strings are ‘mse’ (the default), ‘crossentropy’, ‘l1’, ‘nll’, ‘poissonnll’, and ‘kldiv’. These refer to Mean Squared Error, Cross Entropy, L1 loss, Negative Log Likelihood loss, Poisson Negative Log Likelihood, and KL Divergence respectively. The loss_spec can also be any PyTorch loss function, including a custom-written one. For a list of PyTorch loss functions, see https://pytorch.org/docs/stable/nn.html#loss-functions. For information on writing a custom loss function, see https://pytorch.org/docs/master/notes/extending.html and https://discuss.pytorch.org/t/build-your-own-loss-function-in-pytorch/235
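For example, the loss can be specified either by string or by passing a PyTorch loss function object directly (a sketch; the particular losses chosen here are arbitrary):

>>> import torch
>>> import psyneulink as pnl
>>> # specify the loss by name
>>> my_autodiff = pnl.AutodiffComposition(loss_spec='crossentropy')
>>> # or pass a PyTorch loss function object directly
>>> my_autodiff = pnl.AutodiffComposition(loss_spec=torch.nn.NLLLoss())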

randomize specifies whether the order of inputs will be randomized in each epoch. (In each epoch, all inputs are run; if randomize is True, the order in which they are presented within that epoch is random.)

refresh_losses specifies whether the losses attribute is refreshed for each call to run(). If False, the losses of each run are appended to the losses attribute. If True, the losses of each run overwrite losses instead.

force_no_retain_graph defaults to False. If True, the AutodiffComposition does not use the retain_graph option when computing PyTorch gradients. This can reduce memory usage, but it breaks recurrent networks, so it should only be used when the network is not recurrent.
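The following sketch pulls these constructor arguments together in a single call (the values shown are illustrative, not recommendations):

>>> import psyneulink as pnl
>>> my_autodiff = pnl.AutodiffComposition(
...     learning_rate=0.01,           # passed to the optimizer
...     optimizer_type='adam',        # 'sgd' (default) or 'adam'
...     weight_decay=1e-4,            # L2 penalty on the weights
...     randomize=True,               # shuffle the order of inputs each epoch
...     refresh_losses=True,          # overwrite the losses attribute on each run()
...     force_no_retain_graph=False)  # leave False for recurrent networks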

Note

The AutodiffComposition detaches all gradients between epochs of training.

Structure

AutodiffComposition has all the attributes of its parent class Composition, in addition to several more.

The target_CIM attribute is analogous to the input_CIM of any Composition, but instead of providing inputs, provides targets for the AutodiffComposition.

The pytorch_representation attribute holds the PyTorch representation of the PsyNeuLink model that AutodiffComposition contains.

The losses attribute tracks the average loss for each training epoch.

As mentioned above, the learning_enabled attribute can be toggled to determine whether the AutodiffComposition learns or whether it executes like an ordinary Composition.

The optimizer attribute contains the PyTorch optimizer function used for learning. It is determined at initialization by the optimizer_type, learning_rate, and weight_decay arguments.

The loss attribute contains the PyTorch loss function used for learning. It is determined at initialization by the loss_spec argument.
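After a run with learning enabled, these attributes can be inspected directly; a minimal sketch (assuming my_autodiff has been constructed and run as in the example in the next section):

>>> my_autodiff.losses                  # average loss for each training epoch
>>> my_autodiff.optimizer               # PyTorch optimizer built from optimizer_type, learning_rate, and weight_decay
>>> my_autodiff.loss                    # PyTorch loss function built from loss_spec
>>> my_autodiff.pytorch_representation  # PyTorch representation of the model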

Execution

Most arguments to AutodiffComposition’s run or execute methods are the same as in a Composition. When learning_enabled is False, the arguments are the same, since in this case the AutodiffComposition executes like a Composition.

However, if learning_enabled is True, the inputs argument format is different. In that case, inputs should be a dictionary with required keys “inputs” and “targets”, and optional key “epochs”. The value at “inputs” should be a dictionary relating origin mechanisms to their inputs. The value at “targets” should be a dictionary relating terminal mechanisms to their targets. The value at “epochs” is an integer stating the number of epochs of training (i.e. how many times all inputs and targets are run); it defaults to 1. Here is an example of creating a simple AutodiffComposition and specifying inputs and targets:

>>> import numpy as np
>>> import psyneulink as pnl
>>> # set up PsyNeuLink Components
>>> my_mech_1 = pnl.TransferMechanism(function=pnl.Linear, size = 3)
>>> my_mech_2 = pnl.TransferMechanism(function=pnl.Linear, size = 2)
>>> my_projection = pnl.MappingProjection(matrix=np.random.randn(3,2),
...                     sender=my_mech_1,
...                     receiver=my_mech_2)
>>> # create AutodiffComposition
>>> my_autodiff = pnl.AutodiffComposition()
>>> my_autodiff.add_node(my_mech_1)
>>> my_autodiff.add_node(my_mech_2)
>>> my_autodiff.add_projection(sender=my_mech_1, projection=my_projection, receiver=my_mech_2)
>>> # input specification
>>> my_inputs = {my_mech_1: [[1, 2, 3]]}
>>> my_targets = {my_mech_2: [[4, 5]]}
>>> input_dict = {"inputs": my_inputs, "targets": my_targets, "epochs": 2}
>>> my_autodiff.run(inputs = input_dict)

Logging

Logging currently works differently in AutodiffComposition than in Composition. In an AutodiffComposition, no logging is done by default, because logging substantially (roughly by 30%) slows down AutodiffComposition. If you wish for all projection weights and mechanism values to be logged during execution or training of AutodiffComposition, you must set the do_logging argument of the run() method to True. Logging with AutodiffComposition is slightly hacked together, so the time and context in the log are not meaningful, only the logged value is meaningful.
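For example (a sketch, using the input_dict format described above):

>>> # enable logging of all projection weights and mechanism values for this run
>>> my_autodiff.run(inputs=input_dict, do_logging=True)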

Nested Execution

In general, an AutodiffComposition may be nested inside another Composition, like ordinary Composition nesting. However, there are a few differences. The input format of an AutodiffComposition with learning enabled is quite unusual. Thus, when learning is enabled, the AutodiffComposition must be an origin mechanism of the Composition.

Note

Like with all nested Compositions, you must call an AutodiffComposition’s _analyze_graph() method (or execute the AutodiffComposition) before nesting it.

However, when learning is not enabled, AutodiffComposition works just like an ordinary Composition, in theory. Thus, an AutodiffComposition with learning not enabled receives input in the same format as an ordinary Composition, and can therefore be placed anywhere in a Composition.

Note

Using an AutodiffComposition not as an origin mechanism is currently buggy, and might produce unexpected results.

Below is an example script showing how to nest an AutodiffComposition with learning enabled.

>>> import numpy as np
>>> import psyneulink as pnl
>>> # set up PsyNeuLink Components
>>> my_mech_1 = pnl.TransferMechanism(function=pnl.Linear, size = 3)
>>> my_mech_2 = pnl.TransferMechanism(function=pnl.Linear, size = 2)
>>> my_projection = pnl.MappingProjection(matrix=np.random.randn(3,2),
...                     sender=my_mech_1,
...                     receiver=my_mech_2)
>>> # create AutodiffComposition
>>> my_autodiff = pnl.AutodiffComposition()
>>> my_autodiff.add_node(my_mech_1)
>>> my_autodiff.add_node(my_mech_2)
>>> my_autodiff.add_projection(sender=my_mech_1, projection=my_projection, receiver=my_mech_2)
>>> my_autodiff._analyze_graph()  # alternatively, my_autodiff.run( ... )
>>>
>>> # input specification
>>> my_inputs = {my_mech_1: [[1, 2, 3]]}
>>> my_targets = {my_mech_2: [[4, 5]]}
>>> input_dict = {"inputs": my_inputs, "targets": my_targets, "epochs": 2}
>>>
>>> parentComposition = pnl.Composition()
>>> parentComposition.add_node(my_autodiff)
>>>
>>> training_input = {my_autodiff: input_dict}
>>> result1 = parentComposition.run(inputs=training_input)
>>>
>>> my_autodiff.learning_enabled = False
>>> no_training_input = {my_autodiff: my_inputs}
>>> result2 = parentComposition.run(inputs=no_training_input)

Class Reference