Critical multi-paradigm networks are a family of deep learning architectures characterized by pseudo-linear activation landscapes, critical activation phenomena, and particular unsupervised update rules.

Write the activation function in two lines, with an intermediate variable z that lies in (-1, 1). This will make the regularization formulas simpler to state.
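
One way the two-line form might look, assuming the pre-activation is squashed with tanh so that z is bounded in (-1, 1); the specific squashing and the scale parameter are assumptions, not the document's definitions:

```python
import numpy as np

def activation(h, scale=1.0):
    """Two-line activation with a bounded intermediate variable."""
    z = np.tanh(h)    # z in (-1, 1); regularizers can be stated on z
    return scale * z  # final activation is a scaled copy of z

# z stays bounded regardless of the pre-activation magnitude
print(activation(np.array([-10.0, 0.0, 10.0])))
```

Stating the regularizers on z rather than on the raw pre-activation keeps every penalty term on a fixed (-1, 1) scale.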

Add a sparse activation penalty.

Add a mean activation penalty (4?).
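
A minimal sketch of what these two penalties could look like; the exact functional forms and the weights `lam` and `mu` are assumptions for illustration, not the document's definitions:

```python
import numpy as np

def sparse_activation_penalty(a, lam=1e-3):
    # L1 penalty: pushes individual activations toward zero
    return lam * np.abs(a).sum()

def mean_activation_penalty(a, target=0.0, mu=1e-3):
    # Penalizes each unit's batch-mean activation drifting from a target
    return mu * ((a.mean(axis=0) - target) ** 2).sum()

a = np.array([[0.5, -0.5], [0.5, 0.5]])
print(sparse_activation_penalty(a))  # 1e-3 * 2.0 = 0.002
print(mean_activation_penalty(a))    # 1e-3 * (0.5**2 + 0.0**2) = 0.00025
```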

Explicitly list the weight-gradient updates for the self-supervised and supervised objectives.
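
A sketch of the combined weight update under a weighted sum of the two objectives; the weighting scheme and the coefficients `alpha` and `beta` are assumptions:

```python
import numpy as np

def combined_update(W, grad_sup, grad_ssl, lr=0.01, alpha=1.0, beta=0.5):
    """One gradient step on L = alpha * L_sup + beta * L_ssl.

    grad_sup and grad_ssl are dL_sup/dW and dL_ssl/dW respectively.
    """
    return W - lr * (alpha * grad_sup + beta * grad_ssl)

W = np.zeros((2, 2))
W = combined_update(W, grad_sup=np.ones((2, 2)), grad_ssl=np.ones((2, 2)))
print(W)  # every entry: -0.01 * (1.0 + 0.5) = -0.015
```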

Move all regularizers into one consolidated list.

Write out the variance, invariance, and covariance losses separately.
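
A sketch of the three terms in the VICReg style; this is a guess at the intended forms, and the actual definitions belong in the document:

```python
import numpy as np

def vic_losses(za, zb, eps=1e-4):
    """za, zb: (batch, dim) embeddings of two views of the same inputs."""
    # Invariance: mean squared distance between the two views
    inv = ((za - zb) ** 2).mean()
    # Variance: hinge keeping each dimension's std above 1
    std = np.sqrt(za.var(axis=0) + eps)
    var = np.maximum(0.0, 1.0 - std).mean()
    # Covariance: penalize off-diagonal covariance entries
    zc = za - za.mean(axis=0)
    cov = zc.T @ zc / (za.shape[0] - 1)
    off = cov - np.diag(np.diag(cov))
    cov_loss = (off ** 2).sum() / za.shape[1]
    return inv, var, cov_loss
```

When the two views are identical the invariance term vanishes, while the variance and covariance terms remain nonnegative.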

Describe the initialization for weights and biases.

Draw the weight signs from a Bernoulli distribution, making about 20% of them negative.
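
A possible sketch of that initialization; the magnitude scheme (absolute value of a fan-in-scaled Gaussian) is an assumption, with only the Bernoulli sign step taken from the note:

```python
import numpy as np

def init_weights(fan_in, fan_out, neg_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Magnitudes from a standard fan-in-scaled Gaussian (assumed scheme)
    mag = np.abs(rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out)))
    # Signs from a Bernoulli: about 20% negative, 80% positive
    sign = np.where(rng.random((fan_in, fan_out)) < neg_frac, -1.0, 1.0)
    return mag * sign

W = init_weights(256, 128)
print((W < 0).mean())  # roughly 0.2
```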

Dichotomize progressive vs. self-organizing implicit objectives.

Explain that gradient update rules may not be active on every interaction step.
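
One way this gating could be sketched, assuming updates fire on a fixed interval; the `update_every` schedule is an assumption:

```python
import numpy as np

def train_step(W, grad, step, lr=0.01, update_every=4):
    """Apply the gradient update only on every `update_every`-th
    interaction step; other steps leave the weights untouched."""
    if step % update_every == 0:
        return W - lr * grad
    return W

W = np.zeros(3)
for step in range(8):
    W = train_step(W, np.ones(3), step)
print(W)  # updated on steps 0 and 4 only -> [-0.02, -0.02, -0.02]
```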

Clarify the distinction between interaction step and time step in the introduction paragraph: on every interaction step a policy takes in a tensor and produces an output. Each policy update is an interaction step, while each index T in the tensor is a time step.
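
The intended distinction might be sketched as a nested loop; the tensor shape here is a placeholder, not the document's actual dimensions:

```python
import numpy as np

def run(policy, episode):
    """episode: (n_interactions, T, features) -- placeholder shape.

    Each outer index is an interaction step (one policy update);
    each inner index t is a time step within that interaction's tensor.
    """
    updates = 0
    for obs in episode:                # one interaction step per policy update
        for t in range(obs.shape[0]):  # time steps inside the input tensor
            _ = policy(obs[t])
        updates += 1                   # policy update once per interaction
    return updates

n = run(lambda x: x.sum(), np.zeros((3, 5, 4)))
print(n)  # 3 interaction steps
```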

Separate topic: sparse activations.

In the sparse neural networks, tanh is replaced with a threshold function, and the presynaptic sum builds up over time before activating to +1 or -1. Temporal-invariance regularization is applied to the input summand, not the activation. Maybe replace the variance regularization with a dynamical-systems objective such as synchrony along the representation axes.
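
A possible sketch of the accumulate-then-threshold unit: the presynaptic sum integrates across time steps and emits +1 or -1 once it crosses a threshold. The reset-to-zero behavior and the threshold value are assumptions:

```python
def threshold_unit(inputs, thresh=1.0):
    """inputs: sequence of presynaptic sums, one per time step.

    The sum accumulates across time; when |accumulator| crosses the
    threshold the unit fires +1 or -1 and resets, otherwise it emits 0.
    Temporal-invariance regularization would apply to the accumulated
    input sum, not to these spike outputs.
    """
    acc, out = 0.0, []
    for x in inputs:
        acc += x
        if acc >= thresh:
            out.append(1.0)
            acc = 0.0
        elif acc <= -thresh:
            out.append(-1.0)
            acc = 0.0
        else:
            out.append(0.0)
    return out

print(threshold_unit([0.4, 0.4, 0.4, -1.5]))  # [0.0, 0.0, 1.0, -1.0]
```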

Networks

Feedforward drop-in layer replacement.