Critical multi-paradigm networks are a family of deep learning architectures characterized by pseudo-linear activation landscapes, critical activation phenomena, and particular unsupervised update rules.
Write the activation function in two lines with an intermediate variable z that lies in (-1, 1). This will make the regularization formulas simpler to describe.
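A minimal sketch of the intended two-line form. The squashing into (-1, 1) is assumed to be tanh (the notes don't name the function), and the second line is left as a placeholder:

```python
import numpy as np

def layer(x, W, b):
    """Activation written in two lines via an intermediate z in (-1, 1)."""
    z = np.tanh(W @ x + b)  # z in (-1, 1); regularizers can be stated in terms of z
    a = z                   # placeholder second line; the notes leave this map unspecified
    return a
```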
Add sparse activation penalty
Add mean activation penalty (4?)
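The two penalties above might be sketched as follows; the L1 form for sparsity and the squared deviation from a target mean are common choices, not confirmed by the notes:

```python
import numpy as np

def sparse_penalty(z, lam=1e-3):
    # L1 penalty on activations encourages sparsity (assumed form).
    return lam * np.abs(z).mean()

def mean_activation_penalty(z, target=0.0, lam=1e-3):
    # Penalize deviation of the mean activation from a target level (assumed form).
    return lam * (z.mean() - target) ** 2
```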
Explicitly list the self-supervised and supervised objective weight-gradient updates.
Move all regularizers into one big list.
Write out the variance, invariance, and covariance losses separately.
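Assuming these are VICReg-style losses, a sketch of the three terms (a hinge on per-dimension standard deviation, mean squared difference between two views, and squared off-diagonal covariance). The threshold gamma, the epsilon, and the normalizations are assumptions:

```python
import numpy as np

def variance_loss(z, gamma=1.0, eps=1e-4):
    # Hinge: push each dimension's std above gamma.
    std = np.sqrt(z.var(axis=0) + eps)
    return np.maximum(0.0, gamma - std).mean()

def invariance_loss(z1, z2):
    # Mean squared difference between two views of the same input.
    return ((z1 - z2) ** 2).mean()

def covariance_loss(z):
    # Penalize squared off-diagonal covariance entries, normalized by dimension.
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = (zc.T @ zc) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return (off_diag ** 2).sum() / d
```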
Describe the initialization for weights and biases.
Initialize the weight signs with a Bernoulli distribution, making about 20% negative.
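A sketch of the sign-via-Bernoulli initialization. Only the ~20% negative-sign fraction comes from the notes; the magnitude distribution (here a folded Gaussian with scale 0.05) is an assumption:

```python
import numpy as np

def init_weights(shape, p_negative=0.2, scale=0.05, rng=None):
    rng = np.random.default_rng(rng)
    # Bernoulli draw for the sign: ~20% of entries negative.
    sign = np.where(rng.random(shape) < p_negative, -1.0, 1.0)
    # Magnitude distribution is an assumption (folded Gaussian).
    magnitude = np.abs(rng.normal(0.0, scale, shape))
    return sign * magnitude
```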
Draw the dichotomy between progressive and self-organizing implicit objectives.
Explain that gradient update rules may not be active on every interaction step.
Clarify the distinction between interaction step and time step in the introduction paragraph: on every interaction step a policy takes in a tensor ... and produces an output. Each policy update is an interaction step, while each index T within the tensor is a time step.
Separate topic: sparse activations.
In the sparse neural networks, tanh is replaced with a threshold function, and the presynaptic sum builds up over time before activating to +1 or -1. Temporal-invariance regularization is applied to the input summand, not the activation. Maybe replace the variance regularization with a dynamical-systems objective such as synchrony along the representation axes.
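The accumulate-then-threshold behavior can be sketched as a unit that integrates its presynaptic sum across time steps and fires +1 or -1 once a threshold is crossed; the reset-on-fire rule is an assumption:

```python
import numpy as np

class AccumulatingUnit:
    """Presynaptic sum accumulates across time steps; fires +1/-1 at threshold."""

    def __init__(self, n, theta=1.0):
        self.s = np.zeros(n)   # running presynaptic sum
        self.theta = theta     # firing threshold

    def step(self, presyn):
        self.s += presyn
        out = np.where(self.s >= self.theta, 1.0,
              np.where(self.s <= -self.theta, -1.0, 0.0))
        # Reset the sum for units that fired (assumed behavior).
        self.s = np.where(out != 0.0, 0.0, self.s)
        return out
```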
Networks
Feedforward drop-in layer replacement