[Figure: explet.png]

Many of the codes are best represented as graphs in which some of the nodes are controllable and some are not, as in the sketch below.
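A minimal sketch of such a graph, assuming each node carries a latent vector plus a flag marking whether the agent can control it; the names (`LatentNode`, `controllable`) are illustrative, not part of any existing library:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class LatentNode:
    """One node in the code graph: a latent vector plus a control flag."""
    name: str
    z: np.ndarray       # latent code carried by this node
    controllable: bool  # True if the agent can set this node's value
    parents: list = field(default_factory=list)


# Hypothetical vision sub-graph: CLIP features are observed (uncontrollable),
# eye movements are an action the agent chooses (controllable).
z_u = LatentNode("clip_embedding", np.zeros(512), controllable=False)
z_c = LatentNode("eye_movements", np.zeros(2), controllable=True, parents=[z_u])
```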

Make sure

To do:

The neural networks should employ sparse activation; maybe even the graph nodes themselves are sparsely activated.
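A minimal sketch of one way to get sparse activation, assuming a top-k masking scheme in PyTorch (the layer sizes and k value are illustrative):

```python
import torch
import torch.nn as nn


class TopKSparse(nn.Module):
    """Keeps only the k largest activations per example; zeroes the rest."""

    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Indices of the k largest entries along the feature dimension.
        _, idx = torch.topk(x, self.k, dim=-1)
        mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
        return x * mask


layer = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), TopKSparse(k=64))
out = layer(torch.randn(8, 512))  # at most 64 non-zeros per row
```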

Vision:

z_u: CLIP embedding
z_c: eye movements
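A minimal sketch of producing the uncontrollable vision latent z_u from a frame, assuming the Hugging Face transformers CLIP API; the controllable latent z_c is stubbed as a hypothetical 2-D gaze vector:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.png")  # hypothetical camera frame
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    z_u = model.get_image_features(**inputs)  # uncontrollable: CLIP embedding

z_c = torch.zeros(2)  # controllable: (x, y) eye-movement action, stubbed
```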

Text (combining self-dialogue and external dialogue, inner voice, Internet query responses, and the text prompt):

z_c: next token

z_u: T5, Blender, GPT-2, Longformer
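A minimal sketch of the controllable text latent as a next-token choice, assuming the Hugging Face transformers GPT-2 API; the final hidden state stands in for the uncontrollable latent z_u here:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer("The agent said", return_tensors="pt").input_ids

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

z_u = out.hidden_states[-1][:, -1]         # uncontrollable: context embedding
probs = out.logits[:, -1].softmax(dim=-1)  # distribution over the vocabulary
z_c = torch.multinomial(probs, 1)          # controllable: sampled next token

print(tokenizer.decode(z_c[0]))
```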

I will have to pre-train the latent translation functions from text to audio.
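A minimal sketch of pre-training one such translation function, assuming paired text and audio latents are already available and using a simple MLP regressor trained with MSE (all shapes and names are illustrative):

```python
import torch
import torch.nn as nn

# Hypothetical paired latents: text embeddings and matching audio embeddings.
text_z = torch.randn(1024, 768)   # e.g. from a text encoder
audio_z = torch.randn(1024, 512)  # e.g. from an audio encoder

translate = nn.Sequential(
    nn.Linear(768, 1024), nn.GELU(), nn.Linear(1024, 512)
)
opt = torch.optim.Adam(translate.parameters(), lr=1e-4)

for step in range(1000):
    pred = translate(text_z)  # map text latent -> audio latent
    loss = nn.functional.mse_loss(pred, audio_z)
    opt.zero_grad()
    loss.backward()
    opt.step()
```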

Audio: