The general recipe for developing increasingly advanced machine learning (ML) systems is to identify and optimize various ‘components of intelligence' (datasets, environments, training paradigms, objectives, architectures) under an incremental, iterative dev-test cycle while minimizing the amount of information movement required (e.g., data collection, training, architecture design, and experimentation). However, even at the pace of an expert research team, there are too many known and yet-to-be-explored ML application domains, and not enough automation driving this evolutionary optimization process end-to-end. I present a few iterations on this full-stack optimization loop by 1) introducing a novel multi-paradigm, multimodal deep learning architecture -- the Node Neural Network; 2) describing a heterogeneous dataset / environment curriculum -- AExperience -- which 3) includes a computer-interaction training environment -- VNCEnv; and 4) proposing Computatrum, a data-hunting, self-/unsupervised, neurosymbolic, socially and developmentally learning, Internet-connected artificial intelligence system. This paper provides motivation and relevant background, introduces the above contributions, and invites the reader to use and contribute to its open-source work. Code: https://github.com/JacobFV/full-stack-ai.

The brain continually searches for more adaptive structure and activity patterns. Encoded in neuronal- and network-scale mechanisms, this implicit objective motivates forming minimally complex yet accurate representations, developing functionally specialized regions, and achieving mastery over the local environment; extended over larger optimization horizons, the same objective motivates the formal research and development of artificial “intelligence” (AI). The realization of sufficiently advanced AI -- especially that vaguely referred to by phrases such as “human-level artificial intelligence”, “artificial general intelligence”, and “superintelligence” -- has strong and obvious motivation from many fields of human endeavour such as healthcare, social work, business, economics, governance, and, of course, the STEM disciplines. Still, there are no silver bullets and no free lunch, and the gradual evolution of AI has been artificially rate-limited in many respects. Ever since the field's inception, there has been no general consensus on a formal measure of intelligence, a goal for autonomous agent action, or a unifying framework to guide AI research, and even today, state-of-the-art deep neural networks are relatively hardwired: they must be spoon-fed datasets, situated in closed environments, and trained under an economic fitness landscape of which they have no direct awareness. Approaching and surpassing the rate-limiting bar of human research and development demands ceding as many aspects of the ML development cycle as possible to autonomous control. Given this immense problem domain, I ask:

Can a fully autonomous, open-ended artificial intelligence system research and develop state-of-the-art ML systems -- including improvements of itself -- subject to the same technological and financial constraints as an independent researcher?

This is not just asking for AutoML, unsupervised learning, intrinsically motivated reinforcement learning, or an AI-generating algorithm. I propose developing a system that genuinely propagates feedback ‘end-to-end', pursues its own cultivated intrinsic motivations, is its own economic entity, and uses standard peripherals connected to an Ubuntu VM with Internet access to interact with the real, open-ended environment, including robots, research sites, and its own software and compute resources. I call this system Computatrum, and present the following contributions: 1) the Node Neural Network, a multi-paradigm, multimodal deep learning architecture; 2) AExperience, a heterogeneous dataset / environment curriculum; 3) VNCEnv, a computer-interaction training environment; and 4) the proposal and open-source groundwork for Computatrum itself.
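As a concrete, deliberately simplified illustration of this interaction surface, the sketch below wraps a screen-keyboard-mouse loop in the Gymnasium Env interface. It is not the actual VNCEnv implementation: the class name, action layout, and the stubbed _capture_screen / _send_input helpers are assumptions introduced only for illustration; a real version would forward these calls to a VNC client driving the Ubuntu VM's framebuffer and input devices.

\begin{verbatim}
# Toy sketch (NOT the actual VNCEnv): a desktop-interaction loop exposed
# through the Gymnasium API. The VNC calls are stubbed out with dummy data.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyDesktopEnv(gym.Env):
    """Observation: RGB screen capture. Action: cursor position plus a key index."""

    def __init__(self, width: int = 1280, height: int = 720, n_keys: int = 128):
        super().__init__()
        self.width, self.height = width, height
        self.observation_space = spaces.Box(0, 255, (height, width, 3), dtype=np.uint8)
        # Action: continuous cursor coordinates in [0, 1]^2 and one discrete key press.
        self.action_space = spaces.Dict({
            "cursor": spaces.Box(0.0, 1.0, (2,), dtype=np.float32),
            "key": spaces.Discrete(n_keys),
        })

    def _capture_screen(self) -> np.ndarray:
        # Stub: a real implementation would grab the VM framebuffer over VNC.
        return np.zeros((self.height, self.width, 3), dtype=np.uint8)

    def _send_input(self, cursor, key) -> None:
        # Stub: a real implementation would forward pointer/keyboard events over VNC.
        pass

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._capture_screen(), {}

    def step(self, action):
        self._send_input(action["cursor"], action["key"])
        obs = self._capture_screen()
        # No extrinsic reward: the agent is expected to supply intrinsic motivation.
        return obs, 0.0, False, False, {}


env = ToyDesktopEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
\end{verbatim}

The empty reward channel is intentional: in the proposal above, feedback comes from the agent's own cultivated motivations and economic situation rather than from a hand-designed task reward.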

2. Background

While this work aims to capture many principles from the brain in a machine learning system, it would be very misleading to imply that it describes a system that is ‘like the brain'. Over the ages, engineers have compared the brain to the clock, the steam engine, the computer, and, recently, the neural network, yet as with physics' study of the universe, neuroscience research regularly emphasizes that the brain is something else, not captured by existing models. With this limited understanding, engineering even part of its intelligence into an artificial system demands entertaining not just one or two, but as many overlapping, disjoint, and even conflicting models, perspectives, and principles of the brain's intelligence-producing mechanisms as reasonably possible. I introduce some that are relevant to the Node Neural Network architecture below:

The brain\footnote{Unless otherwise qualified, I refer to the human brain by “brain”.} itself operates on limited, heterogeneous representations of its external world. Each sensory nerve provides only a window of information, and the regions it innervates likewise specialize to individually represent and express diverse activity patterns. Along the cortex, millions of similar neuronal substructures -- sometimes termed cortical columns -- have been interpreted as forming world models of their individual receptive fields [research from HTM theory], and [The Brain From Inside Out; “Neural syntax: cell assemblies, synapsembles and readers”] even emphasizes the individual predictive modeling performed by the brain's roughly 86 billion neurons, interpreting each action potential as a hypothesis test on the neuron's external state. The brain possesses many other anatomical and physiological features which Predictive Coding Theory, the Free Energy Principle, the Energy Homeostasis Principle, and Buzsaki's inside-out perspective take to collectively represent a statistical ensemble over the external state.
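To make the ‘neuron as hypothesis tester' framing concrete, here is a toy predictive-coding unit. It is my own illustrative sketch, not a model taken from the cited works and not the Node Neural Network: the unit maintains a running estimate of its input and transmits only the prediction error, so a well-predicted input quickly stops producing outgoing signal, while a violation produces a large one.

\begin{verbatim}
# Toy predictive-coding unit (illustration only): keeps a running hypothesis
# about its input and emits only the prediction error ("surprise").
class PredictiveUnit:
    def __init__(self, lr: float = 0.1):
        self.estimate = 0.0   # the unit's current hypothesis about its input
        self.lr = lr          # how fast the hypothesis is revised

    def step(self, x: float) -> float:
        error = x - self.estimate          # hypothesis test: how wrong was the prediction?
        self.estimate += self.lr * error   # revise the hypothesis toward the observation
        return error                       # only the surprise is passed on


unit = PredictiveUnit()
steady = [abs(unit.step(1.0)) for _ in range(50)]            # constant, predictable input
surprise = abs(unit.step(5.0))                               # sudden violation
print(f"error after 50 predictable steps: {steady[-1]:.4f}")  # ~0: nothing to report
print(f"error on an unexpected input:     {surprise:.4f}")    # large: worth signaling
\end{verbatim}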

Probabilistic hypothesis testing and predictive coding over millions or billions of computational units deviates significantly from the stateless, deterministic paradigm of functional programming, yet it can bring a fundamental reduction in algorithmic complexity. In theoretical computer science, problems are formally categorized into complexity classes. Class P contains problems solvable in time polynomial in the problem size on a deterministic machine, while class NP contains problems solvable in polynomial time on a nondeterministic machine. An important practical problem is to solve NP problems as fast as possible on deterministic compute architectures. A traditional deterministic architecture can definitively solve NP problems in time exponential in problem size by unrolling all nondeterministic computation branches into a flattened operation sequence. (Polynomially many branch points multiply into exponentially many paths.) However, if the number of nondeterministic branches does not exceed memory-compute availability, even a parallel deterministic compute architecture can behave like a nondeterministic one. When compute architecture size is sufficiently balanced against problem size, class NP problems effectively fit within class P. This reduces otherwise intractable search problems -- e.g., Boolean satisfiability, graph coloring, and combinatorial planning -- to effectively tractable ones. It holds a key to efficient compiler optimization, recognizing intractable problems, directing novel proof sequences, and finding the minimal-movement solution for mapping structured data onto flat memory. In the language of computation, neurons, cortical columns, and other state-tracking structures of the brain resemble the parallel memory-compute components of a nondeterministic computational machine. While they usually operate independently, when any of these structures recognizes a low-energy state, it can broadcast that result, collapsing the parallel search much as a nondeterministic machine halts on its first accepting branch.
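The following toy sketch (my illustration; the tiny SAT instance and the reliance on a process pool are arbitrary assumptions) shows the unrolling argument in miniature: every nondeterministic ‘guess' of a truth assignment becomes one deterministic branch, and as long as the number of branches does not exceed the available parallel workers, the wall-clock cost stays close to the polynomial cost of checking a single branch rather than the serial exponential sweep.

\begin{verbatim}
# Toy illustration: simulate a nondeterministic machine by checking all
# 2^n "guessed" truth assignments of a small SAT instance in parallel.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

# CNF formula over 3 variables: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
CLAUSES = [[1, -2], [2, 3], [-1, -3]]

def satisfies(assignment):
    """Deterministic polynomial-time check of one nondeterministic 'guess'."""
    values = {i + 1: bit for i, bit in enumerate(assignment)}
    ok = all(any(values[abs(l)] == (l > 0) for l in clause) for clause in CLAUSES)
    return assignment, ok

if __name__ == "__main__":
    branches = list(product([False, True], repeat=3))  # all 2^n nondeterministic branches
    with ProcessPoolExecutor() as pool:                # one deterministic worker per branch
        for assignment, ok in pool.map(satisfies, branches):
            if ok:
                print("satisfying assignment:", assignment)
                break
\end{verbatim}

The exponential cost has not disappeared; it has been traded for hardware parallelism, which is exactly the balance between architecture size and problem size described above.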

This may allow the brain to treat searches that would stall a strictly serial, deterministic machine as massively parallel hypothesis tests. In this sense, the brain creates information by evolving thoughts rather than by executing a fixed program.

The brain runs a fast-moving economy, and energy -- in the form of voltage- and ion-gradients, ATP, glucose, and so on -- is its currency. Consuming roughly 20% of the body's energy in a 1.4 kg organ, the brain must match energy supply nearly precisely to demand: sustained divergence in either direction, scarcity or excess, is physiologically stressful, and the brain's mission-critical role in adaptive information processing demands that this balance be maintained with tight timing.

Ion-gradient-expensive action potentials are more often associated with sensory violations, cognitive involvement, physical exertion, or otherwise meaningful information transfer than with uninvolved default-mode activity, and the connectome that mediates their transfer exhibits highly reconfigurable small-world connectivity. Always optimizing, the brain restructures neuronal connectivity, reorganizes regional function, rearranges, shrinks, and expands energy-carrying vasculature, estimates reward, and otherwise ‘thinks' with the implicit objective of maintaining equilibrium between energy production and consumption. [neuronal free energy principle] Whenever homeostasis is lost, the brain must adapt anatomically, physiologically, cognitively, and behaviorally to counteract its energy stress. In neurons, for instance, repeated presynaptic signal propagation poses a stressor by depleting the energy stored in ion- and voltage-gradients. The overall network, too, cannot afford to be hyperactive: it already consumes 20% of the body's energy when only sparse subsets of neurons are active. Emergent behavior is likewise observed in a ‘Goldilocks zone' of environments that are neither too predictable nor too uncertain.

While these balancing problems might be solved trivially in a static system, the brain adapts dynamically and on multiple time scales at once.

Neurons adapt to presynaptic-signal-induced stress by 1) immediately producing an action potential to restore the overall ion-gradient and 2) over repeated occurrences, raising synaptic junction resistivity. These adaptive mechanisms, and others operating on faster and slower time scales, attune each neuron to its thousands of incoming signals such that signals arriving at times and frequencies anticipated by the cellular energy economy invoke an action potential less often than those arriving at unexpected times and rates. Action potentials, in turn, provide a baseline against which larger signal violations are compared, and when those action potentials emanate from inhibitory neurons, they filter the propagation of signal violations at a higher level. [Rhythms of the Brain] This hierarchy continues upward, forming the spectrum of delta, alpha, beta, gamma, and other local- and global-scale oscillations. [Brain From Inside Out] Given their roughly geometric spacing by a factor of $e$ -- approximately $f_{\beta} \approx e\, f_{\alpha}$ and $f_{\gamma} \approx e\, f_{\beta} \approx e^{2} f_{\alpha}$ -- neighboring bands stand in irrational frequency ratios, and these Fourier-mode frequencies provide strong biases for oscillating with a chaotic environment. [Rhythms of the Brain]
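The toy simulation below (an illustrative rate-level caricature of the mechanisms described above, not a biophysical model from the cited works; all constants are arbitrary assumptions) captures the qualitative claim: a leaky integrating unit whose synapse depresses with use fires readily for an input arriving after an unexpected silence, but progressively less often for a steady, anticipated input stream.

\begin{verbatim}
# Toy caricature: a leaky integrate-and-fire unit with use-dependent synaptic
# depression. Steady, anticipated drive evokes fewer spikes than sparse,
# unanticipated drive.
import numpy as np

def run(input_times, T=200, dt=1.0, tau_v=10.0, tau_rec=50.0, w=1.5, threshold=1.0):
    v, r = 0.0, 1.0            # membrane potential, synaptic resource (1 = fully recovered)
    spikes = []
    inputs = set(input_times)
    for t in np.arange(0, T, dt):
        v += dt * (-v / tau_v)             # membrane leak
        r += dt * ((1.0 - r) / tau_rec)    # synaptic resource recovers toward 1
        if t in inputs:
            v += w * r                     # effective drive scales with remaining resource
            r *= 0.5                       # use-dependent depression (raised "resistivity")
        if v >= threshold:
            spikes.append(t)
            v = 0.0                        # reset after an action potential
    return spikes

regular = run(input_times=range(0, 200, 10))   # 20 inputs on a steady, anticipated rhythm
sparse  = run(input_times=[0, 150])            # 2 inputs separated by a long silence
print("spikes for regular, anticipated drive:  ", len(regular))
print("spikes for sparse, unanticipated drive: ", len(sparse))
# Many anticipated inputs evoke fewer action potentials than two unanticipated ones.
\end{verbatim}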