Artificial Human Intelligence: The Programmer’s Apprentice
Tom Dean and Rishabh Singh
Google Research

In this presentation, the phrase “Artificial Human Intelligence” refers to AI systems whose architectures are modeled after the human brain, leveraging ideas from developmental psychology and cognitive neuroscience. We start with a discussion of the overall project, with an emphasis on the underlying cognitive architecture, motivated in large part by the basic natural-language-processing and problem-solving skills required of an assistant collaborating with a software engineer. Our primary objective is to build an end-to-end system for an individualized personal assistant that focuses on a specific area of expertise, namely software engineering, that learns from experience, works collaboratively with an expert programmer, and provides value from day one.
The Programmer’s Apprentice was the name of a project started by Charles Rich and Richard Waters at the MIT AI Lab in 1987. The goal of the project was to develop a theory of how expert programmers analyze, synthesize, modify, explain, specify, verify and document programs and, if possible, to implement it. Their research plan was to build prototypes of the apprentice incrementally. Our research plan also proceeds in incremental steps. However, we can take substantially larger steps by exploiting, and contributing to, the powerful AI technologies developed during the intervening thirty years, with a primary focus on recent advances in applied machine learning and artificial neural networks.
Artificial Human Intelligence

Cognitive and systems neuroscience provide clues to engineers interested in applying what we have learned about how humans think about and solve problems. Fundamental to our understanding of human cognition is the tradeoff between fast, highly parallel, context-sensitive, distributed connectionist-style computation and slow, essentially serial, systematic, combinatorial symbolic computation. Human intelligence is considered to be a hybrid of these two complementary computational strategies.
Our goal in developing systems that incorporate characteristics of human intelligence is twofold: humans provide a complete solution that we can use as a basic blueprint and then improve upon, and the resulting AI systems are likely to be well suited for building assistants that complement and extend human intelligence while operating in a manner we can understand.

The study of human cognitive function at the systems level consists primarily of human subjects performing cognitive tasks in an fMRI scanner, which measures brain activity as reflected in changes in blood flow. These studies localize cognitive activity in space and time in order to construct cognitive models, by measuring the correlation between regions of observed brain activity and the steps carried out by subjects in solving problems. In addition to localizing brain activity correlated with cognitive functions, diffusion tensor imaging (DTI) can be used for tractographic reconstructions that infer white-matter connections between putative functional regions.
Conscious Attention & Short-Term Memory

There is a tendency to think of the neocortex as the epitome of human cognition. In fact, the evolution of the human cerebral cortex, or neocortex, occurred in parallel with substantial enhancements to the cerebellar cortex and many subcortical areas, including the basal ganglia, thalamus and hippocampus. In the following, we will be primarily concerned with circuits that involve the cortex, hippocampus and basal ganglia. We begin with a straightforward account of conscious attention, developed by Stanislas Dehaene and his colleagues at the Collège de France in Paris, that provides a basis for short-term memory and executive function in the prefrontal cortex.
Global Workspace Theory

[Figure: global workspace model showing bottom-up propagation, global workspace activation, and feed-forward and feed-back connections among thalamo-cortical columns. From Stanislas Dehaene, Consciousness and the Brain: Deciphering How the Brain Codes Our Thoughts, Viking Press, 2014, Figure 27.]

In the Global Workspace Theory developed by Bernard Baars and extended by Dehaene, sensory data is initially processed in the primary sensory areas located in posterior cortex, then propagates forward and is further processed in increasingly abstract multi-modal association areas. Even as information flows forward toward the front of the brain, the results of abstract computations performed in the association areas are fed back toward the primary sensory cortex. This basic pattern of activity is common to all mammals.
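As a rough illustration only, a toy simplification rather than Baars’s or Dehaene’s actual model, the sketch below mimics this flow: a stimulus propagates bottom-up through a few processing stages, and when high-level activation is strong enough, it is broadcast back down through feedback connections, reinforcing the activity that produced it. The dimensions, weights, and threshold are arbitrary choices made for the sketch.

```python
# Toy illustration of bottom-up propagation followed by top-down "ignition";
# not an implementation of the Global Workspace Theory, just the flow of activity
# described in the text. All constants are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
DIM, N_STAGES, THRESHOLD = 32, 4, 2.0

# Feed-forward weights between successive stages; feedback assumed to mirror them.
forward = [rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(N_STAGES)]
feedback = [w.T for w in forward]

def process(stimulus: np.ndarray):
    # Bottom-up propagation from primary sensory areas toward association areas.
    stages = [np.tanh(stimulus)]
    for w in forward:
        stages.append(np.tanh(w @ stages[-1]))
    # "Ignition": sufficiently strong high-level activity is fed back toward
    # earlier stages, reinforcing the pattern that gave rise to it.
    ignited = float(np.linalg.norm(stages[-1])) > THRESHOLD
    if ignited:
        for i in range(N_STAGES - 1, -1, -1):
            stages[i] = np.tanh(stages[i] + feedback[i] @ stages[i + 1])
    return stages, ignited

for label, scale in [("weak stimulus", 0.2), ("strong stimulus", 3.0)]:
    _, ignited = process(scale * rng.normal(size=DIM))
    print(label, "-> ignition:", ignited)
```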
We have developed a simple architecture that represents the apprentice’s global workspace and incorporates a model of attention that surveys activity throughout somatosensory and motor cortex, identifies the activity relevant to the current focus of attention, and then maintains that activity so that it can readily be utilized in problem solving. In the case of the apprentice, new information is ingested into the model at the system interface, including dialog in the form of text, visual information in the form of editor screen images, and a collection of programming-related signals originating from a fully instrumented integrated development environment called FIDE. Single-modality sensory information then feeds into multi-modal association areas to create rich, abstract representations. Attentional networks in the prefrontal cortex take as input activations occurring throughout the posterior cortex. These networks are trained by reinforcement learning to identify areas worth attending to, and the resulting policy selects specific areas to actively maintain in short-term memory. In keeping with a model suggested by Yoshua Bengio, this attentional process is guided by a prior that prefers relatively low-dimensional abstract thought vectors corresponding to information useful for making decisions. While humans can sustain only a few such activations at a time, the apprentice has no such limitation.
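The following sketch, under liberal assumptions, illustrates the selection-and-maintenance loop just described. Toy hash-based encoders stand in for the modality-specific and association areas, a fixed linear scorer stands in for the attention policy that would be trained by reinforcement learning, and the selected activity is compressed to low-dimensional thought vectors held in a short-term memory with capacity well beyond the human limit. The names (encode, WorkspaceMemory), dimensions and weights are all illustrative, not part of the actual apprentice.

```python
# Minimal sketch (not the authors' implementation) of the workspace-plus-attention
# idea: modality encodings feed a scorer that picks a few regions of activity,
# each of which is compressed to a low-dimensional "thought vector" and maintained
# in short-term memory.
import numpy as np

rng = np.random.default_rng(0)

ENC_DIM = 256      # assumed size of a single-modality encoding
THOUGHT_DIM = 32   # assumed size of a compressed "thought vector"
CAPACITY = 16      # unlike humans, the apprentice can hold many items at once

def encode(observation: str) -> np.ndarray:
    """Hash-based toy encoder; a real system would use trained neural encoders."""
    local = np.random.default_rng(abs(hash(observation)) % (2**32))
    return local.normal(size=ENC_DIM)

# Stand-in for the attention policy the text says is trained by reinforcement
# learning; here it is just a fixed linear scorer plus a random compressor.
score_weights = rng.normal(size=ENC_DIM)
compressor = rng.normal(size=(THOUGHT_DIM, ENC_DIM)) / np.sqrt(ENC_DIM)

class WorkspaceMemory:
    """Actively maintained set of low-dimensional thought vectors."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: list[tuple[float, np.ndarray]] = []

    def attend(self, encodings: list[np.ndarray]) -> None:
        # Score every region of activity and keep the highest-scoring ones.
        scored = [(float(score_weights @ e), compressor @ e) for e in encodings]
        self.items = sorted(self.items + scored, key=lambda kv: -kv[0])[: self.capacity]

workspace = WorkspaceMemory(CAPACITY)
observations = ["user: please refactor parse_config",          # dialog text
                "editor: parse_config.py highlighted region",   # screen image proxy
                "fide: unit test test_parse_defaults failed"]   # IDE signal
workspace.attend([encode(o) for o in observations])
print(len(workspace.items), "items maintained in short-term memory")
```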
Episodic Memory & Long-Term Memory

It is said that each of us is defined by a collection of episodic memories, a set of expectations about the future, and a single thread of control directed by conscious attention. Whether or not this is an accurate characterization of a human being, it is our working model for the programmer’s apprentice. Long-term memory storage and retrieval in humans involve most areas of the neocortex plus several subcortical circuits, the most important being the hippocampal formation in the medial temporal lobe of the brain.
On the upper left is a cartoon drawing of the hippocampus and related cortical and subcortical areas. The primary components include the entorhinal cortex or EHC, the dentate gyrus or DG, and two hippocampal subfields referred to as CA3 and CA1. The graphic in the bottom center is an anatomically more accurate artistic rendering, showing the entorhinal cortex on the top right with green neural processes, the dentate gyrus in the top center with blue processes projecting to CA3 in the hippocampus, from which purple processes project to CA1 and from there back to the entorhinal cortex, completing a recurrent loop essential for storage and retrieval. The block diagram in the upper right summarizes the component circuits, along with their projections and reciprocal connections. In forming new memories, a stimulus corresponding to compressed summaries of neural activity occurring throughout the cortex is projected onto the entorhinal cortex. This composite pattern of activity is sent to the dentate gyrus, where it undergoes extreme pattern separation: the high-dimensional composite pattern of activity from the EHC is projected onto a lower-dimensional representation in the dentate gyrus, which is forwarded to CA3. CA3 is an auto-associative network that incorporates each new composite pattern of activity to construct an index that can be used to recover the stored memory from the original stimulus.
The auto-associative network performs pattern completion, enabling the hippocampus to reconstruct the appropriate index from a partial pattern consisting of a subset of the activity in the original stimulus. During retrieval, the index reconstructed from a stimulus pattern of activity is used to recover, in CA1, the pattern of activity of the original stimulus; this pattern is projected back to the entorhinal cortex, which uses its reciprocal connections to reactivate the original pattern of activity in the cortex. The result is not a perfect reconstruction of the original state of the cortex at the time the memory was recorded, but rather a potentially more relevant version of the original state that incorporates information from the current state in which the memory is reconstructed.
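To make the storage-and-retrieval loop concrete, here is a small toy sketch, not the apprentice’s actual memory system: a fixed random projection stands in for dentate-gyrus pattern separation, a Hopfield-style auto-associative network stands in for CA3 pattern completion, and a simple heteroassociative readout stands in for the CA1-to-entorhinal reinstatement of the cortical pattern. The dimensions, the +/-1 coding, and the Hebbian learning rule are illustrative assumptions.

```python
# Toy model of the described loop: cortex -> EHC -> DG (separation) -> CA3
# (auto-associative completion) -> CA1/EHC (reinstatement of the cortical pattern).
import numpy as np

rng = np.random.default_rng(1)
D_CTX, D_IDX, N_MEM = 200, 64, 5   # cortical size, index size, stored episodes

dg_projection = rng.normal(size=(D_IDX, D_CTX))   # "EHC -> DG" pattern separation

def separate(cortical: np.ndarray) -> np.ndarray:
    """Project a cortical pattern to a lower-dimensional +/-1 index code."""
    return np.sign(dg_projection @ cortical)

# Store N_MEM random cortical patterns and their separated index codes.
cortex_patterns = np.sign(rng.normal(size=(N_MEM, D_CTX)))
indices = np.array([separate(p) for p in cortex_patterns])

# "CA3": Hebbian auto-associative weights over index codes (pattern completion).
W_ca3 = (indices.T @ indices) / D_IDX
np.fill_diagonal(W_ca3, 0.0)

# "CA1 -> EHC": heteroassociative readout from a completed index back to cortex.
W_back = (cortex_patterns.T @ indices) / D_IDX

def recall(partial_cortex: np.ndarray, steps: int = 10) -> np.ndarray:
    s = separate(partial_cortex)
    for _ in range(steps):                 # attractor dynamics complete the index
        s = np.sign(W_ca3 @ s)
    return np.sign(W_back @ s)             # reinstate the cortical pattern

# Cue with a degraded copy of episode 0: 30% of the cortical units silenced.
cue = cortex_patterns[0].copy()
cue[rng.random(D_CTX) < 0.3] = 0.0
recovered = recall(cue)
print("overlap with original:", float(recovered @ cortex_patterns[0]) / D_CTX)
```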