Positioning is All You Need

Xin Li
arXiv:2404.01183v1 [q-bio.NC] 1 Apr 2024

Department of Computer Science, University at Albany, Albany NY 12222


E-mail: xli48@albany.edu.
Apr. 1, 2024

Abstract

One can drive around safely using a GPS without memorizing a world map (not to mention the dark regions that humans have never explored) because we only pay attention to the next turning instruction. Such a simple observation about the attention mechanism has a profound implication for our understanding of how the brain works. The nonlinear effects generated by inhibitory networks have long been known to provide a “winner-take-all” mechanism for localized attention. This attention mechanism, at the system level, implements the selection/positioning operation needed by the hippocampus to provide spatio-temporal context to the neocortex. The localized receptive field of sensory organs, when inspected under the framework of embodied cognition, turns from a constraint into an advantage. Since we only need access to a local map, positioning works better than reconstruction. This simple intuition implies that the attention mechanism is all you need to understand the flexible behavior generated by hippocampal-neocortical systems. Geometrically, we present a novel manifold positioning framework to explain the principle of localized embodied cognition. Through the co-evolution of the hippocampus and the neocortex, the positioning operation implemented by the attention mechanism can be interpreted as a nonlinear projection by the hippocampus (the neocortex’s map index) that serves the navigation task using the local subspace structure discovered by the neocortex (a sensorimotor machine generating the world map), without discovering the global manifold topology.

Introduction

The sensory organs attached to mammalian brains can access only a localized region in space at any specific time. Such spatio-temporal locality of sensory observation has a profound implication for the mental construction of the internal world (a.k.a. internal representation (1)), including consciousness, by the mammalian brain. Nature has discovered a clever solution for learning a global representation of the physical world from the local receptive fields of the sensory cortex: flexible behavior, including spatial navigation. How do mammalian brains encode spatial information about the physical world into populations of spike sequences in the temporal domain? This question is a holy grail of neuroscience, whose importance to cognitive science is comparable to that of relativity theory to modern physics.

To formulate the above problem geometrically, we note that the observation data acquired by localized sensory organs can be abstracted as the projection of a manifold embedded in a high-dimensional space onto a low-dimensional subspace. It is well recognized that a nonlinear manifold, despite its sophisticated global topology, is locally homeomorphic to a Euclidean space. This fundamental property has been the inspiration for the class of nonlinear dimensionality reduction techniques (e.g., Isomap (2) and locally linear embedding (3)). An important new insight brought by this work is that we do not need to reconstruct the global manifold but only need to know the position within the global coordinates for goal-oriented behavior. This insight is in alignment with intelligence without representation (1) and animate vision (4), which attempt to put perception and action on an equal footing. Using GPS as the metaphor, we only need access to a local map, anywhere and anytime, without the need to memorize the global map. Such a locality principle, when combined with embodied cognition, is a fundamental advantage that has been exploited by nature during the evolution of mammalian brains.
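
To make the local-versus-global distinction concrete, the following minimal sketch compares straight-line (chord) distance with distance along the curve (arc length) on a spiral. The Archimedean spiral, the function names, and the sampled angles are illustrative assumptions and are not taken from the model above. Within a small neighbourhood the two distances nearly coincide, so a flat local map suffices; across a full turn they diverge.

```python
import math

def spiral_point(theta):
    """Point on the Archimedean spiral r = theta (an illustrative 1-D manifold in 2-D)."""
    return (theta * math.cos(theta), theta * math.sin(theta))

def chord_length(theta_a, theta_b):
    """Straight-line (Euclidean) distance between two points on the spiral."""
    (xa, ya), (xb, yb) = spiral_point(theta_a), spiral_point(theta_b)
    return math.hypot(xb - xa, yb - ya)

def arc_length(theta_a, theta_b, steps=10_000):
    """Distance travelled along the spiral, approximated by summing many short chords."""
    h = (theta_b - theta_a) / steps
    return sum(chord_length(theta_a + i * h, theta_a + (i + 1) * h) for i in range(steps))

def flatness(theta_a, theta_b):
    """Ratio chord/arc: close to 1 where the manifold looks Euclidean, small where it does not."""
    return chord_length(theta_a, theta_b) / arc_length(theta_a, theta_b)

local_ratio = flatness(2 * math.pi, 2 * math.pi + 0.1)   # a small neighbourhood
global_ratio = flatness(2 * math.pi, 4 * math.pi)        # one full turn apart
print(f"local: {local_ratio:.4f}, global: {global_ratio:.4f}")
```

The local ratio stays close to 1, while the global ratio is far below 1; this is the sense in which only the local geometry can be treated as Euclidean.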

Through co-evolution of the hippocampus and the neocortex, nature solves the manifold positioning problem by the simple attention mechanism (5), which can be mathematically modeled by context (6) and computationally implemented by the transformer architecture (7). At the level of individual neurons, attention or selection of firing patterns is implemented by the nonlinear effects of inhibitory networks (8). At the level of cortical columns, canonical circuits implementing predictive processing (9) are responsible for context-dependent modulation of prediction errors. At the level of cortical areas, hippocampal indexing theory (10, 11) appears to be the most comprehensive treatment of hippocampal-neocortical interactions (12). By serving as the neocortex’s librarian (p. 285 of (8)), the hippocampus uses an indexing-based positioning strategy to dynamically stabilize the sensorimotor activities generated by the neocortex.
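
The winner-take-all selection attributed above to inhibitory networks can be sketched with the classic MAXNET iteration, in which every unit is suppressed in proportion to the summed activity of its competitors. This is a generic textbook scheme chosen for illustration, not a circuit model from (8); the function name and the default inhibition strength are assumptions.

```python
def winner_take_all(activations, inhibition=None, max_steps=1000):
    """MAXNET-style lateral inhibition over a non-empty list of activations.

    With a unique maximum and inhibition < 1/n, only the strongest unit stays active.
    Exact ties never resolve, so the loop is bounded by max_steps.
    """
    x = [max(0.0, a) for a in activations]
    if inhibition is None:
        inhibition = 1.0 / (2 * len(x))      # assumed default, safely below 1/n
    for _ in range(max_steps):
        if sum(1 for v in x if v > 0) <= 1:  # a single winner remains
            break
        total = sum(x)
        # Each unit is inhibited by the summed activity of all other units.
        x = [max(0.0, v - inhibition * (total - v)) for v in x]
    return x

rates = winner_take_all([0.5, 0.9, 0.7])
winner = max(range(len(rates)), key=rates.__getitem__)
print(winner, rates)
```

The rectification (the `max(0.0, ...)` step) is the nonlinearity that turns graded competition into a discrete selection.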

In this paper, we provide a geometric interpretation of the positioning/indexing strategy underlying hippocampal-neocortical systems. It is shown that the problem of navigating on an unknown manifold in an arbitrary dimension can be solved by “think globally, fit locally” (13). That is, to exploit the local subspace constraint of a nonlinear manifold, the only solution that is universal and generalizable is to explicitly learn a nonlinear mapping that connects the local geometry with the global topology. Since we are not interested in manifold reconstruction (e.g., the dark areas of the world map), an indexing or positioning strategy is sufficient for “navigating” from one place to another. This intuition is consistent with “using the world as its own model” (1), but we take one step forward by noting that we only need a localized version of the world model due to the locality constraint of our sensory organs. Using the map analogy from the thousand brains theory (14), we need a librarian (i.e., the hippocampus) to maintain efficient indexing and retrieval of the 150,000 maps generated by the neocortex. This librarian can easily manage more maps as intelligence evolves: note that the neocortex has a virtually infinite capacity thanks to its small-world network organization (8), while the network organization of the hippocampus does not change much from other mammals to humans.

Toy Example

To illustrate our core idea, we introduce a toy example that marries place cells, as discovered in the hippocampus (15), with the two-spiral dataset widely used in the manifold learning literature (13), as shown in Fig. 1. Even though the maze experiments used in studying rodent models have not used such a spiral setting (16), we think it is plausible to assume that mammals such as rodents can easily manage a maze with a spiral shape. The deeper implication of this toy example is that if place cells are interpreted as spatial context, we can generalize this hypothetical experiment to higher dimensions. For example, if object recognition is formulated as a maze problem in a higher-dimensional space, how can we design an intelligent machine for navigation?
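
A minimal sketch of this toy setting is given below, assuming Gaussian place fields tiled along two interleaved Archimedean spirals. The spiral parameterization, the number of cells, the field width, and all function names are hypothetical choices made for illustration; the original example specifies none of them. Positioning amounts to reading out the index of the most active place cell, and navigation needs only the next place field on the same arm, never a reconstruction of the whole spiral.

```python
import math

N_CELLS = 41                    # place cells per spiral arm (illustrative choice)
THETA_0, D_THETA = 1.0, 0.25    # angular start and spacing of field centres (illustrative)
FIELD_WIDTH = 0.3               # width of each Gaussian place field (illustrative)

def centre(arm, k):
    """Centre of place cell k on arm 0 or 1 (arm 1 is arm 0 rotated by 180 degrees)."""
    theta = THETA_0 + D_THETA * k
    sign = 1.0 if arm == 0 else -1.0
    return (sign * theta * math.cos(theta), sign * theta * math.sin(theta))

# The "index": one entry per place cell, identified by (arm, position along the arm).
INDEX = [(arm, k) for arm in (0, 1) for k in range(N_CELLS)]

def firing_rate(cell, x, y):
    """Gaussian tuning curve of one place cell at location (x, y)."""
    cx, cy = centre(*cell)
    return math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * FIELD_WIDTH ** 2))

def locate(x, y):
    """Positioning: return the index (arm, k) of the most active place cell."""
    return max(INDEX, key=lambda cell: firing_rate(cell, x, y))

def next_waypoint(x, y):
    """Navigate with local information only: head for the next field on the same arm."""
    arm, k = locate(x, y)
    return centre(arm, min(k + 1, N_CELLS - 1))

x, y = centre(1, 10)
print(locate(x + 0.03, y - 0.02), next_waypoint(x, y))
```

The sketch is meaningful only for locations near one of the spirals; far from every field all rates underflow to zero and the readout is arbitrary.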

In conventional approaches such as Isomap (2) and locally linear embedding (LLE) (3), the objective is to exploit the local subspace constraint of the manifold by nonlinear dimensionality reduction. The drawback of this class of solutions is that they do not generalize well to arbitrary manifolds, especially in higher dimensions. The fundamental reason behind this failure can be explained by intelligence without representation (1): under the framework of embodied cognition (17), perception and action are on an equal footing. From the evolutionary perspective, goal-oriented behavior is the foundation of natural intelligence. Therefore, navigation on the manifold is easier than reconstruction of the manifold, and nature prefers simple, but not simpler, solutions. A moment of thought shows that a place cell-based navigation strategy can be easily generalized to a manifold of arbitrary dimension or topology. The job of indexing or map-keeping will become more involved, but it is still manageable by a librarian.

Figure 1: The place cells discovered in the hippocampus provide spatial context for the task of navigation. Such contextual modeling can be exploited to navigate the local Euclidean space of an arbitrary manifold by “memorizing” its position in the global coordinates.

Computational Modeling of Hippocampal-Neocortical Systems

We advocate a new conceptual framework called “localized embodied cognition”, with the locality principle in mind from problem formulation to solution algorithms. We still use the world “as its own model”, but localized embodied cognition never attempts to solve a global problem locally. An organism needs to couple sensory with motor systems to achieve various goals but never stores knowledge about the global environment. Instead, the bodily interactions with the environment (situatedness) are persistently maintained by the interaction of the neocortex (a single book) with the hippocampus (a large library). The cognitive map in the hippocampus offers a situational context to interpret the sensorimotor interaction occurring in the neocortex. In this section, we first present computational models for characterizing the indexing operation in the hippocampus and the sensorimotor interaction in the neocortex, and then connect them with a universal attention mechanism.

Figure 2: Computational modeling of hippocampal-neocortical coupling.

Hippocampal Indexing Theory. The key idea underlying hippocampal indexing theory (10, 11) is that the hippocampus is “functional designed and anatomically situated” to handle the diverse neocortical activities generated by sensorimotor interactions. By projecting back to the neocortex, the hippocampus stores a collection of indexes that can serve as the context for the storage and retrieval of episodic memory (18). Recent findings indicate a more complex interaction between hippocampal functioning and episodic memory, with an emphasis on the cognitive map interpretation (19). This newer perspective considers the hippocampus as creating spatial maps (cognitive maps (20)) for memory formation. The theory also recognizes the hippocampus’s role in binding different elements of an experience, including time and space, into a cohesive memory (21). Furthermore, the relationship between place cells in the hippocampus and memory formation suggests that these cells may function both as part of a cognitive map and as index cells for memory (22).

Neocortex as a Sensorimotor Machine. One of the biggest mysteries in studying cortical columns is that they are structures without a function (23). Nevertheless, several canonical microcircuits, such as the Douglas-Martin model (24) and the Haeusler-Maass model (25), have been studied for predictive coding in the literature. These models have revealed a remarkable correspondence between the microcircuitry of the cortical column and the connectivity required by predictive coding. In the Douglas-Martin model (24), information flow through the cortical column is characterized by a stereotypical pattern of fast excitation followed by slower and longer-lasting inhibition in the cat visual system. The model’s three neuronal populations receive thalamic drive and amplify transient thalamic inputs to generate sustained activities while maintaining a balance between excitation and inhibition. This model was further extended by Haeusler and Maass (25) into a form that closely resembles the canonical circuits required by predictive coding. The feedforward prediction errors from a lower cortical level arrive at granular layers and are passed to excitatory and inhibitory interneurons in supragranular layers, encoding expectations. Meanwhile, the connections between excitatory and inhibitory neurons in supragranular layers enable deep pyramidal cells and excitatory interneurons to generate context-dependent feedback predictions, which descend to a lower hierarchical level.
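
The loop of ascending prediction errors and descending predictions can be illustrated by a generic scalar predictive-coding update. This is a hedged sketch and not the Douglas-Martin or Haeusler-Maass circuit: the linear generative model, the weight, and the learning rate are assumptions made for illustration only.

```python
def settle(observation, weight=2.0, learning_rate=0.1, steps=50):
    """Infer a latent cause by repeatedly descending on the prediction error.

    Assumes a linear generative model: prediction = weight * estimate.
    """
    estimate = 0.0
    for _ in range(steps):
        prediction = weight * estimate               # top-down (feedback) prediction
        error = observation - prediction             # bottom-up (feedforward) prediction error
        estimate += learning_rate * weight * error   # gradient step on 0.5 * error**2
    return estimate, observation - weight * estimate

estimate, residual = settle(observation=3.0)
print(estimate, residual)
```

With these assumed values the error shrinks by a constant factor at every step, so the estimate settles at the value whose prediction matches the observation.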

Hippocampal-Neocortical Interaction. The coupling of the hippocampus with the neocortex is based on the principle of localized embodied cognition (please refer to Fig. 2). Following the spirit of intelligence without representation (1), we use the localized world as its own model, which nicely fits the functional design of the neocortex (i.e., detecting regularity in stimuli by sensorimotor interaction). To accommodate the constantly changing properties of the global world model, an organism needs a cognitive map for navigation. This map can be mathematically abstracted by a nonlinear mapping between the local subspace geometry discovered by the neocortex and the global topology maintained by the hippocampus. Instead of reconstructing the entire manifold, we argue it is sufficient to retrieve its local projection based on the context information. This nonlinear mapping is essentially an attention mechanism that can be implemented by a sparse distributed memory (SDM) (26) (attention approximates SDM (27)) or an associative memory (e.g., the modern Hopfield network (28), which is intrinsically connected with the transformer architecture (7)). SDM-based indexing/positioning supports continual learning (29). Real-world map-based navigation serves as a perfect example of the plausibility of this manifold positioning strategy. Why can we safely drive around using GPS without memorizing the entire world? Because at any time and location, we only need access to a localized version of the world map for navigation purposes. Note that there is no need to reconstruct a map for an area if no one ever goes there, which is consistent with goal-oriented behavior in natural intelligence.
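
The retrieval of a local projection from context can be sketched as softmax attention over stored key-value pairs, the form of update shared by the transformer (7) and the modern Hopfield network (28). In the sketch below the keys stand in for hippocampal context indexes and the values for neocortical local maps; the toy vectors, the inverse temperature `beta`, and the function name are illustrative assumptions.

```python
import math

def attend(query, keys, values, beta=20.0):
    """Softmax attention: weight each stored value by the similarity of its key to the query."""
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)                                # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights

context_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # one index per stored map
local_maps = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]                   # toy stand-ins for local maps
noisy_context = [0.1, 0.9, 0.0]                                      # a corrupted version of key 1

retrieved, weights = attend(noisy_context, context_keys, local_maps)
print(retrieved, weights)
```

A large `beta` makes the retrieval nearly winner-take-all, so a noisy context still selects a single local map; a small `beta` blends neighbouring maps instead.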

Applications to Cognitive Science

In this section, we revisit four widely studied cognitive tasks from the new perspective of locally navigating on a globally unknown manifold. The unifying theme is that positioning is all you need to understand how the human brain solves these problems by a universal cortical processing algorithm, as envisaged by Vernon Mountcastle in 1957 (30).

Spatial Navigation. Cognitive maps organize knowledge to support flexible and adaptive behavior. Among various behavioral tasks, spatial navigation is one of the most well-studied in the literature (19, 20). How do mammalian brains encode 2D physical space information into 1D spike sequences? This question is at the core of understanding the relationship between space and time in its simplest form. The representation of spatial information (Euclidean or non-Euclidean) plays an essential role in the building of cognitive maps. A graph-based representation has been widely adopted for encoding the physical space into an action-related state space. A fundamental difficulty is the aliasing problem: identical sensory observations can occur in different spatial locations (20). The aliasing problem is solved by attention in the hippocampus, where the encoding of spatio-temporal context (an index to the local map, or a position on the global map) is handled by the CA3-CA1 collateral system (p. 326 of (8)). The consolidation of semantic memory (analogous to global manifold topology) from episodic memory (analogous to local Euclidean geometry) eliminates the spatio-temporal context to support context-dependent inference, where context is defined by the attention window associated with different spatial and temporal locations.
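
The aliasing problem and its resolution by context can be illustrated with a deliberately small example. The corridor, its observations, and the choice of the previous observation as the only context are all made up for illustration; they are not a model of the CA3-CA1 system.

```python
corridor = ["door", "wall", "door", "window"]   # observation at positions 0..3 (made-up example)

# Observation alone: one key can map to several places (aliasing).
by_observation = {}
for position, obs in enumerate(corridor):
    by_observation.setdefault(obs, []).append(position)

# Observation plus context (here, simply the previous observation): each key becomes unique.
by_context = {}
previous = None
for position, obs in enumerate(corridor):
    by_context.setdefault((previous, obs), []).append(position)
    previous = obs

print(by_observation["door"])            # ambiguous: two candidate positions
print(by_context[("wall", "door")])      # disambiguated by temporal context
```

Longer corridors may need a longer history than one step; the point of the sketch is only that indexing by context, rather than by observation alone, removes the ambiguity.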

Motor Control. Motor control refers to the process of physically interacting with objects in the world and manipulating them toward completing a specific task/goal (e.g., reach and grasp). In motor control and trajectory planning (31), internal models consist of: 1) a forward component responsible for predicting the sensory consequences of the executed motor commands; and 2) an inverse dynamics model that calculates feedforward motor commands from the desired trajectory information. Both components require an attention-guided and action-oriented body map (32) that is centered on the controlled object. The generalization property of motor control lies in the transformation from one coordinate system to another. For example, the task of grasping a water bottle differs superficially from that of grasping a coffee cup, but the principle of the motor commands is the same except for the change of coordinates (from egocentric for spatial navigation to allocentric for motor control). Using the library as the metaphor again, all the hippocampus needs to do to adapt is to re-position the neocortex, i.e., to move from one book (grasping without a handle) to another (grasping with a handle). Unlike navigation, object manipulation does not require path integration for trajectory planning because the scale of motor control is the same as that of the reference frame centered on the object.
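
The two internal-model components can be sketched for the simplest possible plant, a point mass moving in one dimension under explicit Euler integration. The plant, the mass, the time step, and the function names are illustrative assumptions rather than part of (31).

```python
MASS, DT = 2.0, 0.01   # illustrative mass (kg) and time step (s)

def forward_model(position, velocity, force):
    """Predict the next state (the sensory consequence) of applying a motor command."""
    next_position = position + velocity * DT
    next_velocity = velocity + (force / MASS) * DT
    return next_position, next_velocity

def inverse_model(velocity, desired_velocity):
    """Compute the motor command that produces the desired velocity after one step."""
    return MASS * (desired_velocity - velocity) / DT

command = inverse_model(velocity=0.5, desired_velocity=0.8)
predicted_position, predicted_velocity = forward_model(position=1.0, velocity=0.5, force=command)
print(command, predicted_position, predicted_velocity)
```

Feeding the inverse model’s command through the forward model recovers the desired velocity, which is the consistency the two components must satisfy.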

Object Detection. In the spatial navigation of rodents, environment or place learning is achieved by the ego-motion of animals, which dynamically changes the bearing and sketch maps (33). Such behavior becomes more sophisticated as the evolution of the visual cortex helps locate salient objects (a.k.a. landmarks) in the environment (34). From a manifold positioning perspective, we argue that there is an appealing analogy between the integrated map for spatial navigation and the shape description for animate vision (4). That is, the problem of object/landmark detection can also be solved by the inferior temporal (IT) cortex (35), assuming that the spatio-temporal context is provided by the hippocampus. Mathematically, a place cell can be abstracted by a Dirac function δ(x, y) that only fires when the organism is in the neighborhood of (x, y). By analogy, a gnostic cell (a.k.a. concept cell (36, 37)) fires if and only if an object is detected by the neocortex in a latent space (e.g., by the generalized Hough transform (38)). The spatial attention mechanism implemented by the gaze becomes critically important here because it reduces the computational burden by indexical reference (4). More importantly, positioning of the gaze direction is naturally required by the visual cortex to recognize an object based on the detection result (39).

Language Comprehension. Action-oriented representation facilitates understanding the relationship between language and action in the theory of embodied language (40). By restoring the central role played by the motor system in language processing, we can gain deeper insight into the essence of language from a motor resonance or action simulation perspective (41). The class of referential motor resonance is the source of abstract thoughts or covert movements in our mental world, which are hierarchically organized. The relationship between hierarchical motor control (42) and the nested structure of natural languages (43) can be better understood from a hierarchical extension of the attention perspective. Both language and motor control demonstrate compositionality, whereby complex structures are built from simpler elements according to combinatorial rules. In natural language, sentences are composed of phrases, which are in turn composed of smaller units such as words and morphemes. This syntactic structure manifests as the ability to generate an infinite variety of sentences from a finite set of words and grammatical rules. Similarly, in motor control, movements can be decomposed into sequences of simpler actions or motor primitives, which are organized hierarchically and combined to generate complex behaviors. The latest study relating the neural representation of the hippocampal formation (e.g., the Tolman-Eichenbaum Machine (19)) to the transformer (44) seems to support the view that attention is needed for both spatial navigation and language comprehension. The curious positional-encoding formula for tokens in the transformer architecture, which uses sine and cosine functions, appears to be a de-aliasing device that is conceptually similar to the omnidirectional place cells in cognitive maps (p. 330 of (8)).
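
For reference, the sinusoidal positional encoding of (7) can be written in a few lines. The model dimension of 16 and the helper names are arbitrary choices for illustration. The sketch also checks a property relevant to the de-aliasing reading above: identical tokens at different positions receive different codes, and the dot product between two codes depends only on the offset between the positions.

```python
import math

def positional_encoding(position, d_model=16):
    """Sinusoidal encoding of one token position, following the formula in (7).

    d_model is assumed to be even; each frequency contributes a (sin, cos) pair.
    """
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding

def similarity(p, q, d_model=16):
    """Dot product between the encodings of positions p and q."""
    a, b = positional_encoding(p, d_model), positional_encoding(q, d_model)
    return sum(x * y for x, y in zip(a, b))

print(positional_encoding(3) != positional_encoding(7))   # distinct positions, distinct codes
print(similarity(3, 7), similarity(10, 14))               # same offset, same similarity
```

The offset-only dependence follows from the identity sin(a)sin(b) + cos(a)cos(b) = cos(a - b), applied to each frequency.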

References and Notes

1. R. A. Brooks, Artificial intelligence 47, 139 (1991).

2. J. B. Tenenbaum, V. d. Silva, J. C. Langford, science 290, 2319 (2000).

3. S. T. Roweis, L. K. Saul, science 290, 2323 (2000).

4. D. H. Ballard, Artificial Intelligence 48, 57 (1991).

5. M. I. Posner, C. D. Gilbert, Proceedings of the National Academy of Sciences 96, 2585 (1999).

6. N. Cesa-Bianchi, G. Lugosi, Prediction, learning, and games (Cambridge university press, 2006).

7. A. Vaswani, et al., Advances in neural information processing systems 30 (2017).

8. G. Buzsáki, Rhythms of the Brain (Oxford university press, 2006).

9. G. B. Keller, T. D. Mrsic-Flogel, Neuron 100, 424 (2018).

10. T. J. Teyler, P. DiScenna, Behavioral neuroscience 100, 147 (1986).

11. T. J. Teyler, J. W. Rudy, Hippocampus 17, 1158 (2007).

12. J. L. McClelland, B. L. McNaughton, R. C. O’Reilly, Psychological review 102, 419 (1995).

13. L. K. Saul, S. T. Roweis (2003).

14. J. Hawkins, A thousand brains: A new theory of intelligence (Hachette UK, 2021).

15. J. O’Keefe, L. Nadel, The hippocampus as a cognitive map (Oxford university press, 1978).

16. E. C. Tolman, Psychological review 55, 189 (1948).

17. M. Wilson, Psychonomic bulletin & review 9, 625 (2002).

18. E. Tulving, Annual review of psychology 53, 1 (2002).

19. J. C. Whittington, et al., Cell 183, 1249 (2020).

20. J. C. Whittington, D. McCaffary, J. J. Bakermans, T. E. Behrens, Nature Neuroscience 25, 1257 (2022).

21. B. P. Staresina, L. Davachi, Neuron 63, 267 (2009).

22. M.-B. Moser, D. C. Rowland, E. I. Moser, Cold Spring Harbor perspectives in biology 7, a021808 (2015).

23. J. C. Horton, D. L. Adams, Philosophical Transactions of the Royal Society B: Biological Sciences 360, 837 (2005).

24. R. J. Douglas, K. Martin, The Journal of physiology 440, 735 (1991).

25. S. Haeusler, W. Maass, Cerebral cortex 17, 149 (2007).

26. P. Kanerva, Sparse distributed memory (MIT press, 1988).

27. T. Bricken, C. Pehlevan, Advances in Neural Information Processing Systems 34, 15301 (2021).

28. H. Ramsauer, et al., arXiv preprint arXiv:2008.02217 (2020).

29. T. Bricken, X. Davies, D. Singh, D. Krotov, G. Kreiman, arXiv preprint arXiv:2303.11934 (2023).

30. V. B. Mountcastle, Journal of neurophysiology 20, 408 (1957).

31. M. Kawato, Current opinion in neurobiology 9, 718 (1999).

32. E. M. Gordon, et al., Nature 617, 351 (2023).

33. L. F. Jacobs, F. Schenk, Psychological review 110, 285 (2003).

34. P. S. Churchland, V. S. Ramachandran, T. J. Sejnowski (1993).

35. J. J. DiCarlo, D. Zoccolan, N. C. Rust, Neuron 73, 415 (2012).

36. R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, I. Fried, Nature 435, 1102 (2005).

37. R. Q. Quiroga, G. Kreiman, C. Koch, I. Fried, Trends in cognitive sciences 12, 87 (2008).

38. D. H. Ballard, Pattern recognition 13, 111 (1981).

39. D. H. Ballard, IJCAI (Citeseer, 1989), vol. 89, pp. 1635–1641.

40. M. H. Fischer, R. A. Zwaan, Quarterly journal of experimental psychology 61, 825 (2008).

41. R. A. Zwaan, L. J. Taylor, Journal of Experimental Psychology: General 135, 1 (2006).

42. J. Merel, M. Botvinick, G. Wayne, Nature communications 10, 5489 (2019).

43. N. Chomsky (1957).

44. J. C. Whittington, J. Warren, T. E. Behrens, International Conference on Learning Representations (2022).
