
A New Neural Memory Trick Helps AI Handle Much Longer Sequences

Authors:

(1) Hung Le, Applied AI Institute, Deakin University, Geelong, Australia;

(2) Dung Nguyen, Applied AI Institute, Deakin University, Geelong, Australia;

(3) Kien Do, Applied AI Institute, Deakin University, Geelong, Australia;

(4) Svetha Venkatesh, Applied AI Institute, Deakin University, Geelong, Australia;

(5) Truyen Tran, Applied AI Institute, Deakin University, Geelong, Australia.

Abstract & Introduction

Methods

Methods Part 2

Experimental Results

Experimental Results Part 2

Related Works, Discussion, & References

Appendix A, B, & C

Appendix D

Abstract

We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM’s exceptional length extrapolation capabilities and improved performance on tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps Transformers achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering and machine translation tasks.

1. Introduction

Systematic generalization underpins intelligence, and it relies on the ability to recognize abstract rules and extrapolate them to novel contexts that are distinct yet semantically similar to the seen data. Current neural networks and statistical machine learning methods fall short of handling novel data generated by symbolic rules, even though they have achieved state-of-the-art results in various domains. Some approaches show decent generalization for single or set inputs [Bahdanau et al., 2018, Gao et al., 2020, Webb et al., 2020]. Yet neural networks in general still fail at sequential symbol processing tasks, even with slight novelty during inference [Lake and Baroni, 2018, Delétang et al., 2022]. For instance, these models can easily learn to duplicate sequences of 10 items, but they will fail to copy sequences of 20 items if such lengths were not part of the training data. These models overfit the training data and perform poorly on out-of-distribution samples such as longer sequences or sequences with novel compositions. The issue also affects large models such as Large Language Models, making them struggle with symbolic manipulation tasks [Qian et al., 2023]. This indicates that current methods lack a principled mechanism for systematic generalization.

From a neuroscience perspective, it has been suggested that the brain can execute symbol processing through variable binding and neural pointers, wherein sensory data are conceptualized into symbols that can be assigned arbitrary values [Kriete et al., 2013]. Like the brain, computer programs excel at symbolic computation. Programmers use address pointers to dynamically access data or programs and have flexible control over variables; their programs can work appropriately with unseen inputs.

Building on these insights, we propose a pointer-based mechanism to enhance generalization to unseen lengths in sequence prediction, a crucial problem that unifies all computable problems [Solomonoff, 2010]. Our mechanism is based on two principles: (I) explicitly modeling pointers as physical addresses, and (II) strictly isolating pointer manipulation from input data. As such, we need to design a memory that supports physical pointers, and to create a model that manipulates the pointers to perform abstract rules and access the memory. Our memory, dubbed Pointer-Augmented Neural Memory (PANM), is a slot-based RAM [Von Neumann, 1993] where each memory slot consists of two components: data and address. Unlike earlier efforts that implicitly model pointers as attention softmax [Vinyals et al., 2015, Kurach et al., 2015, Le et al., 2018, Khan et al., 2021], our addresses are generated to explicitly simulate physical memory addresses, i.e., incremental binary numbers, which is critical for generalization to longer sequences.
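To make the address design concrete, the sketch below (our illustration, not the authors' reference code) builds an address bank of incremental binary addresses and pairs each data slot with its physical address; the bit width `address_bits` and the concatenated slot layout are assumptions made for this example only.

```python
import torch

def build_address_bank(seq_len: int, address_bits: int = 10) -> torch.Tensor:
    """Encode each slot index 0..seq_len-1 as an incremental binary vector,
    mimicking physical RAM addresses (illustrative assumption)."""
    addresses = []
    for i in range(seq_len):
        bits = [(i >> b) & 1 for b in range(address_bits)]  # little-endian bits
        addresses.append(torch.tensor(bits, dtype=torch.float32))
    return torch.stack(addresses)               # shape: (seq_len, address_bits)

# Each memory slot pairs data (e.g., encoded input tokens) with its address.
seq_len, d_model = 12, 32
data = torch.randn(seq_len, d_model)            # encoded input sequence (dummy)
address_bank = build_address_bank(seq_len)      # physical addresses, one per slot
memory = torch.cat([data, address_bank], dim=-1)  # slot = [data ; address]
```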

To manipulate a pointer, we create an address bank that contains the physical addresses corresponding to the input sequence, and use a neural network called the Pointer Unit, which is responsible for transforming pointers starting from an initial address in the address bank. Through attention over the address bank, a new pointer is generated as a mixture of the physical addresses, which can point to different memory slots to follow the logic of the task, as sketched below. We aim to let the Pointer Unit learn the symbolic rules of the task in an end-to-end manner. Finally, given a (manipulated) pointer, the model can access the data through two modes of pointer-based access: pointer dereference (Mode-1) and relational access (Mode-2). Our memory can be plugged into common encoder-decoder backbones such as LSTM or Transformer.
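The following sketch shows, under our own assumptions about layer choices (a GRU cell for the Pointer Unit and dot-product attention), how a pointer can be transformed via attention over the address bank and then dereferenced to read memory content in a Mode-1 style; it is an interpretation of the description above, not the authors' exact implementation, and it reuses `address_bank` and `data` from the previous sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerUnit(nn.Module):
    """Illustrative Pointer Unit: a recurrent cell updates the pointer state,
    and attention over the address bank re-expresses the new pointer as a
    mixture of physical addresses."""
    def __init__(self, address_bits: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRUCell(address_bits, hidden)
        self.query = nn.Linear(hidden, address_bits)

    def forward(self, pointer, state, address_bank):
        # pointer: (B, address_bits); address_bank: (seq_len, address_bits)
        state = self.rnn(pointer, state)
        q = self.query(state)                            # pointer query
        attn = F.softmax(q @ address_bank.t(), dim=-1)   # (B, seq_len)
        new_pointer = attn @ address_bank                # mixture of addresses
        return new_pointer, state, attn

def dereference(attn, data):
    """Mode-1 style access: read the memory content selected by the pointer's
    attention weights over the slots."""
    return attn @ data                                   # (B, d_model)

# Usage: start from the first physical address and take one pointer step.
pu = PointerUnit(address_bank.size(1))
p0 = address_bank[0].unsqueeze(0)                # initial pointer (batch of 1)
p1, state, attn = pu(p0, None, address_bank)     # manipulated pointer
content = dereference(attn, data)                # data read at the new pointer
```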

Our contribution is a novel memory architecture that incorporates explicit pointer and symbol processing, working seamlessly with sequential models to generalize better. We examine our model in symbol-processing domains such as algorithms and context-free grammar where PANM effectively works with LSTM and StackRNN. We apply PANM to improve the generalization of Transformer models on compositional learning, using SCAN and mathematics datasets. Also, we observe PANM’s superior performance in more realistic question answering and machine translation tasks. Our focus is not on striving for state-of-the-art results requiring specialized designs tailored to specific tasks. Our objective is to highlight the generalization improvement achieved by integrating our memory module into fundamental sequential models, with minimal architectural changes, and showcase the importance of using fundamental generalizing principles to address limitations of current deep learning.
