Neuro-Symbolic Reasoning Meets RL: EXPLORER Outperforms in Text-World Games

Authors:

(1) Kinjal Basu, IBM Research;

(2) Keerthiram Murugesan, IBM Research;

(3) Subhajit Chaudhury, IBM Research;

(4) Murray Campbell, IBM Research;

(5) Kartik Talamadupula, Symbl.ai;

(6) Tim Klinger, IBM Research.

Abstract and 1 Introduction

2 Background

3 Symbolic Policy Learner

3.1 Learning Symbolic Policy using ILP

3.2 Exception Learning

4 Rule Generalization

4.1 Dynamic Rule Generalization

5 Experiments and Results

5.1 Dataset

5.2 Experiments

5.3 Results

6 Related Work

7 Future Work and Conclusion, Limitations, Ethics Statement, and References

Abstract

Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks is to generalize across multiple games and demonstrate good performance on both seen and unseen objects. Purely deep-RL-based approaches may perform well on seen objects; however, they fail to match that performance on unseen objects. Commonsense-infused deep-RL agents may work better on unseen data; unfortunately, their policies are often not interpretable or easily transferable. To tackle these issues, in this paper we present EXPLORER[1], an exploration-guided reasoning agent for textual reinforcement learning. EXPLORER is neuro-symbolic in nature, as it relies on a neural module for exploration and a symbolic module for exploitation. It can also learn generalized symbolic policies and perform well over unseen data. Our experiments show that EXPLORER outperforms the baseline agents on Text-World cooking (TW-Cooking) and Text-World Commonsense (TWC) games.

1 Introduction

Natural language plays a crucial role in human intelligence and cognition. To study and evaluate the process of language-informed sequential decision-making in AI agents, text-based games (TBGs) have emerged as important simulation environments, where the states and actions are usually described in natural language. To solve game instances, an agent needs to master both natural language processing (NLP) and reinforcement learning (RL). At a high level, existing RL agents for TBGs fall into two classes: (a) rule-based agents, and (b) neural agents. Rule-based agents such as NAIL (Hausknecht et al., 2019)

Figure 1: An overview of the EXPLORER agent’s dataflow on a TWC game. In EXPLORER, the neural module is responsible for exploration and collects action–reward pairs, whereas the symbolic module learns the rules and does the exploitation using commonsense knowledge from WordNet.

rely heavily on predefined prior knowledge, which makes them less flexible and adaptable. To overcome the challenges of rule-based agents, in recent years, with the advent of new deep learning techniques, significant progress has been made on neural agents (Narasimhan et al., 2015; Adhikari et al., 2020b). However, these frameworks also suffer from a number of shortcomings. First, from deep learning they inherit the need for very large training sets, which means they learn slowly. Second, they are brittle: a trained network may perform well on entities seen in the training instances, yet perform very poorly in a very similar environment with unseen entities. Additionally, the policies learned by these neural RL agents are not interpretable (human-readable).

In this paper, we introduce EXPLORER for TBGs, which utilizes the positive aspects of both neural and symbolic agents. EXPLORER is based on two modules, neural and symbolic, where the neural module is mainly responsible for exploration and the symbolic module does the exploitation. An overview of the EXPLORER agent can be found in Figure 1. A key advantage of EXPLORER is its scalable design: it can integrate any neural module and build the symbolic module on top of it. For the symbolic module, instead of using predefined prior knowledge, EXPLORER learns its symbolic policies by leveraging reward and action pairs while playing the game. These policies are represented in a declarative logic programming paradigm, Answer Set Programming (ASP) (Lifschitz, 2019), which makes them interpretable and explainable. Due to its non-monotonic nature and efficient knowledge representation, ASP has proven effective in NLP research (Basu et al., 2020, 2021; Pendharkar et al., 2022; Zeng et al., 2024); commonsense reasoning research (Gupta et al., 2023; Kothawade et al., 2021); and NLP + RL research (Lyu et al., 2019; Basu et al., 2022b; Sridharan et al., 2017; Mitra and Baral, 2015; Yang et al., 2018). We believe non-monotonic reasoning (NMR) (Gelfond and Lifschitz, 1988; Reiter, 1988) is a crucial capability in partially observable worlds, as the agent’s beliefs can change in the presence of new information and examples. Importantly, with the help of an exception learner (illustrated in Section 3.2), EXPLORER learns the symbolic policies as default theories, so that the agent can perform NMR and the policies remain consistent with the agent’s findings.
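To make this concrete, the sketch below shows what a learned default policy with an exception could look like when encoded in ASP and queried from Python. This is an illustrative example rather than the paper's implementation: the predicate and entity names (food, cooked, abnormal, insert_into, refrigerator) are hypothetical, and the clingo solver is used only as one possible ASP backend.

```python
# Minimal sketch, not the paper's implementation: a learned policy encoded as an
# ASP default rule with an exception, queried through the clingo Python package.
# All predicate and entity names here are hypothetical.
from clingo import Control

ASP_POLICY = """
% Default: put any food item into the refrigerator, unless it is abnormal.
insert_into(X, refrigerator) :- food(X), not abnormal(X).

% Exception learned from a negative example: cooked items are abnormal.
abnormal(X) :- cooked(X).

% Facts extracted from the current observation.
food(apple).
food(roast_chicken).
cooked(roast_chicken).
"""

ctl = Control()
ctl.add("base", [], ASP_POLICY)
ctl.ground([("base", [])])
# The answer set contains insert_into(apple,refrigerator) but not
# insert_into(roast_chicken,refrigerator): the exception blocks the default.
ctl.solve(on_model=lambda model: print(model))
```

Because the default rule fires only when abnormal(X) cannot be proven, adding a newly learned exception fact later retracts the affected conclusion without rewriting the rule, which is exactly the non-monotonic behavior described above.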

After learning the symbolic policies, EXPLORER can lift (variablize) the rules using WordNet (Miller, 1995) to generalize them. By generalizing the symbolic policies, EXPLORER avoids the usual drop in performance on unseen entities and out-of-distribution (OOD) test sets, since the unseen objects are similar in nature to the training objects and fall under the same class in WordNet.
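As an illustration of the idea (not the paper's code), the following sketch uses NLTK's WordNet interface to find hypernym classes shared by a seen training object and an unseen test object; a lifted rule can then replace the object constant with a variable guarded by such a class. The entity names and the isa guard in the final comment are assumptions made for the example.

```python
# Minimal sketch, assuming NLTK with the WordNet corpus downloaded
# (nltk.download("wordnet")); entity names are illustrative, not from the paper.
from nltk.corpus import wordnet as wn

def hypernym_closure(word):
    """All hypernym synsets reachable from any noun sense of `word`."""
    hypernyms = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        for ancestor in synset.closure(lambda s: s.hypernyms()):
            hypernyms.add(ancestor.name())
    return hypernyms

seen_object, unseen_object = "apple", "pear"
shared_classes = hypernym_closure(seen_object) & hypernym_closure(unseen_object)
print(sorted(shared_classes))  # includes e.g. 'edible_fruit.n.01', 'food.n.01'

# A lifted rule would then replace the constant with a guarded variable, e.g.
#   insert_into(X, refrigerator) :- isa(X, edible_fruit).
```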

Figure 2 illustrates the components of our neuro-symbolic architecture and shows an overview of the agent’s decision-making process.

We have used TW-Cooking to verify our approach and then performed a comprehensive evaluation of EXPLORER on TWC games. To showcase the scalability aspects of EXPLORER, we

Figure 2: Overview of EXPLORER’s decision-making at any given time step. The Hybrid Neuro-Symbolic architecture mainly consists of 5 modules - (a) Context Encoder encodes the observation to dynamic context, (b) Action Encoder encodes the admissible actions, (c) Neural Action Selector combines (a) and (b) with L operator, (d) Symbolic Action Selector returns a set of candidate actions, and (e) Symbolic Rule Learner uses ILP and WordNet-based rule generalization to generate symbolic rules.

have done comparative studies with other SOTA neural and neuro-symbolic models, and the empirical results demonstrate that EXPLORER outperforms the others by achieving better generalization over unseen entities. Due to the neuro-symbolic nature of EXPLORER, we are also able to perform detailed qualitative studies of the learned policies (illustrated in Section 5.3).

The main contributions of this paper are: (1) we present EXPLORER, an agent for TBGs that outperforms existing models in terms of steps and scores; (2) we discuss the importance of non-monotonic reasoning in partially observable worlds; (3) we demonstrate how default theories can be learned with exceptions in an online manner for TBGs; and (4) we provide a novel information-gain-based rule generalization algorithm that leverages WordNet.
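For intuition on contribution (4), the sketch below shows one plausible way to score a candidate WordNet hypernym class by the information gain it yields over positive and negative example entities; the actual algorithm is described in Section 4, and the data layout and entity names here are purely illustrative.

```python
# Illustrative only; the actual generalization algorithm is given in Section 4.
# Scores a candidate WordNet class by the information gain it provides when
# splitting positive/negative example entities collected during exploration.
import math

def entropy(pos, neg):
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(examples, candidate_class):
    """examples: list of (entity, is_positive, wordnet_classes_of_entity)."""
    pos = sum(1 for _, is_pos, _ in examples if is_pos)
    h_before = entropy(pos, len(examples) - pos)
    covered = [is_pos for _, is_pos, classes in examples if candidate_class in classes]
    rest = [is_pos for _, is_pos, classes in examples if candidate_class not in classes]
    h_after = 0.0
    for split in (covered, rest):
        if split:
            split_pos = sum(split)
            h_after += (len(split) / len(examples)) * entropy(split_pos, len(split) - split_pos)
    return h_before - h_after

# Hypothetical usage: the hypernym with the highest gain becomes the rule's guard.
examples = [("apple", True, {"edible_fruit", "food"}),
            ("pear", True, {"edible_fruit", "food"}),
            ("knife", False, {"tool"})]
print(information_gain(examples, "edible_fruit"))  # ~0.918: a perfect split here
```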


[1] Code available at: https://github.com/kinjalbasu/explorer
