LLMs Outperform Reinforcement Learning: Meet SPRING, an Innovative Prompting Framework for LLMs Designed to Enable In-Context Chain-of-Thought Planning and Reasoning
SPRING is an LLM-based policy that outperforms Reinforcement Learning algorithms in an interactive environment requiring multi-task planning and reasoning.
A group of researchers from Carnegie Mellon University, NVIDIA, Ariel University, and Microsoft has investigated the use of Large Language Models (LLMs) for understanding and reasoning with human knowledge in the context of games. They propose a two-stage approach called SPRING, which first studies an academic paper and then uses a Question-Answer (QA) framework to reason over the knowledge obtained.
More details about SPRING
In the first stage, SPRING reads the LaTeX source code of the original paper by Hafner (2021) to extract prior knowledge. The authors employed an LLM to extract relevant information, including game mechanics and desirable behaviors documented in the paper. They then used a QA summarization framework similar to Wu et al. (2023) to generate QA dialogue from the extracted knowledge, enabling SPRING to handle diverse contextual information.
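The first stage can be pictured as a filter-then-ask loop over the paper's paragraphs. The following is only a minimal sketch of that idea, not the authors' code: `extract_context`, `keyword_llm`, the prompt wording, and the example questions are all hypothetical, and the stand-in "LLM" simply matches keywords so the example runs without any API.

```python
from typing import Callable, List


def extract_context(
    paragraphs: List[str],
    relevance_question: str,
    gameplay_questions: List[str],
    ask_llm: Callable[[str], str],
) -> List[str]:
    """Collect question-answer pairs from paragraphs the LLM judges relevant."""
    qa_pairs = []
    for paragraph in paragraphs:
        # Step 1: ask whether the paragraph is worth keeping.
        verdict = ask_llm(
            f"{paragraph}\n\nQuestion: {relevance_question} Answer yes or no."
        )
        if not verdict.strip().lower().startswith("yes"):
            continue
        # Step 2: ask each gameplay question about the relevant paragraph.
        for question in gameplay_questions:
            answer = ask_llm(f"{paragraph}\n\nQuestion: {question}")
            qa_pairs.append(f"Q: {question}\nA: {answer}")
    return qa_pairs


def keyword_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (keyword matching only)."""
    paragraph, _, question = prompt.partition("\n\nQuestion: ")
    if "yes or no" in question:
        return "yes" if "pickaxe" in paragraph.lower() else "no"
    # Pretend the first sentence of the paragraph answers the question.
    return paragraph.split(".")[0] + "."


paragraphs = [
    "A wood pickaxe is required to collect stone. More text.",
    "We thank the reviewers.",
]
context = extract_context(
    paragraphs,
    "Is this about gameplay?",
    ["What does the player need?"],
    keyword_llm,
)
```

In a real pipeline `ask_llm` would wrap an actual model call, and the resulting QA pairs would be concatenated into the context given to the reasoning stage.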
The second stage focused on in-context chain-of-thought reasoning using LLMs to solve complex games. The authors constructed a directed acyclic graph (DAG) as a reasoning module, where questions are nodes and dependencies between questions are represented as edges. For example, the question “For each action, are the requirements met?” depends on the question “What are the top 5 actions?”, so the DAG contains an edge from the latter question to the former.
LLM answers are computed for each node/question by traversing the DAG in topological order. The final node in the DAG represents the question about the best action to take, and the LLM’s answer is directly translated into an environmental action.
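The traversal described above can be sketched with Python's standard `graphlib` module. This is an illustrative sketch under assumptions rather than SPRING's implementation: `answer_dag`, `echo_llm`, the prompt layout, and the wording of the final question are hypothetical, and the stand-in LLM only numbers its answers so the ordering is visible.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def answer_dag(
    dependencies: Dict[str, List[str]],
    context: str,
    ask_llm: Callable[[str], str],
) -> Dict[str, str]:
    """Answer every question only after the questions it depends on."""
    answers: Dict[str, str] = {}
    # static_order() yields each node after all of its listed predecessors.
    for question in TopologicalSorter(dependencies).static_order():
        earlier = "\n".join(
            f"Q: {dep}\nA: {answers[dep]}"
            for dep in dependencies.get(question, [])
        )
        prompt = f"{context}\n{earlier}\nQ: {question}\nA:"
        answers[question] = ask_llm(prompt)
    return answers


TOP = "What are the top 5 actions?"
REQ = "For each action, are the requirements met?"
BEST = "What is the best action to take?"  # hypothetical wording of the final node

# Each question maps to the questions whose answers it needs first.
dependencies = {TOP: [], REQ: [TOP], BEST: [REQ]}

calls: List[str] = []


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: records the prompt, numbers the answer."""
    calls.append(prompt)
    return f"answer {len(calls)}"


answers = answer_dag(dependencies, "Context from the paper.", echo_llm)
best_action_text = answers[BEST]  # this text would be mapped to a game action
```

Because each prompt includes the answers of its parent questions, the final node sees the chain of reasoning that led up to it.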
Experiments and Results
The Crafter Environment, introduced by Hafner (2021), is an open-world survival game with 22 achievements organized in a tech tree of depth 7. The game is represented as a grid world with top-down observations and a discrete action space consisting of 17 options. The observations also provide information about the player’s current inventory state, including health points, food, water, rest levels, and inventory items.
The authors compared SPRING and popular RL methods on the Crafter benchmark. Subsequently, experiments and analysis were carried out on different components of their architecture to examine the impact of each part on the in-context “reasoning” abilities of the LLM.
Source: https://arxiv.org/pdf/2305.15486.pdf
The authors compared the performance of various RL baselines to SPRING with GPT-4, conditioned on the environment paper by Hafner (2021). SPRING surpasses previous state-of-the-art (SOTA) methods by a significant margin, achieving an 88% relative improvement in game score and a 5% improvement in reward compared to the best-performing RL method by Hafner et al. (2023).
Notably, SPRING leverages prior knowledge from reading the paper and requires zero training steps, while RL methods typically necessitate millions of training steps.
Source: https://arxiv.org/pdf/2305.15486.pdf
The above figure represents a plot of unlock rates for different tasks, comparing SPRING to popular RL baselines. SPRING, empowered by prior knowledge, outperforms RL methods by more than ten times on achievements such as “Make Stone Pickaxe,” “Make Stone Sword,” and “Collect Iron,” which are deeper in the tech tree (up to depth 5) and challenging to reach through random exploration.
Moreover, SPRING performs perfectly on achievements like “Eat Cow” and “Collect Drink,” whereas model-based RL frameworks like Dreamer-V3 have significantly lower unlock rates (over five times lower) for “Eat Cow” due to the challenge of reaching moving cows through random exploration. Importantly, SPRING does not take the action “Place Stone,” even though it can easily be reached through random exploration, because the paper by Hafner (2021) does not discuss it as beneficial for the agent.
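For readers wondering how per-achievement unlock rates relate to the single game score, the sketch below assumes the Crafter score definition from Hafner (2021): a geometric-mean style aggregate of unlock rates expressed in percent. The function names and all numbers in the example are hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def crafter_score(unlock_rates_percent: Sequence[float]) -> float:
    """Aggregate per-achievement unlock rates (each in [0, 100]) into one score.

    Assumes the definition S = exp(mean(ln(1 + s_i))) - 1.
    """
    mean_log = sum(math.log(1.0 + r) for r in unlock_rates_percent) / len(
        unlock_rates_percent
    )
    return math.exp(mean_log) - 1.0


def relative_improvement(new: float, baseline: float) -> float:
    """Fractional gain of `new` over `baseline`; 0.88 corresponds to 88%."""
    return (new - baseline) / baseline


# Hypothetical unlock rates for two agents over 22 achievements.
easy_only = [90.0] * 10 + [0.0] * 12   # strong on easy tasks, never reaches hard ones
broader = [80.0] * 10 + [10.0] * 12    # slightly weaker on easy tasks, some hard ones

score_easy_only = crafter_score(easy_only)
score_broader = crafter_score(broader)
gain = relative_improvement(score_broader, score_easy_only)
```

Because the aggregate averages logarithms, unlocking rarely reached achievements even occasionally raises the score more than small gains on achievements that are already unlocked most of the time.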
Limitations
One limitation of using an LLM for interacting with the environment is the need for object recognition and grounding. However, this limitation doesn’t exist in environments that provide accurate object information, such as contemporary games and virtual reality worlds. While pre-trained visual backbones struggle with games, they perform reasonably well in real-world-like environments. Recent advancements in visual-language models indicate potential for reliable solutions in visual-language understanding in the future.
Conclusion
In summary, the SPRING framework showcases the potential of Large Language Models (LLMs) for game understanding and reasoning. By leveraging prior knowledge from academic papers and employing in-context chain-of-thought reasoning, SPRING outperforms previous state-of-the-art methods on the Crafter benchmark, achieving substantial improvements in game score and reward. The results highlight the power of LLMs in complex game tasks and suggest that future advancements in visual-language models could address existing limitations, paving the way for reliable and generalizable solutions.