{"id":3,"title":"LLM Grounding - MazeGPT","independent":true,"description":"In this project I evaluated the ability of a large language model (GPT-4 and GPT-3) to ground itself in a virtual environment. For this experiment, the environment is a 2D block maze with walls, a single goal, and a player character. The player, controlled by the LLM, receives feedback from its environment at the start and after every move. This feedback takes the form of a description of the contents of the four blocks surrounding the player character. The LLM then selects a move from U, D, L, R via text response, like a text-based adventure.\r\n\r\nIn the first attempt, I used two chatbots: a thinker and a mover. Both bots have access to the entire message history. The thinker uses this history to generate a \"thought\" about its current situation, such as where it has been and where it should go next. This thought is passed to the mover, which then selects a single move: U, D, L, or R. The results of this setup were mediocre, especially with GPT-3, which routinely got stuck in loops and rarely managed to move more than a few spaces. When upgraded to GPT-4, however, the chatbot managed to navigate simple mazes, reaching the goal the majority of the time.\r\n\r\nIn an attempt to improve model performance, I built the \"Triple Thinker\". This version added a third chatbot, the conceptualizer, whose purpose was to build and maintain a mental model of the maze in ASCII. This map was passed to the thinker to generate a thought about its situation, which in turn was passed to the mover to select a move. Unfortunately, the addition of the conceptualizer did not yield improvements, owing to GPT-4's inability to maintain a consistent map of its environment.\r\n\r\nWith additional prompt tuning, it is likely that today's models could navigate small 4x4 or even 5x5 mazes in this environment, especially if the conceptualizer's map maintenance improves. This is quite interesting, as large language models simply perform next-token prediction and were not trained to ground themselves in environmental interactions.","start":"2023-03-10","end":"2023-04-03","img":"https://www.youtube.com/embed/c6Bqb5Okdh8?mute=1&hd=1","link":"https://github.com/SevanBrodjian/MazeGPT","slug":"llm-grounding-mazegpt","topic":[1,2,4,5,6],"association":[]}