Summary
This study evaluates the reasoning depth of large language models (LLMs) using the 11-20 Money Request Game, an experimental game designed to elicit level-k reasoning: a game-theoretic framework in which individuals reason at varying depths of strategic thinking, each level best-responding to the level below it. The findings reveal significant differences between the responses of LLMs and those of human participants, highlighting the limitations of using LLMs as human surrogates in behavioral experiments.
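The level-k logic of the game can be sketched in a few lines. This is a minimal illustration, assuming the standard 11-20 game rules (each player requests an amount between 11 and 20, receives what they request, and earns a bonus of 20 for requesting exactly one less than the opponent); the function names are hypothetical, not from the study.

```python
def payoff(my_request, other_request, bonus=20):
    # Each player receives the amount they request (11-20).
    # Requesting exactly one less than the opponent earns a bonus.
    extra = bonus if my_request == other_request - 1 else 0
    return my_request + extra

def level_k_choice(k, top=20, bottom=11):
    # Level-0 naively requests the maximum (20).
    # Each higher level undercuts the level below by one,
    # bottoming out at the minimum allowed request.
    choice = top
    for _ in range(k):
        choice = max(bottom, choice - 1)
    return choice

print([level_k_choice(k) for k in range(4)])  # → [20, 19, 18, 17]
print(payoff(19, 20))                         # 19 + bonus 20 = 39
```

A player's chosen number thus reveals an estimate of their reasoning depth, which is what makes the game a clean probe for comparing LLM and human responses.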
This research emphasizes the need for caution when interpreting LLM behavior as human-like: the models often exhibit inconsistent, non-human-like reasoning patterns. While LLMs can provide valuable insights, they should not be relied upon as accurate simulations of human behavior.