no code implementations • 8 May 2024 • Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution.
no code implementations • 12 Feb 2024 • Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
While the initial optimism that reasoning might emerge automatically with scale has been tempered by a slew of counterexamples--ranging from multiplication to simple planning--there persists a widespread belief that LLMs can self-critique and iteratively improve their own solutions.
no code implementations • 2 Feb 2024 • Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly, Mudit Verma, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is translating the problem specification from one syntactic format to another and shipping the problem off to external symbolic solvers.
no code implementations • 19 Oct 2023 • Kaya Stechly, Matthew Marquez, Subbarao Kambhampati
The study indicates that (i) LLMs are bad at solving graph coloring instances; (ii) they are no better at verifying solutions, and are thus ineffective in iterative modes where LLMs critique LLM-generated solutions; and (iii) the correctness and content of the criticisms--whether from LLMs or external solvers--seem largely irrelevant to the performance of iterative prompting.