no code implementations • 8 May 2024 • Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution.
no code implementations • 12 Feb 2024 • Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
While the initial optimism that reasoning might emerge automatically with scale has been tempered by a slew of counterexamples--ranging from multiplication to simple planning--there persists a widespread belief that LLMs can self-critique and iteratively improve their own solutions.
no code implementations • 2 Feb 2024 • Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly, Mudit Verma, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is translating the problem specification from one syntactic format to another and shipping the problem off to external symbolic solvers.
no code implementations • 19 Oct 2023 • Kaya Stechly, Matthew Marquez, Subbarao Kambhampati
The study indicates that (i) LLMs are bad at solving graph coloring instances; (ii) they are no better at verifying solutions, and are thus ineffective in iterative modes where LLMs critique LLM-generated solutions; and (iii) the correctness and content of the criticisms--whether from LLMs or external solvers--seem largely irrelevant to the performance of iterative prompting.