no code implementations • 8 May 2024 • Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution.
no code implementations • 12 Feb 2024 • Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
While the initial optimism that reasoning might emerge automatically with scale has been tempered thanks to a slew of counterexamples, ranging from multiplication to simple planning, there persists a widespread belief that LLMs can self-critique and improve their own solutions in an iterative fashion.
no code implementations • 2 Feb 2024 • Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly, Mudit Verma, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is acting as mere translators of the problem specification from one syntactic format to another, shipping the problem off to external symbolic solvers.
no code implementations • 12 Oct 2023 • Karthik Valmeekam, Matthew Marquez, Subbarao Kambhampati
We evaluate a planning system that employs LLMs for both plan generation and verification.
2 code implementations • 25 May 2023 • Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati
We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers.
no code implementations • 13 Feb 2023 • Karthik Valmeekam, Sarath Sreedharan, Matthew Marquez, Alberto Olmo, Subbarao Kambhampati
On this benchmark, we evaluate LLMs in three modes: autonomous, heuristic and human-in-the-loop.
no code implementations • 28 Oct 2022 • Lin Guan, Karthik Valmeekam, Subbarao Kambhampati
We propose two practical methods that can learn to model any kind of behavioral attribute from ordered behavior clips.
2 code implementations • NeurIPS 2023 • Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati
PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities.
no code implementations • 19 Nov 2020 • Karthik Valmeekam, Sarath Sreedharan, Sailik Sengupta, Subbarao Kambhampati
Decision support systems seek to enable informed decision-making.