Breaking Down Deductive Reasoning Errors in LLMs

This paper introduces the idea of validating each reasoning step an LLM takes in QA tasks, focusing on deductive reasoning to improve accuracy. By checking the logical validity of every step, even reasoning chains that arrive at correct answers are scrutinized for hidden errors, enhancing overall model reliability.



:::info Authors:

(1) Zhan Ling, UC San Diego (equal contribution);

(2) Yunhao Fang, UC San Diego (equal contribution);

(3) Xuanlin Li, UC San Diego;

(4) Zhiao Huang, UC San Diego;

(5) Mingu Lee, Qualcomm AI Research;

(6) Roland Memisevic, Qualcomm AI Research;

(7) Hao Su, UC San Diego.

:::

Abstract and Introduction

Related work

Motivation and Problem Formulation

Deductively Verifiable Chain-of-Thought Reasoning

Experiments

Limitations

Conclusion, Acknowledgements and References

A Deductive Verification with Vicuna Models

B More Discussion on Improvements of Deductive Verification Accuracy Versus Improvements on Final Answer Correctness

C More Details on Answer Extraction

D Prompts

E More Deductive Verification Examples

3 Motivation and Problem Formulation

We observe that for all cases where LLMs produce erroneous final answers, there exists at least one mistake among the intermediate reasoning steps S. Moreover, even when the final answer is correct, there may still be mistakes among S. This phenomenon, illustrated in Tab. 1, occurs for all LLMs we tested, including state-of-the-art models such as ChatGPT and GPT-4 [32]. Since later reasoning steps are conditioned on prior ones, these mistakes often trigger a snowball effect, causing subsequent errors to compound. This significantly diminishes the likelihood of correct problem-solving and impedes progress toward human-level complex reasoning.

Therefore, in this work, we place significant emphasis on ensuring the validity of every reasoning step, not just the correctness of the final answer. In particular, we focus on the validity of deductive reasoning, an essential component of a logical reasoning process. In deductive reasoning, we are given a (premise, conclusion) pair, and we are interested in determining whether the conclusion follows from the premises. In the context of reasoning-based QA tasks, for each reasoning step s_i, we define its deductive validity V(s_i) as a binary variable. A reasoning step is deductively valid (V(s_i) = 1) if and only if s_i can be logically deduced from its corresponding premises p_i, which consist of the context C, the question Q, and all the previous reasoning steps s_j (j < i). We can then define the deductive validity of the entire reasoning chain S as V(S) = ∧_{i=1}^{M} V(s_i). Compared to evaluating answer correctness, which can be accomplished with simple functions such as exact string match, evaluating deductive validity is far more challenging. Thanks to recent progress in LLMs, which demonstrate impressive in-context learning capabilities across diverse scenarios, we propose to use LLMs to examine reasoning chains and predict their deductive validity.

Table 2: Zero-shot and two-shot reasoning-chain verification accuracy for GPT-3.5-turbo (ChatGPT), where an entire reasoning chain is verified at once. The two-shot prompt we used is presented in Appendix D.1. To generate verification inputs, for each dataset we perform Chain-of-Thought (CoT) prompting and randomly sample 50 reasoning chains that are valid and 50 reasoning chains that exhibit mistakes. We observe that when given an entire reasoning process, where the deductive graphs for all reasoning steps are entangled together, it is challenging even for strong language models like ChatGPT to verify its validity.
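To make the definitions concrete, here is a minimal Python sketch of LLM-based deductive verification. Since Table 2 suggests that verifying an entire chain at once is difficult, the sketch checks each step against its own premises p_i and combines the results with the conjunction V(S) = ∧_{i=1}^{M} V(s_i). The prompt wording, the generic `llm` callable, and the helper names `verify_step` / `verify_chain` are illustrative assumptions, not the authors' implementation (their actual prompts appear in Appendix D).

```python
from typing import Callable, List

# Prompt template for judging a single reasoning step.  The wording is
# illustrative only; the paper's verification prompts are given in Appendix D.
VERIFY_TEMPLATE = """Context: {context}
Question: {question}
Previous reasoning steps:
{previous_steps}
Candidate reasoning step: {step}

Can the candidate step be logically deduced from the premises above
(the context, the question, and the previous steps)?
Answer with a single word: yes or no."""


def verify_step(llm: Callable[[str], str], context: str, question: str,
                previous_steps: List[str], step: str) -> bool:
    """Predict the deductive validity V(s_i) of one reasoning step s_i,
    given its premises p_i = (C, Q, s_1, ..., s_{i-1})."""
    prompt = VERIFY_TEMPLATE.format(
        context=context,
        question=question,
        previous_steps="\n".join(previous_steps) or "(none)",
        step=step,
    )
    # Treat any answer starting with "yes" as V(s_i) = 1.
    return llm(prompt).strip().lower().startswith("yes")


def verify_chain(llm: Callable[[str], str], context: str, question: str,
                 steps: List[str]) -> bool:
    """V(S) = AND over i of V(s_i): the chain is valid only if every step is."""
    return all(
        verify_step(llm, context, question, steps[:i], step)
        for i, step in enumerate(steps)
    )
```

Any model wrapper that maps a prompt string to a text response can be passed as `llm`; the step-by-step decomposition keeps each verification query focused on a single deduction rather than an entangled whole-chain judgment.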


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::


